2014/05/29: Computing the standard deviation

I recently had to compute the standard deviation of a large number of lists of numbers that were all quite similar. In fact, each of them deviated from a given reference list in only two positions. Of course, the very naive approach of first computing the mean and then summing the squared differences gains nothing from this similarity.
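For reference, a minimal sketch of that naive two-pass computation (the function name is mine, not from the original code):

```python
import math

def naive_std(xs):
    """Two-pass 'textbook' standard deviation: first the mean,
    then the mean of the squared deviations from it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased (population) variance
    return math.sqrt(var)
```

This has to walk each list twice in full, so the near-identity of the lists cannot be exploited.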

Benefiting from the similarity is trivial if the computation is done the standard way: first compute the count n, the sum s and the sum of squares ss, and then obtain the standard deviation as sqrt(n * ss - s * s) / n (multiply this by sqrt(n / (n - 1)) to get the unbiased estimator).
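A sketch of that one-pass variant, under the same assumptions as above (my own function name):

```python
import math

def one_pass_std(xs):
    """Standard deviation from count, sum and sum of squares
    accumulated in a single pass."""
    n = s = ss = 0
    for x in xs:
        n += 1
        s += x
        ss += x * x
    std = math.sqrt(n * ss - s * s) / n  # biased (population) form
    # multiply by math.sqrt(n / (n - 1)) for the unbiased estimator
    return std
```

Because only n, s and ss are needed, a list that differs from the reference in two positions can reuse the reference's n, s and ss and adjust them with the two changed values, instead of rescanning the whole list.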

However, it turned out that, probably due to the small difference between the comparably large numbers n * ss and s * s, the result computed this way deviated too much (by more than 1e-12 relative difference) from the value computed with the naive method, which was considered the gold standard (and had to stay so in the stable branch).

The good news is that the knowledge that n * ss - s * s equals n² times the variance also allows one to compute the difference of variances directly. Moreover, that computation does not involve numbers as large as the sum of squares. So tracking count, sum and naive variance as the statistics was the approach I eventually took to keep track of the standard deviation.
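This update rule can be sketched as follows; the function names, the signature and the representation of the changed positions as (old, new) pairs are my assumptions, not the original implementation. The key identity is var = ss/n - (s/n)², so the change in variance needs only the (small) changes in s and ss:

```python
def stats(xs):
    """Count, sum and naive (biased) variance of a list."""
    n = len(xs)
    s = sum(xs)
    mean = s / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return n, s, var

def update(n, s, var, old_new_pairs):
    """Update (n, s, var) when a few entries of the list change.

    old_new_pairs: (old_value, new_value) for each modified position.
    Hypothetical helper illustrating the difference-of-variances idea.
    """
    # Changes in the sum and in the sum of squares; both are built
    # from differences of nearby values, so they stay small.
    ds = sum(new - old for old, new in old_new_pairs)
    dss = sum((new - old) * (new + old) for old, new in old_new_pairs)
    s2 = s + ds
    # From var = ss/n - (s/n)^2:
    #   var' - var = dss/n - (s2^2 - s^2)/n^2,
    # where s2^2 - s^2 is factored as ds * (s2 + s) to avoid
    # subtracting two large squares.
    var2 = var + dss / n - ds * (s2 + s) / (n * n)
    return n, s2, var2
```

Here the variance itself, not the sum of squares, is carried along as the third statistic, so no quantity of the magnitude of ss ever enters the update.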