eli5: In statistics, why do we square the deviation before dividing by n to get the variance instead of just using absolute values?


I’m assuming there’s a good reason this doesn’t work but I’ve never had it explained to me why.


Absolute values are a *very* messy thing to use in math: they don’t play particularly nicely with a lot of other algebraic manipulations. Squaring does, and it achieves basically the same result in this application, so it’s a much tidier and more elegant way of doing it that supports a lot more downstream manipulation.

More importantly, squaring increases the influence of extreme values. Since the whole point of variance is to measure the “extremeness” of the distribution, this is a good property. 10 values that are 1 away from the mean have the same absolute value sum as 1 value that’s 10 away from the mean, but if you square the differences and add them it’s 10 vs. 100. The latter is better at capturing the “width” of a distribution, which is the whole point of the exercise.
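To make that arithmetic concrete, here’s the example above as a quick sketch (plain Python, working directly with the deviations):

```python
# Ten deviations of size 1 vs. one deviation of size 10.
spread_out = [1] * 10
one_outlier = [10]

# Absolute values can't tell the two apart...
print(sum(abs(d) for d in spread_out))   # 10
print(sum(abs(d) for d in one_outlier))  # 10

# ...but squaring weights the extreme value much more heavily.
print(sum(d ** 2 for d in spread_out))   # 10
print(sum(d ** 2 for d in one_outlier))  # 100
```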

Both are used in statistics depending on what you’re doing. There isn’t much real reason other than that squaring makes the math easier. An equally valid question is why we use x^2 instead of |x|^1.9999 or |x|^2.0001.

In fact, the use of other exponents is ubiquitous in math; just look up the L-p norms. The 2 seems somewhat arbitrary, except that Euclidean distance uses 2 as the exponent, so maybe there’s some physical intuition there. But for most statistical questions, the 2 is pretty arbitrary.
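As a rough sketch of what that L-p family looks like (the function name here is mine, just for illustration):

```python
def lp_norm(xs, p):
    """L-p norm: p-th root of the sum of p-th powers of absolute values."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

v = [3, 4]
print(lp_norm(v, 1))       # 7.0  (L-1, "taxicab" distance)
print(lp_norm(v, 2))       # 5.0  (L-2, ordinary Euclidean distance)
print(lp_norm(v, 1.9999))  # ~5.0 -- nothing breaks between integer exponents
```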

Also look up the moments of a distribution. The first moment is the mean, the average of the raw values. The second central moment is the variance, the average of squared deviations. The third (standardized) central moment gives skewness, built from cubes; the fourth gives kurtosis, built from fourth powers; and so on. Again, why not a 2.5th moment? There’s no real reason why not.
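A minimal sketch of that idea (the helper name is mine; for real work you’d reach for something like `scipy.stats.moment`). Using the absolute deviation keeps fractional exponents like 2.5 well-defined, and for even integer exponents it agrees with the usual central moment:

```python
from statistics import mean

def central_moment(xs, k):
    """Average k-th power of the absolute deviation from the mean."""
    m = mean(xs)
    return sum(abs(x - m) ** k for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_moment(data, 1))    # 1.5 -- the mean absolute deviation
print(central_moment(data, 2))    # 4.0 -- the (population) variance
print(central_moment(data, 2.5))  # a perfectly usable "2.5th moment"
```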

If you take the absolute values instead, what you get is called the “mean absolute deviation”. The mean absolute deviation and standard deviation are measures of roughly the same thing, and there is often a simple relationship between them, so in most contexts you could use either. The standard deviation is just usually more convenient to work with.
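As a sketch of that “simple relationship” (assuming roughly normal data, for which the ratio MAD/SD tends to √(2/π) ≈ 0.8; simulation below, stdlib only):

```python
import math
import random

random.seed(42)
xs = [random.gauss(0.0, 3.0) for _ in range(200_000)]

m = sum(xs) / len(xs)
sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
mad = sum(abs(x - m) for x in xs) / len(xs)

# For normal data, mad/sd approaches sqrt(2/pi) ~= 0.7979.
print(mad / sd)
print(math.sqrt(2 / math.pi))
```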

Mostly because of the normal distribution.

In practice, most random processes (or at least a LOT of them) follow the normal distribution or something close to it. And the mathematical function describing a normal distribution has a square in it (more precisely, there is an exp(-x^2) in it somewhere). It follows that:

* The standard deviation (with squares) of a normal distribution is easy to compute from its mathematical formula.
* The average deviation (with absolute values) of a normal distribution is much uglier to compute from its mathematical formula.
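Concretely (a standard result, stated here without the full derivation): for X ~ N(μ, σ²),

```latex
\mathbb{E}\left[(X-\mu)^2\right] = \sigma^2,
\qquad
\mathbb{E}\,\lvert X-\mu\rvert
  = \int_{-\infty}^{\infty} \lvert x-\mu\rvert \,
    \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx
  = \sigma\sqrt{\frac{2}{\pi}} \approx 0.798\,\sigma .
```

The square is baked into the density itself, so the variance falls straight out, while the absolute-value version needs the integral split at μ and worked by hand to get that √(2/π) factor.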

And it’s not just normal distributions: there are a lot of mathematical formulas where the standard deviation is more elegant to use than the average deviation.