Absolute values are a *very* messy thing to use in math. They don’t play nicely with a lot of other algebraic manipulations, but squaring does, and it achieves basically the same result in this application (every difference from the mean ends up counting as positive). So it’s a much tidier and more elegant way of doing it, and it supports a lot more downstream manipulations.
More importantly, squaring increases the influence of extreme values. Since the whole point of variance is to measure the “extremeness” of the distribution, this is a good property. Ten values that are each 1 away from the mean have the same sum of absolute differences as one value that’s 10 away from the mean (10 in both cases), but if you square the differences and add them, it’s 10 vs. 100. The squared version is better at capturing the “width” of a distribution, which is the whole point of the exercise.
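
If you want to check that arithmetic yourself, here is a minimal Python sketch. The two lists are made-up deviations from the mean (not real data), and the helper names `abs_sum` and `sq_sum` are just illustrative.

```python
# Hypothetical deviations from the mean, matching the example in the text:
# ten values that are each 1 away, versus one value that is 10 away.
many_small = [1] * 10
one_big = [10]


def abs_sum(deviations):
    """Sum of absolute deviations."""
    return sum(abs(d) for d in deviations)


def sq_sum(deviations):
    """Sum of squared deviations."""
    return sum(d * d for d in deviations)


# Absolute values cannot tell the two cases apart.
print(abs_sum(many_small), abs_sum(one_big))  # 10 10

# Squaring weights the single far-out value much more heavily.
print(sq_sum(many_small), sq_sum(one_big))    # 10 100
```

Both cases look identical under absolute values, while the squared sums differ by a factor of ten, which is the "extra influence for extreme values" described above.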