Warning: this started as ELI5, but I got excited.
—–
Essentially, the main metric mathematicians would _really_ like to use is variance – standard deviation squared. It has a lot of very good features; notably, for many common probability distributions, it is additive in some way. To give a couple of examples: the binomial distribution (flip a coin with probability p of heads, n times, and count the heads) has mean np and variance np(1-p). Two independent normally distributed variables (the bell curve) sum to a normally distributed variable, with the mean and variance being the sums of their parts. In fact! Summing _any_ two independent random variables adds their variances.
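To see that additivity numerically, here's a quick sketch in Python (the particular distributions, parameters, and sample sizes are my own illustrative choices, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)

# Binomial: n coin flips, each with probability p of heads.
n, p = 100, 0.3
flips = rng.binomial(n, p, size=1_000_000)
print(flips.mean(), n * p)            # empirical mean vs. np
print(flips.var(), n * p * (1 - p))   # empirical variance vs. np(1-p)

# Sum of two independent variables: variances add,
# even when the distributions are completely different.
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)   # Var = 9
y = rng.exponential(scale=2.0, size=1_000_000)       # Var = 4
print((x + y).var(), x.var() + y.var())              # both ~ 13
```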
Variance also relates very easily to covariance: if the variance of X is the expectation of (X-EX)(X-EX), the covariance of X and Y is the expectation of (X-EX)(Y-EY). Covariance is one of the core metrics for measuring relations between random variables (the more commonly quoted correlation is just covariance divided by the product of the two standard deviations).
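A rough sketch of those definitions in code (the relationship between x and y below is an assumption I picked just so the covariance is nonzero):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)   # y deliberately depends on x

# Covariance straight from the definition E[(X - EX)(Y - EY)] ...
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
# ... and correlation as covariance divided by both standard deviations.
corr_xy = cov_xy / (x.std() * y.std())

print(cov_xy, np.cov(x, y, bias=True)[0, 1])   # matches the library value
print(corr_xy, np.corrcoef(x, y)[0, 1])
```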
—–
As to why it’s so… _quieting mumbling dissipating into white noise._
In seriousness, one intuitive reason is that once you work with random variables, you often want to imagine them as a vector space, with independent variables corresponding to perpendicular vectors. If you need to add them up, the squared length, thanks to uncle Pythagoras, is the sum of the squares of the constituent parts. I won't go into detail about why we swap random variables for vectors, or standard deviation for length, but suffice it to say, the underlying concepts in the two cases are surprisingly similar.
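A tiny sketch of that Pythagorean picture (the standard deviations 3 and 4 are my own choice, so the "hypotenuse" comes out to a familiar 5):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(scale=3.0, size=1_000_000)  # a "vector" of length 3
y = rng.normal(scale=4.0, size=1_000_000)  # an independent, "perpendicular" one of length 4

# Independent variables add like perpendicular vectors:
# std of the sum is sqrt(3^2 + 4^2) = 5.
print((x + y).std())                 # ~ 5.0
print(np.hypot(x.std(), y.std()))    # ~ 5.0
```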
—–
So it’s not so much that standard deviation is a better metric than your suggested average deviation; it’s that variance is a far better metric than both, but we like a measure of spread that is in the same units as the mean (rather than squared units), and standard deviation is far easier to obtain from variance: just take the square root.
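For a concrete side-by-side on the same data (the normal distribution here is just an assumed example; for a normal, the average deviation comes out to roughly 0.8 of the standard deviation):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)

var = x.var()                       # squared units; the quantity the math likes
std = np.sqrt(var)                  # back in the units of the mean, ~2.0
avg_dev = np.abs(x - x.mean()).mean()  # "average deviation", ~1.6 for this normal
print(var, std, avg_dev)
```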