# A dumb question regarding the standard deviation


EDIT: Okay, so I just understood that what I’ve calculated is the MAD (mean absolute deviation) of the population, and that the standard deviation is another way of showing how spread out a set is around the mean. The squared differences behind the SD are differentiable (unlike absolute values), and the SD weights outliers more heavily. Please let me know if I got anything wrong.


For example, for a population of (1, 3, 5) the standard deviation is 1.63. What exactly does it represent? Shouldn’t the standard deviation represent how much a data point differs from the mean on average?

Each data point differs from the mean by 2, 0, and 2, respectively. When we take the arithmetic mean of these 3 differences, we get 1.33, so why is the standard deviation 1.63?

I am absolutely fascinated by statistics, but sometimes I just don’t get things. I’d be really happy if someone explained this simply. Thanks in advance!


>Each data point has a 2, 0, 2 difference from the mean, respectively. When we take the arithmetic mean of the 3 differences, we get 1.33, why is the standard deviation 1.63?

It’s not the arithmetic mean but the quadratic mean, also called the RMS (root mean square). It weights outliers more heavily and has some neat mathematical properties, like being differentiable, which sometimes make it easier to work with.
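To see what “weights outliers more heavily” means, here is a small sketch with two made-up data sets (chosen only for illustration). Both have mean 3 and the same mean absolute deviation, but one spreads its deviations evenly while the other concentrates them in fewer, larger ones:

```python
import statistics

def mean_abs_dev(data):
    """Mean absolute deviation around the mean (the statistic from the question)."""
    mu = statistics.fmean(data)
    return sum(abs(x - mu) for x in data) / len(data)

even = [2, 2, 4, 4]    # every point is 1 away from the mean of 3
spiky = [1, 3, 3, 5]   # two points sit on the mean, two are 2 away

for name, data in [("even", even), ("spiky", spiky)]:
    # pstdev is the population standard deviation (divides by N)
    print(name, mean_abs_dev(data), statistics.pstdev(data))
```

Both mean absolute deviations come out to 1.0, but the standard deviations are 1.0 and roughly 1.41: squaring makes the two deviations of size 2 count for more than four deviations of size 1.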

The standard deviation is the square root of the mean of the _squared_ differences.
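Written as formulas for a population x_1, …, x_N with mean μ (the symbols are just notation chosen here), the two statistics being compared are:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\text{MAD} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \mu \rvert, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```

This is the population version that gives 1.63. Many calculators default to the sample standard deviation, which divides by N − 1 instead of N and gives √(8/2) = 2 for (1, 3, 5).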

In this case your differences are (2, 0, 2), their squares are (4, 0, 4), the mean of the squares is 8/3, and √(8/3) ≈ 1.63.
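The arithmetic above can be checked with Python’s standard `statistics` module. A minimal sketch; `pstdev` is the population standard deviation (dividing by N), which is the 1.63 from the question:

```python
import statistics

data = [1, 3, 5]
mu = statistics.fmean(data)                  # 3.0

abs_devs = [abs(x - mu) for x in data]       # [2.0, 0.0, 2.0]
sq_devs = [(x - mu) ** 2 for x in data]      # [4.0, 0.0, 4.0]

mad = sum(abs_devs) / len(data)              # arithmetic mean of |differences|
sd = (sum(sq_devs) / len(data)) ** 0.5       # square root of the mean of squared differences

print(round(mad, 2), round(sd, 2))           # 1.33 1.63
print(round(statistics.pstdev(data), 2))     # 1.63, same as sd
```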

You used the absolute value function to calculate your statistic. Absolute value is a little awkward: its graph has a sharp corner at zero, so the slope (the derivative) isn’t defined there, and that gets in the way of the kind of calculus that more advanced statistics relies on.

The y = x² function also turns every real number into a non-negative one, but it’s nice and smooth, which is the first hint that it might be the better choice. With more advanced statistics (which I don’t understand too well), you can prove useful properties of the mean of the squared errors, a statistic called the “variance.” The standard deviation is the square root of the variance.
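A rough numerical way to see the “sharp corner”: compare the slope just to the right of zero with the slope just to the left, using one-sided difference quotients. The step size `h` is an arbitrary small value picked for illustration:

```python
def one_sided_slopes(f, x=0.0, h=1e-6):
    """Approximate the slope of f just to the right and just to the left of x."""
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return right, left

print(one_sided_slopes(abs))                 # (1.0, -1.0): the two sides disagree
print(one_sided_slopes(lambda x: x ** 2))    # both tiny, near 0: one well-defined slope
```

Shrinking `h` pushes both x² slopes toward 0, while the |x| slopes stay at +1 and −1 no matter how small `h` gets, which is what “the slope isn’t defined at zero” means.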

For example, if you take two random variables that are independent of each other, add them together, and look at the distribution of the sum, *the variances also add:* the variance of the sum equals the sum of the two variances. That’s a very useful property. If I remember correctly, it’s part of how the central limit theorem is proven and also part of how best-fit models work.
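Here is a small exact check of that property, using two made-up independent variables: X equally likely to be 1, 3, or 5 (the data set from the question) and Y equally likely to be 0 or 1. Listing every equally likely (x, y) pair gives the exact distribution of X + Y, and `Fraction` keeps the arithmetic exact. Running the same check with the mean absolute deviation shows that it does not add up this way:

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Population variance of equally likely values (mean of squared deviations)."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n

def mean_abs_dev(values):
    """Population mean absolute deviation of equally likely values."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum(abs(v - mu) for v in values) / n

X = [1, 3, 5]
Y = [0, 1]
# Independence: every (x, y) combination is equally likely, so list all the sums.
S = [x + y for x, y in product(X, Y)]

print(variance(X), variance(Y), variance(S))                  # 8/3 1/4 35/12
print(variance(X) + variance(Y) == variance(S))               # True
print(mean_abs_dev(X) + mean_abs_dev(Y) == mean_abs_dev(S))   # False
```

Here 8/3 + 1/4 = 35/12 exactly, while the mean absolute deviations are 4/3 and 1/2 but the sum’s is 3/2 rather than 11/6.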

Put simply, “now square the errors and add them” ends up working very well. But understanding the deeper reasons why requires getting into things like [moment functions](https://en.wikipedia.org/wiki/Moment_(mathematics)) and other parts of probability theory, which build on linear algebra and calculus.