Someone, please explain standard deviation to me.


First of all, an example: the mean age of the children in a test is 12.93, with a standard deviation of 0.76.

Now, maybe I am just overthinking this, but everything I Google gives me a big, convoluted explanation of what standard deviation is without addressing the kiddie pool I’m standing in.

Edit: you guys have been fantastic! This has all helped tremendously; if I could hug you all, I would.


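If the ages roughly follow a bell curve (a normal distribution), then about 68% of your data (the ages of the participants) falls within one standard deviation of the mean, i.e. between the mean minus one SD and the mean plus one SD.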

That’s from 12.93 - 0.76 = 12.17 all the way to 12.93 + 0.76 = 13.69.

If you add and subtract two standard deviations, about 95% are within the interval.

That’s from 12.93 - 2 × 0.76 = 11.41 all the way to 12.93 + 2 × 0.76 = 14.45.
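A quick way to check the arithmetic above, and to see where the 68% and 95% figures come from, is a short Python sketch. It assumes the ages follow a normal (bell-curve) distribution, which is an assumption about the data rather than something the mean and SD alone tell you; `statistics.NormalDist` is part of Python's standard library (3.8+).

```python
from statistics import NormalDist

mean, sd = 12.93, 0.76

# One and two standard deviations either side of the mean
one_sd = (mean - sd, mean + sd)
two_sd = (mean - 2 * sd, mean + 2 * sd)
print(f"1 SD: {one_sd[0]:.2f} to {one_sd[1]:.2f}")  # 12.17 to 13.69
print(f"2 SD: {two_sd[0]:.2f} to {two_sd[1]:.2f}")  # 11.41 to 14.45

# Share of a normal distribution inside each interval.
# Assumption: this is only meaningful if the ages really are roughly bell-shaped.
ages = NormalDist(mu=mean, sigma=sd)
within_one = ages.cdf(one_sd[1]) - ages.cdf(one_sd[0])
within_two = ages.cdf(two_sd[1]) - ages.cdf(two_sd[0])
print(f"within 1 SD: {within_one:.1%}")  # about 68.3%
print(f"within 2 SD: {within_two:.1%}")  # about 95.4%
```

For data that is skewed or lumpy, the intervals are still computed the same way, but the 68%/95% shares no longer have to hold.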

If you tested another group and got a standard deviation greater than 0.76, it would mean that the new group is more diverse: the ages are more spread out.

Conversely, if you tested a group with a standard deviation less than 0.76, it would mean that their ages are closer to the mean value, less spread out.

My explanation might be rudimentary, but the ELI5 answer is:

Mean of (0, 1, 99, 100) is 50

Mean of (50,50,50,50) is also 50

But you can probably see that for the first dataset, the mean of 50 would not mean as much unless we also add some information about how much the actual data points ‘deviate’ from the mean.

Standard deviation is intuitively the measure of how ‘scattered’ the actual data is about the mean value.

So the first dataset would have a large SD (because all of its values are very far from 50), and the second dataset literally has an SD of 0.
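The two datasets above can be compared directly with Python's standard `statistics` module. This sketch uses `pstdev`, the population standard deviation (divide by the number of values); `statistics.stdev` would give the slightly larger sample version instead.

```python
from statistics import mean, pstdev

spread_out = [0, 1, 99, 100]
all_same = [50, 50, 50, 50]

# Both datasets have the same mean...
print(mean(spread_out), mean(all_same))  # 50 50

# ...but very different standard deviations
print(round(pstdev(spread_out), 2))  # 49.5
print(pstdev(all_same))              # 0.0
```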

So far the answers you’re getting seem to apply only to the normal distribution (the bell curve), which is kind of misleading, since not all data is normally distributed and we use standard deviation either way.

At its core, standard deviation is a way of telling you how spread out your data is. Of course there are other ways of doing this (the range, the average distance from the mean, etc.), but standard deviation has some nice properties that we like.

The best way of thinking about it I’ve found is geometrically. If you take a sample of n values (such as the ages of the children in your example) and plot them as a single point in n dimensions (so the first value is the first coordinate, etc.), and also plot the point that has the mean in every coordinate, then the straight-line distance between those two points, divided by √n, is the standard deviation. In other words, for a single dataset, the standard deviation is just a rescaled distance between your data, viewed as a point, and this mean-point.

We like this because the mean is exactly the value that minimises this distance: if you used any other value in place of the mean, the distance would be bigger.
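The geometric picture can be checked numerically. This is a sketch using `math.dist` (Python 3.8+) for the straight-line distance; the dataset and the helper name `distance_to` are made up for illustration. It checks that the distance to the mean-point, divided by √n, matches the population standard deviation, and that moving the center away from the mean only makes the distance bigger.

```python
import math
from statistics import mean, pstdev

data = [0, 1, 99, 100]   # hypothetical example; any list of numbers works
n = len(data)
m = mean(data)

def distance_to(center):
    """Straight-line distance between the data (as one point in n dimensions)
    and the point (center, center, ..., center)."""
    return math.dist(data, [center] * n)

# Distance to the mean-point, scaled by 1/sqrt(n), equals the population SD
print(distance_to(m) / math.sqrt(n), pstdev(data))

# Any other center is farther away than the mean-point
for c in (m - 5, m - 1, m + 1, m + 5):
    assert distance_to(c) > distance_to(m)
```

The reason the last loop always passes is that the squared distance to any center c equals the squared distance to the mean plus n(c - mean)², which is smallest when c is the mean.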

The mean is the average of all the values.

The standard deviation is, in effect, a measure of the average distance of each value from the mean.

It squares the distance from each value to the mean, adds those squares up, divides by the number of values, and takes the square root.
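Written as a formula, with $x_1, \dots, x_n$ the values and $\bar{x}$ their mean, the recipe above reads as follows. When the values are a sample used to estimate a larger population, the $\frac{1}{n}$ inside the square root is often replaced by $\frac{1}{n-1}$.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```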

In basic terms, a small standard deviation means most of the values are close to the mean, while a larger standard deviation means the values are more spread out away from the mean.

Almost all the other answers here are explaining SD in terms of normal distributions (“the bell curve”). No five-year-old needs to learn about normal distributions to understand SDs.

1) You have a mean, the average of all the data points in your set.
2) Each one of those data points has a deviation (a difference) between itself and the mean.
3) You’d like to know, roughly, the average size of those deviations from the mean.

That’s it. That’s the standard deviation. The stuff about what it means for a normal distribution can come later.

Mean (or average) gives you a measure of a ‘center’ (in one definition) of a number of measurements.

Standard deviation (SD) gives you a measure of how much those measurements are spread out around that mean, i.e., how much the measurements “deviate” from that average. If you calculate two more values, mean plus SD and mean minus SD, then for data that roughly follows a bell curve, about 2/3 of your measurements fall within that range.

So, the smaller the standard deviation, the closer those 2/3 of the measurements are to the mean.

In your example above, rounding off to make things simpler, about 2/3 of the measurements are well within the age range of 12 to 14.
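To see how comfortably the 12 to 14 range covers "about 2/3", here is a sketch that models the ages as normally distributed with the thread's mean and SD. That normal model is a hypothetical assumption, not something we know about the real test data.

```python
from statistics import NormalDist

# Hypothetical model: ages normally distributed with mean 12.93 and SD 0.76
ages = NormalDist(mu=12.93, sigma=0.76)

# Share of children between 12 and 14 under that model
share_12_to_14 = ages.cdf(14) - ages.cdf(12)
print(f"{share_12_to_14:.0%}")  # about 81%
```

The answer comes out above 2/3 because 12 to 14 is a bit wider than one SD on each side of 12.93.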

I’ll give my shot at it:

Let’s say you are 5 years old and your father is 30. The average age of the two of you is 35/2 = 17.5.

Now let’s say your two cousins are 17 and 18. The average of their ages is also 17.5.

As you can see, the average alone doesn’t tell you much about the actual numbers. Enter standard deviation. Your cousins’ ages have a standard deviation of 0.5, while yours and your father’s have a standard deviation of 12.5.
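The 0.5 and 12.5 figures are population standard deviations (divide by the number of people). This Python sketch reproduces them and also shows that the sample version, `statistics.stdev`, which divides by one fewer, gives larger numbers for the same pairs.

```python
from statistics import pstdev, stdev

you_and_father = [5, 30]
cousins = [17, 18]

# Population SD (divide by n): matches the numbers in the comment
print(pstdev(you_and_father), pstdev(cousins))  # 12.5 0.5

# Sample SD (divide by n - 1): larger for the same data
print(round(stdev(you_and_father), 2), round(stdev(cousins), 2))  # 17.68 0.71
```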

The standard deviation tells you how close the values are to the average. The lower the standard deviation, the less spread out the values are.

In your data set you have an average age of about 13. The standard deviation is a bit under one (0.76).

This means that, in the group, you’ll have some 12- and 14-year-old kids, too.

If the standard deviation were around 5, you could still have an average of 13, but also have a bunch of 8- and 18-year-old kids.

ELI5: It literally just tells you how “spread out” the data is.

Low SD = most children are close to the mean age

High SD = many children’s ages are far from the mean age


ELI10: it’s useful to know how spread out your data is.

The simple way of doing this is to ask “on average, how far away is each datapoint from the mean?” This gives you MAD (Mean Absolute Deviation).

“Standard deviation” and “Variance” are more sophisticated versions of this with some advantages.
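A small comparison of the three measures on the same data may help. The ages below are made up for illustration, not taken from the original test. Note that the variance comes out in squared units (years squared), and the standard deviation, its square root, is back in years.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical ages, invented for this example
ages = [12, 13, 13, 14, 12, 14, 13, 13]
m = mean(ages)  # 13

# Mean absolute deviation: average of the distances |x - mean|
mad = mean(abs(x - m) for x in ages)

# Variance: average of the squared distances (units: years squared)
var = pvariance(ages)

# Standard deviation: square root of the variance (units: years)
sd = pstdev(ages)

print(mad, var, round(sd, 3))  # 0.5 0.5 0.707
```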

Edit: I would list those advantages but there are too many to fit in this textbox.

At one restaurant they cook their steaks perfectly every time. At another restaurant it’s a crapshoot whether your steak is served raw or burnt to a crisp. At both restaurants the average steak is cooked perfectly. The first restaurant has lower variance (and a lower standard deviation); the second restaurant has higher variance (and a higher standard deviation).

It’s a measure of how tightly clumped your data is around the mean. If your data has low standard deviation then all your datapoints are tightly clumped around your mean. If your data has high standard deviation then your datapoints are very spread out, with the mean somewhere in the middle.

Standard deviation is simply a commonly accepted way of measuring this spread. You calculate it as follows:

– take every datapoint and work out how far from the mean it is; the simplest way to do that is to subtract the mean from it, which gives you the distance if the datapoint is bigger than the mean, and minus the distance if the datapoint is smaller than the mean
– square them all to make them all positive, so they don’t cancel each other out (don’t worry, we’ll undo this later)
– work out the average (i.e. the mean) of those answers
– take the square root of that average (to undo the fact that you squared them all earlier)

and that’s your standard deviation
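The four steps above translate almost line for line into code. This is a minimal sketch of the population standard deviation; the function name `standard_deviation` and the example data are hypothetical, and the result is cross-checked against `statistics.pstdev` from Python's standard library.

```python
import math
from statistics import pstdev

def standard_deviation(data):
    """Population standard deviation, following the four steps above."""
    m = sum(data) / len(data)
    # Step 1: signed distance of each datapoint from the mean
    differences = [x - m for x in data]
    # Step 2: square them so they are all positive
    squares = [d * d for d in differences]
    # Step 3: average the squares
    average_square = sum(squares) / len(squares)
    # Step 4: take the square root to undo the squaring
    return math.sqrt(average_square)

example = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(standard_deviation(example))  # 2.0
print(pstdev(example))              # 2.0
```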

Thanks, from reading the sum of all these comments and averaging the answer I actually understand 🙂


If you flip the words around it makes a LOT more sense.

Deviation (from the) standard. It tells you how much your dataset varies from the “standard” (the typical, average value) of that dataset.

If you have 100 chickens, and 99 of them are yellow and 1 is red, your “average” is “yellow”, and (loosely speaking, since colors aren’t numbers) your standard deviation is very, very low, because only one chicken “deviates” (from the) “standard”.

I’ll try my best, with an example similar to the top comment’s, because it’s probably the easiest to understand. I just want to add some things that may make it easier to follow.

A is 5 years old and B is 30 years old. The average age of A and B is (5 + 30)/2 = 17.5.

C is 17 years old and D is 18 years old. The average age of C and D is (17 + 18)/2 = 17.5.

If you look at it, A and B, and C and D, have the same average, but that doesn’t really tell you much about their actual ages. This is where standard deviation can help. Standard deviation is basically the typical distance between the average and the individual data points (in this case, the ages of A, B, C, and D).

Standard deviation for C and D is 0.5. Where did 0.5 come from? 0.5 is the difference between the age of C or D and the average of C and D.


The same applies to A and B. The standard deviation of A and B is 12.5, meaning that the age of A or B is 12.5 years away from the average of A and B.

Ok, stats major here and I finally understood it like this:

We have 10 data points or numbers. These 10 numbers have an average. What we want to find out is how dispersed those numbers are around the average.

So we start taking each of those 10 numbers and subtracting the average from it to get the distance between them.

So now that we have the distance of each of the 10 points from the average, let’s sum up all the distances. If you divide that total distance by the number of points, you get the average distance of the data points from the average.

ADDITIONAL: Now of course, stats being stats, there are numerous nuances. Each one of those 10 numbers is either above or below the average, so the distances will be a mix of negative and positive numbers, and they cancel out when you add them up. But like in real life, distance can’t be negative… So we square all the distances (which makes them positive), average the squares, and then take the square root of that average. Then there are also the degrees of freedom involved (dividing by n - 1 instead of n when working from a sample)… but that’s for another day.
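Here is a sketch of both points with ten hypothetical numbers (1 through 10): the raw distances cancel to zero, and the population and sample versions differ only in dividing by n versus n - 1. All four functions are from Python's standard `statistics` module.

```python
from statistics import pvariance, variance, pstdev, stdev

data = list(range(1, 11))    # ten hypothetical data points: 1, 2, ..., 10
avg = sum(data) / len(data)  # 5.5

# Raw distances cancel out: the positives and negatives sum to zero
print(sum(x - avg for x in data))  # 0.0

# Population versions divide the sum of squared distances by n ...
print(pvariance(data), round(pstdev(data), 3))  # 8.25 2.872
# ... sample versions divide by n - 1 (the degrees-of-freedom adjustment)
print(round(variance(data), 3), round(stdev(data), 3))  # 9.167 3.028
```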

OK, let’s try this:

You have to make ten hamburgers out of 1 kilo of meat. Each burger should be 100 grams, right? So you form up your ten burgers, and decide to weigh them to see how close they are to your ideal 100 g burger.

You’re pretty good! 8 of your burgers are 100 g, one is 99, and one is 101. That’s almost perfect. If you put them in a row, they all look exactly the same.

Now, you give another kilo of hamburger to a six-year-old and ask him to do the same. He makes 5 really big 191 g patties, and then realizes he’s almost out of meat, so the next five are 10, 10, 10, 10, and 5 grams. When he puts his in a row, you see 5 enormous patties, 4 bitty ones, and one itty-bitty one.

Obviously, these are two different ways of making burgers! But in each case, we have ten burgers, and in each case, the average weight is 100g. So they’re the same! But they’re clearly not the same. So how do we *describe* the difference, mathematically, between these two sets of burgers?

That’s what the Standard Deviation (SD) does for us. It tells us how far, on average, a member of a set (one of the burgers) is from the set’s average (our “ideal” burger of 100 g). When the SD is small, as it was in the first case, you will see all the burger weights clustered around the middle (the SD works out to about 0.45 g). When the SD is large, as with the six-year-old’s burgers, the weights will be all over the place (the SD is about 91 g).

How do you measure this? Easy: you take the difference between each element (burger) and the middle (the ideal 100 g burger), add the differences together, and divide by the number of elements (burgers). That tells you how far, on average, any burger might be from 100 g.

So, in our first case, we have eight burgers where “burger weight-ideal weight = 0”, one where it’s +1, and one where it’s -1. These add up to … zero! Does that make the SD zero as well?

In fact, in any set, the differences from the mean always add up to zero: the differences on the minus side always cancel out the differences on the plus side. Try a few sets and see. To get around this, mathematicians use a trick: they square each difference first (because this way, all the negative numbers get turned into positive ones), add the squares together, divide by the number of burgers, and then take the square root of the result. This lets us combine all the burgers that were too heavy and all the ones that were too light, and find out how far a typical burger is from the ideal burger.
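The burger example can be checked in a few lines, using ten patties that add up to 1 kilo: eight at 100 g plus 99 g and 101 g for the careful cook, and five at 191 g plus 10, 10, 10, 10, and 5 g for the six-year-old. The sketch shows that the plain differences always sum to zero, while the squaring-based population SD (`statistics.pstdev`) separates the two sets clearly.

```python
from statistics import pstdev

# Patty weights in grams; each set has 10 patties totalling 1000 g
careful = [100] * 8 + [99, 101]
six_year_old = [191] * 5 + [10, 10, 10, 10, 5]

for name, burgers in (("careful", careful), ("six-year-old", six_year_old)):
    avg = sum(burgers) / len(burgers)          # 100.0 for both sets
    raw_total = sum(b - avg for b in burgers)  # 0.0: pluses and minuses cancel
    print(name, avg, raw_total, round(pstdev(burgers), 2))
```

The printed SDs are about 0.45 for the careful cook and about 91 for the six-year-old.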

Let’s say you have a bunch of points on a graph and you draw the simplest possible “line of best fit”: a flat line at the mean. Each data point sits some “distance” above or below that line. If you square those distances, average them, and take the square root, you have your standard deviation. It’s (roughly) the typical amount the data deviates from the average.

Let’s say Tom has $1 and Bill has $2. Obviously the average amount of money between Tom and Bill is $1.50, and Tom and Bill each deviate from the average by $0.50. Let’s add a third person, Dave, with $6. The average amount of money among the three guys is now $3. Tom deviates by $2 ($3 is the average and Tom has $1; $3 - $1 = $2), Bill deviates by $1, and Dave deviates by $3. Averaging those deviations gives $2; strictly speaking, that’s the mean absolute deviation. The standard deviation squares the deviations before averaging and then takes the square root, which gives about $2.16 here. Either way, it’s roughly the average distance from the average.
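The Tom, Bill, and Dave numbers make a handy check of the difference between the plain average distance (mean absolute deviation) and the standard deviation. This sketch uses Python's `statistics` module and the population SD.

```python
from statistics import mean, pstdev

money = {"Tom": 1, "Bill": 2, "Dave": 6}
amounts = list(money.values())
avg = mean(amounts)  # 3

deviations = {name: abs(x - avg) for name, x in money.items()}
print(deviations)  # {'Tom': 2, 'Bill': 1, 'Dave': 3}

mad = mean(deviations.values())  # plain average distance: 2
sd = pstdev(amounts)             # square root of the average squared distance
print(mad, round(sd, 2))         # 2 2.16
```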

Here’s my way of thinking about it. Imagine you have a row of cans marked 1 through 10. You give a guy a BB gun, stand him 30 feet from the target, and tell him to shoot can 5 near the middle. Most of the time he hits can 5, but sometimes he hits can 6 or can 4, and there are a few times he will hit cans further away from the target. Maybe he hits can 7 once. You tally up each time he hits a can.

What you’ll see is a distribution of shots around the target, with the most shots hitting can 5, and the counts quickly going down as you get further away from the center. The curve of this distribution looks like a bell, and it has a special name: the normal distribution. It appears a lot in nature, where something is normally a certain value but, due to random chance, varies up or down from that value.

Now, the distribution of shots isn’t the same for each situation. What if you move the shooter to 100 feet away from the cans? Well, his accuracy is going to go down, so there are a lot more shots that hit cans further from the center. If you tally up the new distribution, you notice the “bell” is wider than before. Fewer shots hit can 5, and more hit cans 9 or 10. But he is still aiming at can 5, so it still gets more hits than any other can.

The *width* of the distribution indicates the accuracy of the shooter. This width is measured using a mathematical formula called *standard deviation*, usually written with the Greek letter sigma (σ). So the value of sigma tells you how accurate the shooter is: bigger sigma is less accurate, smaller sigma is more accurate.

It is important in science to be able to calculate this number, because it gives you a numerical score for how accurate the shooter is, and it allows you to predict the chance of hitting any single can on the next shot. So if a shooter had a sigma of 1 can, then most of his shots (about 68%) are going to land within one can of the mean, around cans 4 to 6. We can also predict that this shooter should hit can 9, four cans away, only very rarely, well under once in a thousand shots. So if he suddenly starts hitting can 9 every ten shots, we know something changed with the situation: his sigma must be different now. At this point maybe he’s getting tired and needs a rest.
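The shooter's odds can be estimated with `statistics.NormalDist`. This sketch rests on a hypothetical model that is our assumption, not the commenter's: shots land on a continuous line of can positions, normally distributed around can 5 with sigma = 1 can, and "hitting can k" means landing between k - 0.5 and k + 0.5.

```python
from statistics import NormalDist

# Hypothetical model: landing position ~ Normal(mean = can 5, sigma = 1 can)
shots = NormalDist(mu=5, sigma=1)

# Landing within one sigma of the target (between cans 4 and 6)
within_one_sigma = shots.cdf(6) - shots.cdf(4)  # about 0.68

# "Hitting can 9" modelled as landing between 8.5 and 9.5
p_can_9 = shots.cdf(9.5) - shots.cdf(8.5)       # roughly 0.0002

print(f"within one can of the target: {within_one_sigma:.0%}")
print(f"can 9: about 1 shot in {1 / p_can_9:,.0f}")
```

Re-running with a larger sigma, say `NormalDist(mu=5, sigma=2)`, makes can 9 far more likely, which is the "his sigma must be different now" signal the comment describes.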

We have a good idea of what average (mean) means, so think of it like this: Standard deviation is the average difference from the average.

It’s just a measure of spread. The higher the standard deviation, the more spread out the data is from the mean.

If you look at the formula, it is the square root of the average of the squared differences, and the squaring penalises large differences more.
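The "penalises large differences" point shows up clearly with two made-up datasets that have the same mean and the same average distance from it, but different standard deviations.

```python
from statistics import mean, pstdev

# Two hypothetical datasets, both with mean 0
even = [-2, 2, -2, 2]   # every value is 2 away from the mean
lumpy = [0, 0, -4, 4]   # two values on the mean, two values 4 away

for data in (even, lumpy):
    m = mean(data)
    mad = mean(abs(x - m) for x in data)  # plain average distance
    print(data, mad, round(pstdev(data), 3))
```

Both have an average distance of 2, but the SD of `lumpy` is about 2.828 versus 2.0 for `even`, because squaring gives the two values that sit 4 away extra weight.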