– What is a standard deviation?

So I understand the concept of a distribution. I just don’t understand the concept of a standard deviation from the mean of that distribution. How can we tell what is 1 standard deviation away as opposed to 2 standard deviations?

I’d be very thankful for an explanation, thank you :).

5 Answers

Anonymous 0 Comments

You find the average of x, y, z, a, b, c. You apply the formula, and if the number is low, then most of the numbers in the set are close to the average. If it is high, then the set is more spread out. So if the population standard deviation is 2 and the average is d, then you know that most of the population x, y, z, a, b, c is going to be tightly grouped around d. If it is an SD of 50 for the same mean, then you know the population is more spread out. On a graph, the SD of 2 would give a tall, narrow peak while the SD of 50 would give a flatter, wider curve.
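As a quick sketch of that idea, here are two made-up data sets with the same mean but very different spreads, using Python's standard `statistics` module (the numbers are invented for illustration):

```python
import statistics

# Two invented data sets, both with mean 75.
tight = [73, 74, 75, 76, 77]     # values hug the mean
spread = [25, 50, 75, 100, 125]  # values sit far from the mean

print(statistics.mean(tight), statistics.pstdev(tight))
print(statistics.mean(spread), statistics.pstdev(spread))
# Both means are 75, but the population SD of `tight` is about 1.41
# while the SD of `spread` is about 35.36 -- the spread-out set has
# a much larger standard deviation.
```

Same center, very different widths: that is exactly the difference the SD captures.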

Anonymous 0 Comments

The standard deviation is roughly like the average distance of a point from the average value. Not exactly, but I think that paints a decent picture. You use a formula to calculate what that is, and then you just look at 1, 2, or 3 times that added on to or taken away from the mean. If the distribution is a normal distribution, then about 68% of the values are between (mean-SD) and (mean+SD).
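One way to see that 68% figure for yourself is to draw lots of values from a normal distribution and count how many land within one SD of the mean. A sketch using Python's built-in `random` module (the mean and SD here are arbitrary choices):

```python
import random

random.seed(0)
mean, sd = 100.0, 15.0

# Draw many values from a normal distribution with that mean and SD.
samples = [random.gauss(mean, sd) for _ in range(100_000)]

# Count the fraction landing within one SD of the mean.
within_one_sd = sum(mean - sd <= x <= mean + sd for x in samples) / len(samples)
print(round(within_one_sd, 3))  # close to 0.68
```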

Anonymous 0 Comments

I’ll explain this the way I understood it, so I’ll break it down to its simplest and most manual method.

Since you understand distribution, picture a variable distributed on a single plot line, a simple straight line. Let’s say you were trying to find the standard deviation of the weight of students in a class, for simplicity. The weights will be distributed on the line ranging from 50kg to 120kg, for instance, with a mean of 75kg. With the mean, you know the average, or “center” weight in terms of the plot line.

The problem with the average or mean is that it does not describe how far a single weight can actually be from the mean. This is relevant if you want to make sure your population (the variable of your study) is not too far off from the value of the mean (like in production of materials where precision of thickness, height, weight, etc. is important, you would want your entire batch to be near the mean value).
For example, you can find a mean of 75 if there were three students, one being 50kg, one being 75kg, and one being 100kg. But you could also find a mean of 75kg if the three students were 50kg, 50kg, and 125kg.

Now, you can find the difference between the weight of each student and the mean weight. On the plot line, this difference would be represented by the distance of each weight from the mean. Note the distance between all values and the mean.

Now you can find the AVERAGE DISTANCE between the values and the mean. A high average distance indicates one of two things: either your values are generally spread out and far from the mean, or there is an outlier value affecting your average. An outlier is basically a single value that is so far off the mean that its effect on the average is considerable. Take for example the 125kg student in the second example. Due to this high value, the mean is 75kg even though the majority of the students weigh less in comparison.

Additionally, note that if you were to find the average of those distances using conventional means, your result would be zero. For instance, using the example, the distances are 25 (since 75-50), 25 (75-50), and -50 (75-125). Then 25 + 25 – 50 = 0

To bypass this, we square all the distances individually (the square of a negative value is positive) before finding the average. That average of squared distances is called the variance, and taking its square root, to get back to the original units, gives the standard deviation.

That is standard deviation to my understanding.
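The steps above, written out in Python for the second set of students (50kg, 50kg, 125kg); note the final square root, which brings the result back to kilograms:

```python
import math

weights = [50, 50, 125]
mean = sum(weights) / len(weights)        # 75.0

# Plain distances from the mean cancel out to zero.
distances = [w - mean for w in weights]   # [-25.0, -25.0, 50.0]
print(sum(distances))                     # 0.0

# Squaring first makes every term positive.
variance = sum(d ** 2 for d in distances) / len(weights)  # 1250.0
sd = math.sqrt(variance)                  # about 35.36 kg

print(variance, sd)
```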

Anonymous 0 Comments

The mean tells you where the distribution is centered. The standard deviation tells you how wide or dispersed that distribution is. Note that there are other ways of measuring a distribution’s center (median, mode) and its spread (range, absolute deviation), each of which has its pros and cons.

Anonymous 0 Comments

Standard deviation is a way to talk about how much more or less something might be the next time you get one, and how often you’ll get a lot more or less.

Let’s say you get a big bag of M&M’s at the store and count how many M&M’s are inside (before you eat any). Because they are small and there are a lot of them, it makes sense that some bags might have a few more than others, but usually it’s about the same number.

If you got all of the bags at the store, you could count all of the M&M’s, and put the same number into each bag to “fix” them. This number is called the _mean_ and is a kind of _average_, though often when people say “average” the mean is the one they… mean. And if you have a few M&M’s left over, the mean has a fraction added to the number that went into each bag to account for the extra.

Now that we know what the mean is, we can talk about how much any of the bags _varied_ from the mean. Maybe one bag was 10 M&M’s short of the mean, or another 5 over. If we add up all of the differences, we’d get a total difference of 0 because it would all average out; that’s what the mean does. So we’d like a way to talk about the differences, a way where being over and being under both add to the total amount of difference, and (importantly) where a bigger difference adds more “difference.” After all, you’ll care more about buying one bag that’s 20 short than two bags that are 10 short each, because if one was 20 short, maybe the next is 20 short, too! So what statisticians do is _square_ the differences by multiplying each difference by itself. So +20 or –20 both become +400 (since negative numbers have positive squares), and two 10’s would be +100 + 100 = +200, so two 10’s is less _variance_ than one 20.

This variance now tells us how much variation there is in the number of M&M’s in the bags. But because we multiplied, for example, 20 M&M’s by 20 M&M’s, we got a result of a variance of 400 M&M’s-squared. We don’t eat squared M&M’s. So we take the square root of the variance to return to the unit we care about, M&M’s. That’s the _standard deviation_, standard because we’ve fixed the unit.

Now for each bag we can standardize the difference we measured by comparing the _error_, how many M&M’s the bag is over or under the mean, against the standard deviation. The unit (M&M’s) cancels out and we get a simple number without units, so we can compare these standardized differences across other samples, like maybe a different size of M&M’s bag, or Reese’s Pieces.
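A small sketch of those two steps with invented bag counts: compute the variance, take its square root to get back to M&M's, then divide one bag's error by the SD to get a unitless number:

```python
# Invented M&M counts for five bags.
counts = [210, 205, 190, 200, 195]

mean = sum(counts) / len(counts)                               # 200.0
variance = sum((c - mean) ** 2 for c in counts) / len(counts)  # in M&M's-squared
sd = variance ** 0.5                                           # back to M&M's

# Standardize one bag's error: a unitless "how many SDs off" number.
z = (counts[0] - mean) / sd
print(sd, z)
```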

That’s useful in itself, but it has certain mathematical properties that are more useful. For example, Chebyshev’s Theorem says that the number of M&M’s in a bag should be within two standard deviations of the mean at least 75% of the time. We don’t know how many M&M’s will be in the next bag we buy, but we have an idea of a range of how many to _expect_, unless something strange happened at the factory. If our data seems to have certain properties, we might be able to describe it as a “normally distributed variable” and, if that assumption is okay, we can make more accurate predictions on how often a bag will have a certain range of M&M’s in it. For example, if the M&M’s bags’ candy count is approximately normal, then we can say that we expect 68% of bags to have a number within one standard deviation of the mean, and 95% of bags to be within two standard deviations.
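Chebyshev's bound holds for any distribution, not just the normal one. A sketch using a deliberately skewed (exponential) data set, drawn with Python's `random` module:

```python
import random

random.seed(1)
data = [random.expovariate(1.0) for _ in range(50_000)]  # skewed, not normal

mean = sum(data) / len(data)
sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5

within_two_sd = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
print(within_two_sd)  # Chebyshev guarantees at least 0.75, whatever the shape
```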

One thing we need to be careful about is that we don’t know the standard deviation of the M&M bag machine. Often we _can’t_ know, so statistics is about _estimating_ (guessing at) these unknowns. So when we calculate the standard deviation from a sample, we usually make it a little bit bigger because we don’t know everything, and the amount added is more when we have fewer samples since we didn’t study much. Because of that, you’ll see σ used to represent the usually unknown true standard deviation, but _s_ when we calculate from a sample, which is the correct value of the sample but only an estimate of what σ is. And with more math, we can figure out how near to _s_ σ is likely to be. How far you go depends on how much information you need.
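Python's `statistics` module exposes both formulas: `pstdev` divides by n (treating the data as the whole population), while `stdev` divides by n − 1, giving the slightly bigger sample estimate _s_ described above (the data here is made up):

```python
import statistics

sample = [10, 12, 23, 23, 16, 23, 21, 16]  # made-up sample data

s = statistics.stdev(sample)     # divides by n - 1: sample estimate of sigma
pop = statistics.pstdev(sample)  # divides by n: population formula

print(s > pop)  # the sample formula is always a bit larger (for n > 1)
```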