The normal distribution is the bell-shaped curve you often get when you plot the outcomes of a random process, especially one where lots of small, independent effects add up. [Here’s](https://www.youtube.com/shorts/Vo9Esp1yaC8) a really good physical example of how a normal distribution works. Most of the little balls fall in the middle, and few fall at the outer extremes. If you graphed how long it takes you to get to work (or school) every day, it would look roughly like a normal distribution. The average time (the highest point in the middle) might be 15 minutes. Sometimes it will be slower (accidents) and sometimes it will be faster (you hit all the green lights).
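
If you want to see the falling-balls idea without the physical board, here is a minimal sketch in Python. It assumes each ball hits 12 rows of pegs and bounces left or right with equal chance at each one; the function name `galton_board` and all the numbers are made up for illustration.

```python
import random
from collections import Counter

def galton_board(num_balls=10_000, num_rows=12, seed=0):
    """Drop balls through rows of pegs; each peg sends the ball left or right at random."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(num_balls):
        # The final bin is simply the number of times the ball bounced right.
        position = sum(rng.random() < 0.5 for _ in range(num_rows))
        bins[position] += 1
    return bins

bins = galton_board()
# Print a crude sideways histogram: one '#' per 50 balls in each bin.
for position in range(13):
    print(f"{position:2d} {'#' * (bins[position] // 50)}")
```

The printout should bulge in the middle bins and thin out toward both ends, which is the bell shape.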
Statisticians use formulas to generate normal curves. For a normal curve, we know what share of the observations will fall within a given distance to the right or left of the mean (center). This helps us decide whether the particular process we’re looking at behaves like ordinary random variation or whether something else is going on.
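
The usual rule of thumb is that about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. Here is a small sketch that checks this on simulated data. The commute times (mean 15 minutes, standard deviation 3 minutes) are hypothetical, and `share_within` is just a helper name I picked.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical commute times: mean 15 minutes, standard deviation 3 minutes.
times = [rng.gauss(15, 3) for _ in range(100_000)]

mean = statistics.fmean(times)
sd = statistics.stdev(times)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(t - mean) <= k * sd for t in times) / len(times)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

If real data gives shares that are far from those three figures, that is a hint the data may not be normally distributed.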
Because the normal distribution often describes how measurements in nature are distributed, we can use it to concisely describe data we collect. If my data fits a normal distribution and I want to describe it to you, all I need to give you is the average and the standard deviation. With those two numbers you can reconstruct the shape of the distribution and quickly visualize it.
We can also look for problems in our data. For instance, if I plot my data and it looks like a normal distribution with 2 peaks, that means I probably have a mix that should really be 2 different data sets. If I just blindly calculate the average, the result won’t be meaningful. This sort of situation might happen if I collected the weights of a group of high-school seniors. There would be male football players mixed in with 90-pound girls. Averaging the 2 wouldn’t represent either group, and the distribution would show that.
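
A rough sketch of the two-peaks problem, using invented numbers: two groups of 500 people whose weights center on 120 and 200 pounds. The group names and figures are assumptions for illustration, not real data.

```python
import random
import statistics

rng = random.Random(1)
# Made-up numbers for illustration: two groups with different typical weights (pounds).
group_a = [rng.gauss(120, 10) for _ in range(500)]
group_b = [rng.gauss(200, 10) for _ in range(500)]
combined = group_a + group_b

overall_mean = statistics.fmean(combined)
# Count how many people are actually within 10 pounds of the overall average.
near_mean = sum(abs(w - overall_mean) <= 10 for w in combined)

print(f"overall mean: {overall_mean:.0f} lb")
print(f"people within 10 lb of it: {near_mean} of {len(combined)}")
```

The overall average lands around 160 pounds, in the valley between the two peaks, where almost nobody in either group actually is.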
Likewise, if you end up with a nice-looking bell curve, but a few values are much higher or lower than the rest, those outliers don’t fit in with the rest of the data. Consider leaving them out when calculating an average, because they will pull it away from what is typical. You might be looking at the typical weight of a dozen athletes, but 1 of them is a sumo wrestler. Maybe you are looking at income distribution and Jeff Bezos is in the data set. If Jeff and I are on the same chart, it’s not going to look “normal”.
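
To see how much one outlier can move an average, here is a small sketch with a hypothetical list of a dozen athletes' weights, where the last entry plays the sumo wrestler. The median is shown as well, since it is a common alternative that outliers barely affect.

```python
import statistics

# Hypothetical weights (pounds) of a dozen athletes; the last one is the sumo wrestler.
weights = [165, 170, 172, 175, 178, 180, 182, 185, 188, 190, 195, 400]

mean_all = statistics.fmean(weights)           # average including the outlier
median_all = statistics.median(weights)        # middle value including the outlier
mean_without = statistics.fmean(weights[:-1])  # average with the outlier left out

print(f"mean with outlier:    {mean_all:.1f}")
print(f"median with outlier:  {median_all:.1f}")
print(f"mean without outlier: {mean_without:.1f}")
```

With these numbers the average is 180 without the wrestler and about 198 with him, while the median only moves to 181.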
As someone who is not a scientist and not a statistician, it’s especially useful for me to be able to tell whether my data is meaningful or messed up.