Or are there populations where the curve graphs will converge on either end instead of the middle? Is it a fixed rule in Statistics that we “should” always have a bell curve distribution? If not, why does it seem like my data must make a bell curve distribution? Is it a rule in nature that the greatest amount of something in a group sits at the middle value of a trait, with the counts sloping downwards on either side of that middle? What is so special about the bell curve that it is underscored so much?
Let’s agree there are generally 3 “classes” of things that determine a trait in a population.
1. Things that are consistent for every single member of the population; let’s call these the ‘natural tendency’ of a population. For example, every apple tree wants to grow 1,000 apples. You can think of this as the hardwired, “nature” part of determining how something will turn out.
2. Things that are consistently *random* for every single member of a population. This is the noise: for example, some trees get visited by more bees, some trees got a little extra fertilizer, or a little more sun. This makes some trees produce fewer apples, and others more. You can think of this as the experiential “nurture” part of determining how something will turn out.
3. Things that are NEITHER random NOR consistent for every member of a population; for example, 1/3 of the trees producing 20,000 apples for some reason, and 1/10th of the trees producing 0 apples.
The third example shows us there is a problem with our data. It tells us our data comes from 3 entirely different groups: the 1/10th are probably just not apple trees in the first place, and the 1/3 super-producers and the remaining trees might just be different species. Either way, with the information given, you *won’t* get a bell curve because your data is garbage. The best thing to do is throw away the 0-apple trees entirely and separate the super-producers and remaining trees into two groups.
Look at those two groups and you should get nice bell curves. Why? What “law” of existence means a bell curve “must” happen?
I’d argue you have it backwards: it’s not that the universe demands bell curves, but rather that the universe demands that IF something has a ‘natural tendency’ (some things don’t) and IF that natural tendency is exposed to noise that causes it to vary a bit (some things don’t), THEN the result is a bell curve. Specifically, it’s the natural tendency that creates the peak, called the AVERAGE or MEDIAN, and it’s the noise that creates the tails on the curve.
That’s why bell curves are so important: in science, in physics, and in daily life we’re almost always talking about systems that work like this. Test scores, heights, weights, apples growing on a tree, whatever. These are things that can be described as having a ‘natural tendency’ and then being subject to noise, and the two combine to create a bell curve.
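To make that concrete, here’s a rough Python sketch of the idea (the 1,000-apple tendency, the number of noise factors, and their sizes are all made up for illustration): a fixed tendency plus many small, independent nudges up or down piles up into a bell.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trees = 10_000
natural_tendency = 1000                     # every tree "wants" 1,000 apples
n_factors = 50                              # bees, sun, fertilizer, ... (made-up count)
noise = rng.uniform(-10, 10, size=(n_trees, n_factors)).sum(axis=1)
apples = natural_tendency + noise

print(f"mean ~ {apples.mean():.0f}, spread (std) ~ {apples.std():.1f}")
# Histogramming `apples` (e.g. with matplotlib) shows the familiar bell shape:
# the tendency sets the peak, the accumulated noise sets the tails.
```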
Not necessarily.
Bell curves (normal distributions) arise when a group has a single average value, and additive fluctuations around that.
If the fluctuations are multiplicative (for example, % changes or probability changes, and factors influencing growth) then a log-normal distribution will occur (it looks similar to a bell curve, but with its peak skewed towards lower values and a longer tail towards higher ones). Human weight is an example of this.
Sometimes these can be hard to distinguish from a normal distribution though, if the skew is small.
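A quick way to see the difference is a toy simulation (the starting value of 100 and the step sizes are my own invented numbers): add many small changes and you get a symmetric bell; compound many small percentage changes and you get the skewed log-normal shape.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 50_000, 200

# Additive fluctuations: many small +/- changes added onto a shared average.
additive = 100 + rng.normal(0, 1, size=(n, steps)).sum(axis=1)

# Multiplicative fluctuations: many small percentage changes compounded together.
multiplicative = 100 * np.prod(1 + rng.normal(0, 0.01, size=(n, steps)), axis=1)

for name, x in [("additive", additive), ("multiplicative", multiplicative)]:
    skew = np.mean(((x - x.mean()) / x.std()) ** 3)
    print(f"{name:15s} mean={x.mean():7.1f}  skew={skew:+.2f}")
# The additive case is symmetric (skew near 0); the multiplicative case has its
# peak pulled towards lower values and a longer tail on the high side.
```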
Some traits, like those where the event has a fixed chance of happening in each unit of time, will give an exponential distribution. For example, age of death in an ideal population (although because of factors like birth rate, war, and health care accessibility, in reality it may not follow an exponential distribution).
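A crude sketch of that constant-rate idea (the 1% chance per step is purely illustrative, and a geometric distribution is used here as a discrete stand-in for the exponential):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.01                                    # same 1% chance of the event each time step
lifetimes = rng.geometric(p, size=100_000)  # discrete stand-in for an exponential
print(f"average lifetime ~ {lifetimes.mean():.0f} steps (expected ~ {1/p:.0f})")
# Lots of short lifetimes, steadily fewer long ones: a decaying curve with no
# central peak, nothing like a bell.
```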
Some traits depend on multiple underlying groups, which can create a bimodal (or several average value) distribution. This will have multiple peaks. For example, if you scored people based on their ability to distinguish colors, you would see 2 peaks: a higher peak for normal color vision and a lower peak for colorblind people. If you isolate one peak, it will tend to have a normal or log normal distribution.
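A hypothetical sketch of that colour-vision example (the group sizes and score ranges are invented for illustration, not real data):

```python
import numpy as np

rng = np.random.default_rng(3)
normal_vision = rng.normal(90, 5, size=9_200)   # the larger, higher-scoring group
colorblind    = rng.normal(55, 5, size=800)     # the smaller, lower-scoring group
scores = np.concatenate([normal_vision, colorblind])

counts, edges = np.histogram(scores, bins=40)
for c, left in zip(counts, edges[:-1]):
    print(f"{left:5.0f} {'#' * (50 * c // counts.max())}")
# The combined histogram shows two peaks (bimodal); each group taken on its own
# is roughly bell-shaped.
```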
In general, the fewer factors that influence a trait, the further from a bell curve it will look. Bell curves (and log-normal curves) are special and common because they are a “stable” point in probability theory: if you add another underlying factor, you still get a bell curve. As you have fewer factors creating variation, the distribution can become more skewed.
Essentially, bell curves arise when there are so many unmeasured factors in a dataset that they blur out the more intricate variations and underlying distributions of individual factors.
Sorta
But it doesn’t start with the bell curve. That just describes what happens. There’s no magic bell curve constant that propagates through nature making sure distributions are following the rules.
It’s more that because of entropy and fierce competition among life forms, things tend to either regress to the mean or, if the trait is desirable, rise up to the new peak.
If a predator gets really good at hunting, the prey will get really good at evading. Not all prey. Some die out. But those don’t show up in the distribution do they?
It’s a case of winning traits being “adopted” by all and/or of outliers being killed off.
You see this in non-living systems too, as the most stable state is low energy. And outlier high-energy systems decompose until stable.
But there are other distributions. Lawyer salaries tend to be bimodal in the US for instance. Lotta people making $80k and lotta people making $200k+. Not as many outside of that.
> What is the special trait about the bell curve that it is underscored so much?
The central limit theorem shows that (under some weak conditions) any variable that is the sum of a large number of independent random variables will be normally distributed. The Galton board – the thing where balls are dropped through a board with pegs in it – is the classic example of this. At each peg, a ball falls randomly to the left or right, and the position that the ball lands in is the sum of all these little movements. So you get the classic bell curve shape. This would still work if you held the board at an angle so the balls were biased towards one direction, or if you messed around with the positions of the pegs, within reason.
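You can simulate a Galton board in a few lines; this is just a sketch, with a made-up number of balls and pegs:

```python
import numpy as np

rng = np.random.default_rng(4)
n_balls, n_pegs = 100_000, 30

# At each peg the ball shifts one slot left (-1) or right (+1) with equal chance;
# where it lands is the sum of all those independent little movements.
landing = rng.choice([-1, 1], size=(n_balls, n_pegs)).sum(axis=1)

slots, counts = np.unique(landing, return_counts=True)
for s, c in zip(slots, counts):
    print(f"{s:+3d} {'#' * (60 * c // counts.max())}")
# Tilting the board just means an uneven left/right probability, e.g.
# rng.choice([-1, 1], size=(n_balls, n_pegs), p=[0.3, 0.7]); the pile shifts
# but stays bell-shaped.
```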
Situations like that seem to come up a great deal in nature, at least for quantities we care about, and at least approximately. Another factor is that the normal distribution is mathematically easier to work with than many distributions, so it’s often chosen to model things even if it isn’t a perfect fit.
But there are many, many, many situations where data follow a distribution that doesn’t look at all Gaussian. Anything that is bounded (e.g. people’s ages, which can’t be less than 0), or anything that is discrete (e.g. the number of kids you have, which can’t be fractional), or anything that spans many orders of magnitude (e.g. the lengths of a diverse set of organisms, from whales to bacteria) will clearly not be Gaussian. But sometimes they can be approximated by a normal distribution.
A simple way to understand a bell curve is to think of a pair of dice. One die has a flat distribution. The other die has a flat distribution. But when you add them together you find that 7 is far more common than 2 or 12. It looks like a bell curve.
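Counting all 36 equally likely outcomes shows this directly (a tiny sketch, just enumeration):

```python
from collections import Counter

# Every (die A, die B) combination is equally likely; tally the totals.
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))
for total in range(2, 13):
    print(f"{total:2d} {'#' * totals[total]}")
# 7 has six ways to happen, 2 and 12 only one each. With just two dice the shape
# is a triangle; add more dice (more independent factors) and it smooths into a bell.
```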
Life is complicated and something like height is affected by a lot of different factors. There are many different genes involved. There are different kinds of food in different amounts. Even if each factor that contributes to height were flat on its own, the result you get when they are combined would look like a bell curve.
> Is it a fixed rule in Statistics that we “should” always have a bell curve distribution
No there isn’t, and there are lots of other distributions besides the normal distribution. Another common one is the [Pareto distribution](https://en.wikipedia.org/wiki/Pareto_distribution).
However, there is a mathematical reason why the normal distribution is so common – the central limit theorem. This states that if you take the average of N random variables, the distribution of that average will tend towards a normal distribution as N gets larger. Therefore, any quantity which represents an average of many contributing factors will tend to be at least approximately normally distributed.
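A rough sketch of that statement (the exponential distribution is used here purely because it is obviously not bell-shaped to begin with):

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (1, 2, 10, 100):
    # Average n exponential draws, repeated 50,000 times.
    means = rng.exponential(1.0, size=(50_000, n)).mean(axis=1)
    skew = np.mean(((means - means.mean()) / means.std()) ** 3)
    print(f"N={n:3d}  std={means.std():.3f}  skew={skew:+.3f}")
# A single draw is heavily skewed; the average of 100 draws is already close to
# symmetric and bell-shaped, with a spread shrinking roughly like 1/sqrt(N).
```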
Note that a normal distribution covers the entire real line – all values have a nonzero probability. So a quantity like the height of a human population cannot strictly be normally distributed, because that would imply that a nonzero proportion of the population has negative height. But it is *approximately* normally distributed.