Assuming I’m understanding your question correctly, here is the answer.

A lot of statistical tests rely on the [Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem) (CLT), which is one of the most important theorems in all of statistics/probability. The proof of the Central Limit Theorem is a bit complicated, so unless you really want to get into the math of it, you can just take it as given.

In simple terms, the CLT says that the sum (or average) of many independent and identically distributed random variables with finite variance, once properly centered and scaled, approaches a normal distribution, even if the random variables themselves are not normally distributed.
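For anyone who wants to see it written out, here is a sketch of the usual textbook statement. The symbols are my own notation and not part of the question: $X_1, \dots, X_n$ are the i.i.d. random variables, $\mu$ is their common mean, $\sigma^2$ is their common (finite, nonzero) variance, and $S_n$ is their sum.

```latex
S_n = X_1 + X_2 + \dots + X_n, \qquad
\frac{S_n - n\mu}{\sigma\sqrt{n}} \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

Read loosely: for large $n$, the sum $S_n$ behaves roughly like a normal distribution with mean $n\mu$ and variance $n\sigma^2$.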

For example, take a roll of a fair die. The distribution of one roll is clearly uniform: you have the same odds of rolling a 1 as you do a 2, or a 3, or a 4 or 5 or 6. However, when you add up the results of two dice rolls, the distribution is no longer uniform. Your odds of the two rolls summing to 7 (6 in 36) are six times higher than the odds of them summing to 2 (1 in 36), so your distribution now has a peak in the middle and tapers off on both sides. In fact, the more dice you add, the closer the distribution of their sum gets to a normal distribution. [Here](https://www.muelaner.com/wp-content/uploads/2013/07/central-limit-theorem.png) is a computer simulation of the outcomes of rolling between 1 and 6 dice. You can see that the more dice there are, the better their sum approximates a normal distribution.
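If you want to check the dice numbers yourself, here is a minimal Python sketch that computes the exact distribution of the total instead of simulating it. The function name `dice_sum_distribution` and the little text histogram are just my own illustration, nothing standard.

```python
from fractions import Fraction


def dice_sum_distribution(n_dice, sides=6):
    """Exact probability of each possible total when rolling n_dice fair dice."""
    # Before any die is rolled, the total is 0 with probability 1.
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new_dist = {}
        # Add one more die: every face is equally likely.
        for total, prob in dist.items():
            for face in range(1, sides + 1):
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + prob / sides
        dist = new_dist
    return dist


# Print a rough text histogram for 1, 2 and 3 dice.
for n in (1, 2, 3):
    dist = dice_sum_distribution(n)
    print(f"{n} dice:")
    for total in sorted(dist):
        p = float(dist[total])
        print(f"  {total:2d}  {p:.3f}  {'#' * round(p * 100)}")
```

With one die every bar has the same length, with two dice you get a triangle peaking at 7, and with three dice the shape already starts to look like a bell.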

As a result, even if you have no knowledge of the actual underlying distribution of your data, many statistical tests that are based on normal distributions can still apply. Usually you still need to assume that your data is i.i.d., however, because the CLT requires that in order to apply. You also need enough data points for the CLT to start to matter. For example, if you only have two or three data points, there will be a large amount of error if you assume that their sum is normally distributed, but if you have 1000 data points, their sum will typically be so close to a normal distribution that any error from assuming it is normal will be negligible.
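To put a rough number on "enough data points", here is a small Python sketch that sticks with the dice example. It compares the exact distribution of the total of n dice with the normal distribution that has the same mean and variance, and reports the largest gap between the two cumulative probabilities at the possible totals. The helper names (`dice_sum_pmf`, `normal_cdf`, `max_cdf_gap`) are made up for this illustration, and dice are a very well-behaved case, so treat this as a sketch and not as a general rule for how much data you need.

```python
import math


def dice_sum_pmf(n_dice, sides=6):
    """Exact probabilities for the total of n_dice fair dice (list index = total)."""
    pmf = [1.0]  # total 0 with probability 1 before any roll
    for _ in range(n_dice):
        new = [0.0] * (len(pmf) + sides)
        for total, p in enumerate(pmf):
            for face in range(1, sides + 1):
                new[total + face] += p / sides
        pmf = new
    return pmf


def normal_cdf(x, mean, std):
    """P(Z <= x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def max_cdf_gap(n_dice, sides=6):
    """Largest gap between the exact CDF of the total and the matching normal CDF."""
    mean = n_dice * (sides + 1) / 2                 # one die has mean (sides + 1) / 2
    std = math.sqrt(n_dice * (sides**2 - 1) / 12)   # one die has variance (sides^2 - 1) / 12
    running, worst = 0.0, 0.0
    for total, p in enumerate(dice_sum_pmf(n_dice, sides)):
        running += p  # exact P(total of dice <= total)
        worst = max(worst, abs(running - normal_cdf(total, mean, std)))
    return worst


for n in (2, 10, 100):
    print(f"{n:3d} dice: largest CDF gap = {max_cdf_gap(n):.4f}")
```

The gap shrinks as n grows, which is the "more data, less error" point above. For skewed or heavy-tailed data the convergence can be a lot slower than it is for dice.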