Why is testing for normality seldom useful?


So I came across a statistical claim that you can still run parametric tests even if your data doesn’t satisfy the normality assumption. When I asked why, I got a bunch of jargon. How would you explain this to someone whose stats training is just p-value go brr?

Assuming I’m understanding your question correctly, here is the answer.

A lot of statistical tests rely on the [Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem) (CLT), which is one of the most important results in all of statistics/probability. The proof of the Central Limit Theorem is a bit complicated, so unless you really want to get into the math of it, you can just take it as true.

In simple terms, the CLT says that the sum (or the average) of many independent and identically distributed random variables is approximately normally distributed, even if the random variables themselves are not. Technically the variables also need a finite variance, which is true of most real-world data.

For example, take a roll of a fair die. The distribution of one roll is clearly uniform: you have the same odds of rolling a 1 as you do a 2, or a 3, or a 4 or 5 or 6. However, when you add up the results of two dice rolls, the distribution is no longer uniform. Two dice are six times as likely to sum to 7 (6 ways out of 36) as to sum to 2 (1 way out of 36), so the distribution now has a peak in the middle and tapers off on both sides. In fact, the more dice you add, the closer the distribution of their sum gets to a normal distribution. [Here](https://www.muelaner.com/wp-content/uploads/2013/07/central-limit-theorem.png) is a computer simulation of the outcomes of rolling anywhere from 1 to 6 dice. You can see that the more dice there are, the better their sum approximates a normal distribution.
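If you want to see the dice example for yourself, here is a minimal sketch in plain Python. It computes the exact distribution of the total for a given number of fair dice and prints a rough text histogram. The function name `dice_sum_distribution` and the choice of 1, 2, and 6 dice are just illustrative assumptions, not anything standard.

```python
from collections import Counter
from fractions import Fraction

def dice_sum_distribution(n_dice, sides=6):
    """Exact probability of each possible total when rolling n_dice fair dice."""
    dist = {0: Fraction(1)}  # before any roll, the total is 0 with certainty
    for _ in range(n_dice):
        new_dist = Counter()
        for total, p in dist.items():
            for face in range(1, sides + 1):
                # each face is equally likely, so split the probability evenly
                new_dist[total + face] += p / sides
        dist = dict(new_dist)
    return dist

for n in (1, 2, 6):
    dist = dice_sum_distribution(n)
    peak = max(dist.values())
    print(f"{n} dice")
    for total in sorted(dist):
        # scale each bar so the most likely total is 40 characters wide
        bar = "#" * round(40 * dist[total] / peak)
        print(f"{total:3d} {bar}")
```

With one die every bar is the same length, with two dice you get a triangle, and with six dice the outline already looks like a bell curve.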

As a result, even if you have no knowledge of the actual underlying distribution of your data, many statistical tests that are based on normal distributions can still apply. That is because those tests usually work with the sample mean, which is just the sum of your data points divided by how many there are, so the CLT covers it. You do usually still need to assume that your data is i.i.d., because the CLT requires that in order to apply. You also need enough data points for the CLT to kick in. For example, if you only have two or three data points, there can be a large amount of error if you assume that their sum is normally distributed, but if you have 1000 data points, their sum will typically be so close to a normal distribution that any error from assuming normality is negligible.
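To get a feel for how sample size matters, here is a rough simulation sketch. It draws samples from a deliberately lopsided distribution (exponential, chosen here purely as an example of non-normal data) and measures how skewed the resulting sample means are. A normal distribution has skewness 0, so values near 0 mean "looks symmetric like a normal". The helper name `sample_mean_skewness`, the number of trials, and the seed are all assumptions made for illustration.

```python
import random
import statistics

def sample_mean_skewness(n, trials=2000, seed=0):
    """Skewness of the sample mean of n exponential draws, estimated by simulation."""
    rng = random.Random(seed)
    # repeat the "experiment" many times, keeping the mean of each sample
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    mu = statistics.fmean(means)
    sigma = statistics.pstdev(means)
    # skewness = average cubed z-score; 0 for a perfectly symmetric distribution
    return statistics.fmean(((m - mu) / sigma) ** 3 for m in means)

for n in (3, 30, 1000):
    print(f"n = {n:4d}  skewness of sample mean ~ {sample_mean_skewness(n):.2f}")
```

With only 3 data points the sample mean is still clearly lopsided, while with 1000 the skewness should come out close to 0. Skewness is only one way to measure distance from normality, so treat this as a sanity check rather than a proof.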