How do statistical tests prove significance?


I did a biology undergraduate degree and often did reports where we would statistically analyse our results. A p-value of less than 0.05 shows that the results are statistically significant. How do these tests actually know the data is significant? For example, we might look at correlation and get a significant positive correlation between two variables. Given that the variables can be literally anything, how does doing a few statistical calculations determine that the relationship is significant? I always thought there must be more nuance, as the actual variables can be so many different things. It might show me a significant relationship for two sociological variables and also for two mathematical ones, when those variables are so different?

In: Mathematics

17 Answers

Anonymous

>For example, we might look at correlation and get a significant positive correlation between two variables. Given that the variables can be literally anything, how does doing a few statistical calculations determine that the relationship is significant?

I think you’ve been thrown by one of the (many) confusing things about statistical significance. It took a while for this to click with me.

A test of significance has *nothing* to do with what you’re actually measuring. Figuring out the relationship between two variables is about picking the right test of correlation (and making sure a relationship between them is logically plausible in the first place).
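
To make that concrete, here’s a minimal Python sketch (my own made-up data, and it assumes scipy is installed). The p-value of a Pearson correlation comes out of the same formula whether the variables are “sociological” or “mathematical” – the test only sees numbers:

```python
# Minimal sketch: a correlation p-value depends only on r and n,
# not on what the variables actually mean. Data below is invented.
from scipy import stats

income = [20, 25, 31, 40, 48, 55]        # made-up "sociological" pair
happiness = [3, 4, 4, 6, 7, 8]

x = [1, 2, 3, 4, 5, 6]                   # made-up "mathematical" pair
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

r1, p1 = stats.pearsonr(income, happiness)
r2, p2 = stats.pearsonr(x, y)
print(f"sociological: r={r1:.3f}, p={p1:.4f}")
print(f"mathematical: r={r2:.3f}, p={p2:.4f}")
# Both p-values come from the same statistic: t = r*sqrt((n-2)/(1-r^2))
```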

Significance testing is about *sampling*. (Which in this case could also be repeating an experiment.)

Imagine I have two bags filled with loads of red and blue balls. I want to know if the proportions are different between the two bags, so I pull out a random sample from each bag.

Now imagine I have two big groups of men and women. I want to know if the proportions are the same between the two groups, so I draw a random sample from each.

I’m looking at completely different things, but the statistics of *sampling* are the same.
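To show that, here’s a minimal sketch in pure Python (the counts are invented for illustration). The exact same two-proportion test handles both the balls and the people, because all it ever sees is counts:

```python
# Minimal sketch: the same significance test applies whether we
# counted coloured balls or people. Sample counts are made up.
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis (no difference).
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Bags of balls: 60/100 red in bag A vs 45/100 red in bag B.
print(two_proportion_z_test(60, 100, 45, 100))
# Groups of people: 60/100 women in group A vs 45/100 in group B.
print(two_proportion_z_test(60, 100, 45, 100))  # identical result
```

Both calls print exactly the same z-statistic and p-value, because the test never knows what was being sampled.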

Now, actually doing significance testing properly is rather more complex than that. To start with, you need to decide what level of significance you need, and mathematics can’t tell you that. “What level of risk am I willing to take that my results are down to chance?” is a judgement call. A 95% confidence level (p < 0.05) is just a convention, and as another commenter has said, the convention is not always 95%.
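
As a trivial sketch (the p-value below is made up), the very same result can count as “significant” or not depending on the threshold you picked in advance:

```python
# Minimal sketch: significance depends on a threshold you chose,
# not on the data alone. The p-value here is illustrative.
p_value = 0.03  # say our test produced this

for alpha in (0.05, 0.01):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"alpha={alpha}: {verdict}")
# alpha=0.05: significant
# alpha=0.01: not significant
```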

Also, significance tests assume the null hypothesis is true. This leads to a problem like the one with screening tests: if most people don’t have a disease, you’ll get a lot more false positives than if most people do. In a similar way, if a hypothesis is unlikely to be true, you’re more likely to get a false positive than if it’s likely to be true. But factoring that in means making a guess at how likely your hypothesis is to be true, which takes us into the contentious world of Bayesian statistics.
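
To put rough numbers on that (the prior, power, and threshold below are illustrative assumptions of mine, not figures from the answer):

```python
# Rough sketch of the screening analogy. Suppose only 10% of tested
# hypotheses are actually true, we test at p < 0.05, and our
# experiments have 80% power. All three numbers are assumptions.
prior_true = 0.10   # assumed fraction of hypotheses that are true
alpha = 0.05        # significance threshold (false-positive rate)
power = 0.80        # probability of detecting a real effect

true_positives = prior_true * power          # real effects found
false_positives = (1 - prior_true) * alpha   # chance "discoveries"
false_positive_risk = false_positives / (true_positives + false_positives)
print(f"{false_positive_risk:.0%} of 'significant' results are false")
# -> roughly 36%, far higher than the naive 5% people expect
```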

“The problem is that there is near unanimity among statisticians that p values don’t tell you what you need to know but statisticians themselves haven’t been able to agree on a better way of doing things.”

That last bit is probably a bit complex for ELI5… I might be able to explain it better if anyone wants. (Or [here](http://www.dcscience.net/2020/10/18/why-p-values-cant-tell-you-what-you-need-to-know-and-what-to-do-about-it/) is a more technical explanation for anyone who wants that.)
