How do statistical tests prove significance?


I did a biology undergraduate degree and often wrote reports where we would statistically analyse our results. We were taught that a p-value of less than 0.05 shows the results are statistically significant. How do these tests actually know the data is significant? For example, we might look at correlation and get a significant positive correlation between two variables. Given that the variables in question can be literally anything, how does doing a few statistical calculations determine significance? I always thought there must be more nuance, since the actual variables can be so many different things. The same test might show me a significant relationship for two sociological variables and also for two mathematical ones, when those variables are so different?

In: Mathematics

17 Answers

Anonymous 0 Comments

What you’re getting at is that “significance” doesn’t really mean “significance.”

A better term for “statistical significance” is “statistical discernibility.” You measure some X and Y 100 times and find a correlation of 0.42 with some standard error. Then you ask, “Okay, if the true correlation between X and Y were zero, how hard would it be to draw a sample of 100 with a sample correlation of 0.42 or more extreme?” The answer to that is your p-value. If the p-value is low, you’re saying “We can’t ever know the exact true correlation, but we can be very confident that it isn’t zero.” You’re saying your result can be discerned or distinguished from zero.
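That question — "how hard would it be to get a result this extreme if the true correlation were zero?" — can be answered directly by simulation. The sketch below (hypothetical data, NumPy assumed) uses a permutation test: shuffling one variable destroys any real link between them, so the shuffled correlations show what chance alone produces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 paired measurements with a real underlying link.
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # y genuinely depends on x

r_obs = np.corrcoef(x, y)[0, 1]    # observed sample correlation

# Permutation test: shuffling y breaks any x-y link, which simulates
# "the true correlation is zero". Count how often chance alone gives
# a correlation at least as extreme as the one we observed.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
    if abs(r_perm) >= abs(r_obs):
        count += 1

p_value = count / n_perm
print(f"r = {r_obs:.3f}, p = {p_value:.4f}")
```

A low `p_value` here means exactly what the paragraph above says: the observed correlation can be discerned from zero, nothing more.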

But statistical significance doesn't mean the result substantively matters. That's a matter for the effect size and the confidence interval around it. Suppose you're researching the effects of eating blueberries on human longevity and find a statistically discernible effect. If that effect is "You would have to eat the entire mass of the earth in blueberries every year to extend your life by one month," it doesn't really matter even if the p-value is 0.0000000001.
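You can see the gap between significance and effect size numerically: with a huge sample, even a minuscule true correlation becomes statistically discernible. This sketch (hypothetical numbers, standard library only) uses the Fisher z-transform, under which atanh(r)·√(n−3) is approximately standard normal when the true correlation is zero.

```python
import math
import random

random.seed(1)

# Hypothetical scenario: an enormous sample with a minuscule true effect.
n = 200_000
x = [random.gauss(0, 1) for _ in range(n)]
# y shares only a sliver of variance with x (true correlation ~0.02)
y = [0.02 * xi + random.gauss(0, 1) for xi in x]

# Sample correlation, computed from scratch
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Fisher z-transform: z = atanh(r) * sqrt(n - 3) is ~N(0, 1) under
# the null of zero correlation, giving the two-sided p-value:
z = math.atanh(r) * math.sqrt(n - 3)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"r = {r:.4f}, p = {p:.2e}")
```

The correlation is tiny (around 0.02, explaining about 0.04% of the variance — the blueberry situation), yet the p-value is microscopic. Significant, but not important.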

Statistical significance also doesn’t mean causality; the usual examples here are Tyler Vigen’s spurious correlations. X and Y can go together because:

* X causes Y
* Y causes X
* They both cause each other simultaneously
* There’s some Z that causes X and Y
* Other stuff I’m forgetting
* For literally no reason at all
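The "some Z causes both X and Y" case is easy to demonstrate. In this sketch (made-up data; the church/crime labels are just the classic illustrative example, not real figures), X and Y never influence each other, yet they correlate strongly — and the correlation vanishes once you statistically control for Z by correlating the residuals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical confounder: Z drives both X and Y; X and Y never
# touch each other directly.
n = 1_000
z = rng.normal(size=n)             # e.g. city size
x = z + 0.5 * rng.normal(size=n)   # e.g. number of churches
y = z + 0.5 * rng.normal(size=n)   # e.g. number of crimes

raw_r = np.corrcoef(x, y)[0, 1]    # strong "spurious" correlation

# Partial out Z: regress X and Y on Z, then correlate the residuals.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
partial_r = np.corrcoef(x_resid, y_resid)[0, 1]   # ~0

print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
```

Controlling for a confounder only works when you know the confounder exists and have measured it — which is why, as the next paragraph says, causality is mostly a research design question.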

Figuring out causality is, mostly, a research design question and not a statistics question. There are circumstances where causality is relatively straightforward statistically, but you have to be able to perform true experiments or have to luck into the right kind of data being available.

When you can’t do a true experiment and you don’t have that lucky kind of data, what you mostly do is ask, “What would the world look like if I were right? What would it look like if I were wrong?” If you’re right, more X goes with more Y, and more A goes with less B, and so on. What you’d like here is a whole set of predictions to check, some of which are weird or surprising. You can see a lot of this in early covid research and other epidemiological work — if the disease is spread by air, then we should see these relationships between variables; if it’s spread by droplets, we should see other relationships that we wouldn’t see if it were airborne; and if it’s spread by contaminated water, we should see yet other relationships.
