Statistical significance


Trying to do a research paper. One of my potential sources says “group a scored slightly, but significantly worse than group b”. I don’t understand how you can score slightly but significantly worse. Googled that phrase and it sounds like the source means “statistically significant”.

Something being statistically significant or not is similar to a true or false statement.

However instead of false meaning it can’t be true, it’s more like you don’t know with enough certainty if it can be true. You don’t know because the data is not ‘good’ enough.

The question begins with how one could say that one thing is different from another, especially when the thing is a property or characteristic of a group and it is not possible to test the entire population. Typically it is only possible to take a sample from each of the two populations and compare the average results.

The problem is that, given the variance within each population, we cannot be sure whether our comparison reflects an actual difference or is just random chance. This is the question of significance. In much research a 5% threshold is used; this is conventional but arbitrary (a researcher could decide to use a stricter or looser threshold).

So if one is comparing some parameter in group A to group B at a 5% significance level and concludes there is a difference in that parameter between the two groups, the ELI5 version is that there is about a 5% chance this conclusion is incorrect. More precisely: assuming group A and group B were NOT different, a difference at least as large as the one measured would come about through random chance only 5% of the time.
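To make the 5% idea concrete, here is a rough simulation sketch. It assumes both groups are drawn from the same normal distribution (so there is truly no difference) and uses a simple z-test with a known standard deviation; the function name, sample size, and number of trials are made up for illustration.

```python
import random
from statistics import NormalDist

def false_positive_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Fraction of trials that look 'significant' even though both
    groups come from the SAME population (mean 0, sd 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference of two means, sd = 1, n per group
    se = (2 / n) ** 0.5
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            hits += 1
    return hits / trials

rate = false_positive_rate()
print(rate)  # should land close to alpha
```

The printed fraction should come out near 0.05: with no real difference at all, roughly one comparison in twenty still crosses the threshold by luck.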

The statistical significance is the “sureness” of the conclusion. Statistical significance says NOTHING about the IMPORTANCE of the difference.
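A quick sketch of how "slightly but significantly" can happen. The numbers are invented: a half-point difference in average score, a standard deviation of 10 in both groups, and a simple two-sided z-test that treats that standard deviation as known. The only thing that changes between the two calls is the sample size.

```python
from statistics import NormalDist

def two_sided_p(diff, sd, n):
    """Two-sided p-value for a difference in means between two groups
    of n people each, assuming a known common standard deviation."""
    se = sd * (2 / n) ** 0.5  # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny difference, very different sample sizes
p_small_sample = two_sided_p(diff=0.5, sd=10, n=50)
p_large_sample = two_sided_p(diff=0.5, sd=10, n=10_000)

print(p_small_sample)  # well above 0.05: not significant
print(p_large_sample)  # well below 0.05: significant
```

The difference is equally small in both cases. With 10,000 people per group you can simply be much more sure it is not a fluke, which says nothing about whether half a point matters.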

Statistical significance means the difference is unlikely to be due to chance alone. It does not, by itself, prove a cause and effect relationship. Even though the difference in score was small, the math gives us reason to believe that difference is not random.

The reason for this is that the spread of scores within each sample is small compared to the difference in average score between the samples.

For example, let’s say you have 2 classes of students, class A and class B. Both classes take a test. Class A gets an average score of 80, and class B gets an average score of 75.

Now consider two scenarios that both produce those averages.

1. The scores of class A were: 75, 80, 85. The scores of class B were: 65, 75, 85.
2. The scores of class A were: 80, 80, 80. The scores of class B were: 75, 75, 75.

There’s nothing shocking about scenario 1. You could easily assume these students are randomly distributed according to intellect and knowledge. But look at scenario 2. Something’s off. For every student in class A to score exactly 5 points higher, we have to believe there’s a reason: perhaps those students are smarter or perhaps their teacher covered the curriculum better.
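If you want to put a number on that intuition, one common way is to divide the difference in averages by a measure of the noise within the groups. The sketch below uses the Welch t statistic on the two scenarios above; the function name is made up, and with only three students per class this is an illustration of the idea, not a serious analysis.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Difference in means divided by its standard error (Welch's t).
    Returns infinity when there is no spread at all within the groups."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

# Scenario 1: same 5-point gap, lots of spread inside each class
t1 = welch_t([75, 80, 85], [65, 75, 85])
# Scenario 2: same 5-point gap, no spread at all
t2 = welch_t([80, 80, 80], [75, 75, 75])

print(t1)  # small: the gap is smaller than the noise
print(t2)  # inf: the gap cannot be explained by within-class noise
```

Both scenarios have the same 5-point gap in averages. What changes is how big that gap is relative to the variation inside each class.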

In medical research, outcomes are often judged on both statistical significance (i.e., does it probably work?) and clinical significance (if it does something, is it worth the effort?).

This sounds like a scenario where it works, but not very well.

It means the difference is small but large enough that it can’t be chalked up to randomness.

Take a bunch of different groups and give them the same test and you’ll see slight differences between the scores, but in general the average will be more or less the same if you have enough people.

Group A and B presumably had some differences and they were testing to see what difference it makes in the test results. Maybe one group smokes weed and the other doesn’t, I don’t know.

The result being significant means that they’re confident that the difference between the groups reflects that change (smoking or not smoking weed, or whatever it actually was) and not random chance, assuming the groups were otherwise comparable. Therefore they can be confident that the factor they changed does influence test results.