They say .014 (the P-Value) is a “significant number”. Says who? Why? Isn’t any number “significant” if the distribution of data points is mostly around that area?
The p-value is not a number in the data you are analysing.
Basically, when you are studying a phenomenon, you try to make a hypothesis about how it works. Then you measure a bunch of data relevant to this hypothesis.
There is a branch of statistics that deals with methods for analysing data. You can compare the measured data with the ideal behaviour that would come from a “nothing special is happening” hypothesis (the null hypothesis), and the output is the probability that pure chance under that hypothesis would produce data at least as extreme as yours.
The p-value is this number, so the lower the p-value, the harder it is to write your data off as mere chance, and the stronger the case for your own hypothesis.
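As a concrete, made-up illustration of that comparison (a minimal sketch, assuming SciPy is available; the counts are invented): hypothesise that a die is loaded, roll it 60 times, and compare the observed face counts with the ideal 10-per-face behaviour a fair die would show.

```python
# Minimal sketch (hypothetical counts, SciPy assumed): compare observed rolls
# of a suspicious die against the ideal behaviour of a fair one.
from scipy import stats

observed = [5, 7, 6, 8, 9, 25]        # counts of faces 1-6 over 60 rolls (made up)
expected = [10, 10, 10, 10, 10, 10]   # what a perfectly fair die would average

result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"p-value: {result.pvalue:.6f}")
# A tiny p-value: counts this lopsided would almost never come from a fair die,
# so chance alone is a poor explanation and the "loaded die" hypothesis gains support.
```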
Let’s say you roll five dice, and all five dice come out with sixes on the first roll. This is an incredibly good roll, and it is pretty unlikely …. if the dice are fair.
Now, if you suspect that the dice may not be fair, you could ask yourself: “if these dice were fair, what would be the chance that I would get a roll this good?” That chance is the p-value, and if it comes out below 5% (the usual cut-off), the dice are probably not fair.
(formally, you reject the null hypothesis that the dice are fair)
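In code, this example boils down to one line: with five dice, “all sixes” is the most extreme outcome possible, so the p-value is simply the probability of rolling five sixes with fair dice.

```python
# The dice example as a calculation: probability of five sixes with fair dice.
p_value = (1 / 6) ** 5
print(f"p-value = {p_value:.6f}")   # about 0.000129

alpha = 0.05                        # the usual 5% cut-off
if p_value < alpha:
    print("Reject the null hypothesis that the dice are fair.")
```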
In terms of statistical significance, we often test whether or not a factor has an impact on an outcome. You can estimate how much of an impact there seems to be in your data, and if it looks like the impact is so large that it is unlikely that there is no impact, the result is statistically significant.
(you reject the null hypothesis of no impact)
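As a sketch of what that looks like in practice (made-up numbers, SciPy assumed), a two-sample t-test compares a group exposed to the factor with a control group and returns exactly this kind of p-value:

```python
# Hypothetical outcome measurements for an exposed group and a control group.
from scipy import stats

treated = [23.1, 25.4, 24.8, 26.0, 25.1, 24.3, 26.7, 25.9]
control = [22.0, 21.5, 23.2, 22.8, 21.9, 22.4, 23.0, 22.1]

stat, p_value = stats.ttest_ind(treated, control)
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    # A difference this large would be unlikely if the factor had no impact,
    # so we reject the null hypothesis of "no impact".
    print("Statistically significant at the 5% level.")
```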
The 5% cut-off is somewhat arbitrary, but experience has shown that it gives a good balance between the two mistakes you can make. The first mistake is rejecting a hypothesis that turns out to be true (and should not have been rejected). The second mistake is not rejecting a hypothesis that turns out to be false (and should have been rejected). That’s why a 5% cut-off has become somewhat standard.
(although many studies also use 10% and 1% cut-offs).
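To make that trade-off concrete, here is a rough simulation (my own sketch, not from the answer above; NumPy and SciPy assumed): a stricter cut-off produces fewer false alarms, but misses real, modest effects more often.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_samples = 2_000, 30

for alpha in (0.10, 0.05, 0.01):
    false_alarms = misses = 0
    for _ in range(n_trials):
        # First mistake: no real effect, but the test rejects anyway.
        a = rng.normal(0.0, 1.0, n_samples)
        b = rng.normal(0.0, 1.0, n_samples)
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_alarms += 1
        # Second mistake: a real but modest effect (mean shifted by 0.5) is missed.
        c = rng.normal(0.5, 1.0, n_samples)
        if stats.ttest_ind(c, b).pvalue >= alpha:
            misses += 1
    print(f"alpha={alpha:.2f}: false alarms {false_alarms / n_trials:.1%}, "
          f"missed effects {misses / n_trials:.1%}")
```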
The p-value, or the statistical significance, tells you how likely it is that you could have gotten the same kind of data, and drawn the same conclusions, from random chance alone. The stronger the significance (the smaller the p-value), the more unlikely it is that the data was a random fluke.
Take a simplified example. You flip a coin 4 times and get heads all 4 times. Can you conclude the coin is biased toward heads? If you calculate the probability of getting 4 heads in a row with a fair coin, it is 1/16 = 0.0625. This is essentially the p-value. It tells you there’s about a 6% chance you could have gotten this result by pure luck, which is rather high, and so the results are not significant enough to conclude the coin must be biased.
On the other hand, if you had gotten 10 heads in a row, then the p-value becomes 1/1024 = 0.000977. This is extremely unlikely to happen by chance alone, and so justifies a significant suspicion that the coin is not fair.
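The same coin numbers, as code (nothing beyond plain Python needed):

```python
# The coin example as a calculation: the chance of k heads in a row with a fair coin.
for heads_in_a_row in (4, 10):
    p_value = 0.5 ** heads_in_a_row
    print(f"{heads_in_a_row} heads in a row: p-value = {p_value:.6f}")
# 4 heads  -> 0.062500 (could easily be luck)
# 10 heads -> 0.000977 (very hard to explain as luck)
```

For messier cases (say, 7 heads out of 10), the same idea is usually delegated to a library routine such as scipy.stats.binomtest rather than computed by hand.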
In real data analysis, calculating the p-value is a bit more complicated than the coin toss scenario, but it represents the same idea. The question could be “given this number of smokers who developed lung cancer, can we conclude that smoking is linked to lung cancer?”, and the p-value tells you the chance that this many smokers would have developed lung cancer anyway, even if smoking had nothing to do with it. When the p-value is low enough, that makes for significant evidence linking smoking to lung cancer.
The p-value is the probability that if there were a parallel universe with everything the same and the experimental variables you’re altering do nothing at all, then in that parallel universe you’d get a signal the size of the one you’re looking at or bigger.
5% is considered close enough to go have a better look. 1% means you might well be onto something. 0.05% means you probably have something. You can keep going.
These numbers are chosen because of how random numbers behave. The sum of a huge number of completely random numbers that have nothing to do with each other is normally distributed. If your data is driven by only a few causes, if those causes depend on each other, or if they multiply rather than add, the randomness stops being normal and you should use different statistics to work out whether the thing looks true or not.
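A quick way to see that claim (my own sketch, assuming NumPy): sums of many independent random numbers pile up into a bell curve, while products of the same numbers come out heavily skewed.

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.uniform(0.0, 1.0, size=(10_000, 50))  # 50 independent draws per "experiment"

sums = samples.sum(axis=1)       # additive causes: roughly normal (central limit theorem)
products = samples.prod(axis=1)  # multiplicative causes: extremely skewed instead

# For a roughly normal distribution the mean and median almost coincide;
# for the skewed products the mean sits far above the median.
print(f"sums:     mean={sums.mean():.2f}, median={np.median(sums):.2f}")
print(f"products: mean={products.mean():.2e}, median={np.median(products):.2e}")
```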
You always report a p-value with respect to a significance level α. In principle α can be any value, and what it designates is the probability that something with “no effect” produces observations that merely appear to have a significant effect.
The most common level of α is 0.05, which corresponds to a 95% confidence level that you aren’t just measuring a chance fluctuation. If your p-value comes out to less than 0.05, you reject the null hypothesis (no effect) at a significance of 0.05. Put as ELI5 as I can, this means you can be roughly 95% sure that the effect you measure is real and not just an accidental fluke.
You are correct that 0.05 is arbitrary. You could pick 90% confidence, in which case a p-value less than 0.1 would let you be more or less 90% sure that an effect exists. Or if your p-value is less than 0.01, you would say it’s significant at α = 0.01. Generally, if you get a very, very small p-value, you report an α much smaller than 0.05 to emphasize that you’re more certain that an effect exists. 0.05 is chosen mostly because 95% certainty is accepted as a healthy level of confidence, though anyone who plays Dungeons and Dragons or any other game with a 20-sided die can tell you that events with 5% probability happen all the time. With studies coming out reporting p-values all the time, you’re bound to get some fluke measurements in the mix.
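The d20 analogy translates directly into code (my own illustration, plain Python): a natural 20 has a 5% chance per roll, yet across a thousand rolls, or a thousand studies of effects that don’t exist, it keeps turning up.

```python
import random

random.seed(1)
rolls = [random.randint(1, 20) for _ in range(1_000)]
nat_20s = rolls.count(20)
print(f"Natural 20s in 1000 rolls: {nat_20s} (~{nat_20s / 1000:.1%})")
# In the same way, if 1000 studies each test an effect that does not exist
# using alpha = 0.05, around 50 of them will still look "statistically
# significant" purely by luck.
```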