They say .014 (the P-Value) is a “significant number”. Says who? Why? Isn’t any number “significant” if the distribution of data points is mostly around that area?
The p-value is not a number in the data you are analysing.
Basically, when you are studying a phenomenon, you try to make a hypothesis about how it works. Then you measure a bunch of data relevant to this hypothesis.
There is a branch of statistics that deals with methods for analysing data. You can compare the measured data with the ideal behaviour that would come from a “nothing special is happening” hypothesis (the null hypothesis), and the output is the probability that pure chance under that hypothesis would produce data at least as extreme as yours.
The p-value is this number, so the lower the p-value, the harder it is to write your data off as mere chance, and the stronger the case for your own hypothesis.
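As a concrete, made-up illustration of that comparison (a minimal sketch, assuming SciPy is available; the counts are invented): hypothesise that a die is loaded, roll it 60 times, and compare the observed face counts with the ideal 10-per-face behaviour a fair die would show.

```python
# Minimal sketch (hypothetical counts, SciPy assumed): compare observed rolls
# of a suspicious die against the ideal behaviour of a fair one.
from scipy import stats

observed = [5, 7, 6, 8, 9, 25]        # counts of faces 1-6 over 60 rolls (made up)
expected = [10, 10, 10, 10, 10, 10]   # what a perfectly fair die would average

result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"p-value: {result.pvalue:.6f}")
# A tiny p-value: counts this lopsided would almost never come from a fair die,
# so chance alone is a poor explanation and the "loaded die" hypothesis gains support.
```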
Let’s say you roll five dice, and all five dice come out with sixes on the first roll. This is an incredibly good roll, and it is pretty unlikely …. if the dice are fair.
Now, if you suspect that the dice may not be fair, you could ask yourself: “if these dice were fair, what would be the chance that I would get a roll this good?” That chance is the p-value, and if it comes out below 5% (the usual cut-off), the dice are probably not fair.
(formally, you reject the null hypothesis that the dice are fair)
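In code, this example boils down to one line: with five dice, “all sixes” is the most extreme outcome possible, so the p-value is simply the probability of rolling five sixes with fair dice.

```python
# The dice example as a calculation: probability of five sixes with fair dice.
p_value = (1 / 6) ** 5
print(f"p-value = {p_value:.6f}")   # about 0.000129

alpha = 0.05                        # the usual 5% cut-off
if p_value < alpha:
    print("Reject the null hypothesis that the dice are fair.")
```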
In terms of statistical significance, we often test whether or not a factor has an impact on an outcome. You can estimate how much of an impact there seems to be in your data, and if it looks like the impact is so large that it is unlikely that there is no impact, the result is statistically significant.
(you reject the null hypothesis of no impact)
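As a sketch of what that looks like in practice (made-up numbers, SciPy assumed), a two-sample t-test compares a group exposed to the factor with a control group and returns exactly this kind of p-value:

```python
# Hypothetical outcome measurements for an exposed group and a control group.
from scipy import stats

treated = [23.1, 25.4, 24.8, 26.0, 25.1, 24.3, 26.7, 25.9]
control = [22.0, 21.5, 23.2, 22.8, 21.9, 22.4, 23.0, 22.1]

stat, p_value = stats.ttest_ind(treated, control)
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    # A difference this large would be unlikely if the factor had no impact,
    # so we reject the null hypothesis of "no impact".
    print("Statistically significant at the 5% level.")
```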
The 5% cut-off is somewhat arbitrary, but experience has shown that it gives a good balance between the two mistakes you can make. The first mistake is rejecting a hypothesis that turns out to be true (and should not have been rejected). The second mistake is not rejecting a hypothesis that turns out to be false (and should have been rejected). That’s why a 5% cut-off has become somewhat standard.
(although many studies also use 10% and 1% cut-offs).
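To make that trade-off concrete, here is a rough simulation (my own sketch, not from the answer above; NumPy and SciPy assumed): a stricter cut-off produces fewer false alarms, but misses real, modest effects more often.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_samples = 2_000, 30

for alpha in (0.10, 0.05, 0.01):
    false_alarms = misses = 0
    for _ in range(n_trials):
        # First mistake: no real effect, but the test rejects anyway.
        a = rng.normal(0.0, 1.0, n_samples)
        b = rng.normal(0.0, 1.0, n_samples)
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_alarms += 1
        # Second mistake: a real but modest effect (mean shifted by 0.5) is missed.
        c = rng.normal(0.5, 1.0, n_samples)
        if stats.ttest_ind(c, b).pvalue >= alpha:
            misses += 1
    print(f"alpha={alpha:.2f}: false alarms {false_alarms / n_trials:.1%}, "
          f"missed effects {misses / n_trials:.1%}")
```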
The p-value, or the statistical significance, tells you how likely it is that you could have gotten the same kind of data, and drawn the same conclusions, from random chance alone. The stronger the significance (the smaller the p-value), the more unlikely it is that the data was a random fluke.
Take a simplified example. You flip a coin 4 times and get heads all 4 times. Can you conclude the coin is biased toward heads? If you calculate the probability of getting 4 heads in a row with a fair coin, it is 1/16 = 0.0625. This is essentially the p-value. It tells you there’s about a 6% chance you could have gotten this result by pure luck, which is rather high, and so the results are not significant enough to conclude the coin must be biased.
On the other hand, if you had gotten 10 heads in a row, then the p-value becomes 1/1024 = 0.000977. This is extremely unlikely to happen by chance alone, and so justifies a significant suspicion that the coin is not fair.
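The same coin numbers, as code (nothing beyond plain Python needed):

```python
# The coin example as a calculation: the chance of k heads in a row with a fair coin.
for heads_in_a_row in (4, 10):
    p_value = 0.5 ** heads_in_a_row
    print(f"{heads_in_a_row} heads in a row: p-value = {p_value:.6f}")
# 4 heads  -> 0.062500 (could easily be luck)
# 10 heads -> 0.000977 (very hard to explain as luck)
```

For messier cases (say, 7 heads out of 10), the same idea is usually delegated to a library routine such as scipy.stats.binomtest rather than computed by hand.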
In real data analysis, calculating the p-value is a bit more complicated than the coin toss scenario, but it represents the same idea. The question could be “given this number of smokers who developed lung cancer, can we conclude that smoking is linked to lung cancer?”, and the p-value tells you the chance that this many smokers would have developed lung cancer anyway, even if smoking had nothing to do with it. When the p-value is low enough, that makes for significant evidence linking smoking to lung cancer.
The p-value is the probability that if there were a parallel universe with everything the same and the experimental variables you’re altering do nothing at all, then in that parallel universe you’d get a signal the size of the one you’re looking at or bigger.
5% is considered close enough to go have a better look. 1% means you might well be onto something. 0.05% means you probably have something. You can keep going.
These numbers are chosen because of how random numbers behave. The sum of a huge number of completely random numbers that have nothing to do with each other is normally distributed. If your data is driven by only a few causes, if those causes depend on each other, or if they multiply rather than add, the randomness stops being normal and you should use different statistics to work out whether the thing looks true or not.
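A quick way to see that claim (my own sketch, assuming NumPy): sums of many independent random numbers pile up into a bell curve, while products of the same numbers come out heavily skewed.

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.uniform(0.0, 1.0, size=(10_000, 50))  # 50 independent draws per "experiment"

sums = samples.sum(axis=1)       # additive causes: roughly normal (central limit theorem)
products = samples.prod(axis=1)  # multiplicative causes: extremely skewed instead

# For a roughly normal distribution the mean and median almost coincide;
# for the skewed products the mean sits far above the median.
print(f"sums:     mean={sums.mean():.2f}, median={np.median(sums):.2f}")
print(f"products: mean={products.mean():.2e}, median={np.median(products):.2e}")
```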
You always report a p-value with respect to a significance level α. In principle α can be any value, and what it designates is the probability that something with “no effect” produces observations that merely appear to have a significant effect.
The most common level of α is 0.05, which corresponds to a 95% confidence level that you aren’t just measuring a chance fluctuation. If your p-value comes out to less than 0.05, you reject the null hypothesis (no effect) at a significance of 0.05. Put as ELI5 as I can, this means you can be roughly 95% sure that the effect you measure is real and not just an accidental fluke.
You are correct that 0.05 is arbitrary. You could pick 90% confidence, in which case a p-value less than 0.1 would let you be more or less 90% sure that an effect exists. Or if your p-value is less than 0.01, you would say it’s significant at α = 0.01. Generally, if you get a very, very small p-value, you report an α much smaller than 0.05 to emphasize that you’re more certain that an effect exists. 0.05 is chosen mostly because 95% certainty is accepted as a healthy level of confidence, though anyone who plays Dungeons and Dragons or any other game with a 20-sided die can tell you that events with 5% probability happen all the time. With studies coming out reporting p-values all the time, you’re bound to get some fluke measurements in the mix.
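The d20 analogy translates directly into code (my own illustration, plain Python): a natural 20 has a 5% chance per roll, yet across a thousand rolls, or a thousand studies of effects that don’t exist, it keeps turning up.

```python
import random

random.seed(1)
rolls = [random.randint(1, 20) for _ in range(1_000)]
nat_20s = rolls.count(20)
print(f"Natural 20s in 1000 rolls: {nat_20s} (~{nat_20s / 1000:.1%})")
# In the same way, if 1000 studies each test an effect that does not exist
# using alpha = 0.05, around 50 of them will still look "statistically
# significant" purely by luck.
```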