In the simplest possible terms, it’s the likelihood of seeing your result even if whatever your experiment is testing isn’t making a difference. The lower the p-value, the more statistically significant your results are.
Basically, it’s the answer to the question: “How sure are we that the result is due to the experiment and not due to things that we aren’t testing?”
You can see a pattern due to random luck and misinterpret it as evidence of some underlying factor that isn’t really there. The p-value measures how likely (or unlikely) it would be for this particular result to appear just by random chance. The smaller it is, the more likely that the result is meaningful and not just lucky.
Imagine you give a drug to 2 people who are moderately sick, and they both get better. It’s totally possible they both got lucky and would have gotten better anyway without the drug. It’s going to be really hard to tell with only 2 people, so if you analyzed the p-value you would find it’s likely high, indicating there is a large chance you just got lucky and you can’t take any meaningful lessons from that study.
However, if you watch 1000 people who don’t get the drug and find only 20% get better on their own, then give the drug to another 1000 people and 80% get better, that’s a very strong pattern outside the “random luck” behavior you were able to observe. So if you analyzed that p-value it would likely be small, indicating that the drug really did cause this result and it wasn’t just luck.
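A minimal sketch of that comparison, assuming we treat it as a 2×2 table of recovered vs. not recovered and use scipy’s Fisher exact test (the counts are the hypothetical ones from the example, not real trial data):

```python
# Hypothetical counts from the example: 800/1000 recover with the drug,
# 200/1000 recover without it.
from scipy.stats import fisher_exact

#        recovered  not recovered
table = [[800, 200],   # 1000 people given the drug
         [200, 800]]   # 1000 people not given the drug

oddsratio, p_value = fisher_exact(table)
print(f"p-value: {p_value:.3g}")  # vanishingly small: very unlikely to be luck
```

Run the same test on the 2-person version (say `[[2, 0], [1, 1]]`) and the p-value comes out large, matching the intuition above.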
So you want to test something, let’s say that some value B is equal to 0.
You don’t observe this B. You observe data from which you calculate a statistic that measures this value B. We will call this statistic Bhat.
Imagine the real actual value of B is indeed 0. We will call that our null hypothesis.
From the data that we observe we calculate Bhat. Let’s say it’s 0.5. Now we ask this: what are the chances of getting a Bhat at least as far from 0 as 0.5, when the real value of B is 0?
This probability is your p-value.
Basically: what is the probability of seeing data like this if we’re in a world where our null hypothesis is true?
If this p-value is very low, data like ours would be rare in a world where B = 0, so we take little risk in saying “B is different from 0!” and we reject the null hypothesis. If the p-value is high, we can say that our data is consistent with B = 0 and so we don’t reject this hypothesis.
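A minimal sketch of this setup, assuming Bhat is a sample mean and we use a one-sample t-test against the null hypothesis B = 0 (the data here is simulated, with a true B of 0.5 as in the example):

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
data = rng.normal(loc=0.5, scale=1.0, size=50)  # toy data: true B is 0.5

# Null hypothesis: B = 0. Bhat is the sample mean.
statistic, p_value = ttest_1samp(data, popmean=0.0)
print(f"Bhat = {data.mean():.3f}, p-value = {p_value:.4f}")
# A small p-value means data like this would be rare if B were really 0,
# so we can reject B = 0 with little risk.
```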
When you test or experiment on something the idea is very typically to have 2 complementary assertions (or hypotheses). Say you’re trying to discover if factor X has any effect on outcome Y.
Null hypothesis: X has no impact on outcome Y
Alternative hypothesis: X has an impact on outcome Y
Experiments or samples are taken to determine which of these is likelier to be true – and this experiment results in outcome Z. To be conservative, we start by ASSUMING that the null hypothesis holds or is true. The p-value measures “how likely am I to achieve an experimental outcome Z assuming the null hypothesis is true”.
A low p-value means that outcome Z is less likely to occur if the null hypothesis is true. In other words, a low p-value gives credence to the idea that the alternative hypothesis better explains outcome Z.
Say you’re flipping a particular coin and you think it’s not a fair coin. An experiment is conducted where the coin is flipped 1000 times. The null hypothesis is “the coin is fair” and the alternative is “the coin is unfair”.
If the outcome is that there were 501 heads and 499 tails, you will get a p-value that is pretty high. This means that this particular outcome is rather likely if the coin is fair. If the outcome is that there were 700 heads and 300 tails, you will get a very low p-value. This indicates that the null hypothesis is less likely to be true and the alternative hypothesis “the coin is unfair” is likelier to be true.
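A minimal sketch of both outcomes, assuming an exact binomial test (scipy.stats.binomtest, available in scipy 1.7+) with the null hypothesis that heads comes up half the time:

```python
from scipy.stats import binomtest

for heads in (501, 700):
    result = binomtest(heads, n=1000, p=0.5)  # null hypothesis: the coin is fair
    print(f"{heads} heads out of 1000: p-value = {result.pvalue:.3g}")
# 501 heads gives a p-value near 1 (entirely consistent with a fair coin);
# 700 heads gives a vanishingly small one (strong evidence against fairness).
```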
ELI5:
Let’s assume you believe that a medicine helps against a disease.
So you do an experiment and give 10 sick patients your medicine.
Usually 50% of the patients die. With the medicine in your experiment, only 40% die. So does your medicine actually help?
50% death rate is only an average over a large group, and 10 people is a small group. By pure luck, sometimes only 40% die even without any medicine. The p-value is the probability that a result at least this extreme happens by pure luck, without the effect you want to test.
So a large p-value means: this result happens often, even without any medicine. Your result is not very significant, it doesn’t say a lot about your medicine because it happens all the time.
A small p-value means: this result happens very rarely without medicine. The result is significant and at least a hint that the medicine might actually help, because if it doesn’t, this is an unlikely result.
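A minimal sketch of this example, assuming a one-sided binomial test: how often do 4 or fewer of 10 patients die by pure luck, when the baseline death rate is 50%?

```python
from scipy.stats import binomtest

# 4 deaths out of 10 patients, baseline death rate 50%.
# alternative="less" asks: how likely is a death count this low (or lower)?
result = binomtest(k=4, n=10, p=0.5, alternative="less")
print(f"p-value = {result.pvalue:.3f}")  # ~0.377
```

A p-value of roughly 0.38 means a result this good shows up almost 4 times in 10 by luck alone, so the experiment says very little about the medicine.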
Say I have a coin and I want to know “is this coin fair?”
I toss the coin 100 times and it comes up heads 60 times and tails 40 times. Intuitively this seems kind of close to fair, but also a bit skewed. Was this just random variance? Or is this a large enough sample size that a 60-40 split is alarming?
P values give a way to reason about this scenario by asking “if the coin *is* fair, how unlikely is this result?” It turns out that in this case it’s about a 2.8% chance of getting 60 or more heads (and similarly for 60 or more tails).
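A quick check of that 2.8% figure, using the binomial survival function:

```python
from scipy.stats import binom

# Chance of 60 or more heads in 100 flips of a fair coin.
p_one_sided = binom.sf(59, n=100, p=0.5)  # sf(59) = P(X >= 60)
print(f"P(60+ heads) = {p_one_sided:.4f}")  # ~0.0284, i.e. about 2.8%
```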
It’s at this point that people tend to misinterpret p values. The statement people *want* to be able to make is “there is a 2.8% chance that this coin is fair,” but p values do not allow you to make that statement, at least on their own. The p value only says “if the coin is fair then you’d see this result 2.8% of the time.”
Turning a p value into the probability that some hypothesis is correct generally requires knowing some unknowable information. In this toy example that information would be the probability that coins are fair which may be knowable for the right setup, but for more real-world applications it could be something like “the probability that another subatomic particle exists with XYZ properties” (where that probability is either 0 or 1, but we don’t know which). This makes p values somewhat frustrating since they’re so close to making the statement we want, and yet getting that final inch is out of reach.
What p values are very well equipped for is stopping you from publishing results as significant when it turns out you just got lucky. If you took a threshold of p < 0.05 then you might declare that the coin is unfair, but with a more stringent threshold like p < 0.01 you’d declare the test to be inconclusive. With a threshold of p < 0.05 what you’re saying is that you’re OK with calling 1 in 20 fair coins weighted, regardless of how any weighted coins get judged. Different disciplines tend to set p value thresholds at different levels, based on how much data they can collect. For example, particle physicists like to aim for p < 1/1,000,000 or lower.
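A minimal sketch of how the same p-value fares under those two thresholds, using the ~2.8% one-sided figure from the coin example above:

```python
p_value = 0.0284  # one-sided p-value from the 60-heads example

for alpha in (0.05, 0.01):
    verdict = "declare the coin unfair" if p_value < alpha else "inconclusive"
    print(f"threshold {alpha}: {verdict}")
# threshold 0.05: declare the coin unfair
# threshold 0.01: inconclusive
```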
A p-value (probability value) represents how likely it would be for your test statistic to arise if your null hypothesis were true.
So, I say “gravity doesn’t exist” as my null hypothesis.
If my p-value was, say, .95, it would mean there’s a 95% chance that my test statistic could be produced if gravity didn’t exist. If it was .05, then my statistic only has a 5% chance of arising if it were true that gravity didn’t exist, so the data in my sample most likely came about because gravity does exist.
The simplest way to think about it is that it’s the probability of a thing.
In common stats usage, it’s the probability of getting data like yours if your null hypothesis were true. Most of the time, your null hypothesis boils down to the idea that two samples came from the same population. That is, that the two samples are not different. (More precisely, that they have the same mean.) Typically, the whole reason you are doing the test is that you kinda suspect that your samples are different, so you kinda hope that null hypothesis is wrong.
So the p-value is the probability of your result under the hypothesis you hope is wrong. That’s why people go looking for very low p-values. Why .05 is the common cutoff? That’s a whole ’nother issue.