: p-value in statistics


The p-value is basically a measure of how likely it is that an apparent difference between two groups of data is just down to chance.

Let’s do a sports example. Player A shoots 10 free throws and makes 4. Player B shoots 10 free throws and makes 5.

Now, we could argue that Player B is better than Player A. However, given the small number of free throws, it’s entirely possible that they’re equally good. If I asked them to take another 10 free throws, A might make 6 and B might make 4.

When analyzing Player A’s free throws and Player B’s free throws, we want to estimate how likely it is that a gap like this would show up even if they’re equally talented. Roughly speaking, this is the p-value. So if you see a p-value of 0.6, that means there’s a ~60% chance of seeing a gap this big between two equally skilled players.

Now, this is an oversimplification. The exact definitions from a mathematical standpoint are different and more specific. But this example illustrates the general idea.

EDIT: As u/banana_stand_savings has indicated, there are some issues with my explanation when you’re actually doing research. P-values are often misinterpreted, and my explanation isn’t an exact definition. [Here is the Wikipedia page on the topic.](https://en.wikipedia.org/wiki/Misuse_of_p-values)
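To make the free-throw example concrete, here’s a small sketch in plain Python. It assumes (as the “equally skilled” hypothesis) that both players share the pooled make rate of 9-out-of-20, and asks how often two such players would differ by at least one make in 10 shots each. The pooling choice is just one reasonable way to set up the comparison, not the only one.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k makes in n shots, with make-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Null hypothesis: both players are equally skilled.
# Pool their results: 4 + 5 = 9 makes out of 20 shots -> 45% make rate.
n, p = 10, 9 / 20

# P(two equally skilled players tie) = sum over k of P(both make exactly k).
p_tie = sum(binom_pmf(k, n, p) ** 2 for k in range(n + 1))

# P(their scores differ by at least 1 make) -- a gap at least as big
# as the 4-vs-5 gap we actually observed.
p_value = 1 - p_tie

print(f"P(scores differ by >= 1 make): {p_value:.2f}")
```

The probability comes out around 0.8: two genuinely equal shooters would differ by at least one make most of the time, so a 4-vs-5 split tells you almost nothing about who is better.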

The p-value is the probability of a random sample producing an estimate equal to *or more extreme than* the estimate produced by the actual sample. If you study mathematical statistics you’ll learn more about the assumptions in that statement.

A p-value is related to the null and alternative hypotheses. Those sound weird, but they’re a little different from a hypothesis you might have seen in science class.

A null hypothesis says that there is nothing special about the relationship between two things. (Ex: Eating sugar does not increase obesity)

The alternative hypothesis says that there is something special about the relationship. (Ex: Eating sugar does increase obesity)

Now, we have to collect a lot of data from a bunch of people eating different amounts of sugar and see if they gain weight. But data is rarely clear. So they use what’s called a t-test to see if there is a legit difference between the group of people that ate sugar and the group that didn’t. The t-test spits out a test statistic, and from that you get the p-value, a number between 0 and 1. You compare the p-value to a threshold (the significance level, usually 0.05). If the p-value is greater than 0.05, you fail to reject the null hypothesis. So if we did this experiment and found that our p-value was 0.06, we could say that random error or chance could plausibly explain why some people became obese and some didn’t. Now, we *could* set the threshold at 0.01 or 0.1, but it’s kind of defaulted at 0.05.
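The paragraph above mentions a t-test; a permutation test is a close cousin that’s easier to sketch from scratch with no libraries, and it shows the same idea: “how often would shuffling the group labels produce a gap this big by chance?” The weight-gain numbers below are invented purely for illustration.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical weight gains (kg) for two small groups -- made-up numbers,
# just to show the mechanics, not real study data.
sugar    = [2.1, 3.4, 1.8, 2.9, 3.1, 2.6, 3.8, 2.2]
no_sugar = [1.2, 0.8, 2.0, 1.5, 0.9, 1.7, 1.1, 1.4]

observed = mean(sugar) - mean(no_sugar)

# Null hypothesis: the "sugar"/"no sugar" labels don't matter.
# Shuffle the labels many times and count how often chance alone
# produces a gap at least as large as the one we observed.
pooled = sugar + no_sugar
trials, n_extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:8]) - mean(pooled[8:]) >= observed:
        n_extreme += 1

p_value = n_extreme / trials
print(f"observed gap: {observed:.2f} kg, p-value: {p_value:.4f}")
```

With these (fabricated) numbers the two groups barely overlap, so almost no shuffle reproduces the gap and the p-value comes out tiny, i.e. well under 0.05.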

This doesn’t mean that the experiment failed or that the other side is “wrong”; it just means: in a world where sugar does not lead to obesity and the two sets of people were the same, here’s how likely you are to see results like the ones you got. So when that number is really small (p-value less than 0.05), the result is really not expected under that assumption. (Note: this doesn’t mean impossible, just improbable)

Houses in my neighborhood cost on average $500,000. Houses in your neighborhood cost on average $400,000. Is my neighborhood’s housing more expensive than yours? Looks like it, but we need to be more analytical than that. If you run a statistical test and see a p value of less than 0.05, then it’s OFTEN CONSIDERED safe to say that my neighborhood is more expensive than yours.

For folks who still find these explanations overwhelming: a very small p-value is an indication that the effect shown in the data is probably real. The smaller the number and the closer it is to zero, the less room there is for doubt.

So if you see a statement like “four out of five people with blonde hair are actually brunettes who bleach their hair (p<0.1)” that is not a *great* indication that most blondes are faking it. If that number changes to (p<0.0005), that’s actually a good sign that it’s true (I made those stats up, not real, obvs).

The important thing to know is that the p-value is derived from the existing *data*, and how the pure numbers relate to each other, not reality. If you knew that the second statistic above was based on information gathered at a hair salon that specializes in hair coloring, then maybe you can’t apply that factoid to the real world, just to people who go to a salon.

So, a p-value is basically quality control of a data set and its result. But you need to know how the data was obtained and under what conditions before you should just accept a statement because of an accompanying small p-value. In science we are trying to move away from p-values for many reasons, but partly because you can hack the data to get a number you want and therefore claim whatever you want. “The p-value is good so it must be true!”

The p-value is the chance of a result being due to statistical chance/randomness alone.

It is measured on a scale from 1 = 100% to 0 = 0%. 0.02 would be 2%. 0.002 = 0.2% etc.


1) ”Eating tomatoes causes lung cancer (p-value: 0.8)”

That means there is an 80% chance of observing a result like this if we just assumed randomness/coincidence. Hence, the result seems pretty unconvincing.

2) “Smoking causes lung cancer (p-value: 0.008)”

That means there is a 0.8% chance of observing a result like this if we assumed randomness/coincidence. This is way more convincing.



As the comments rightfully pointed out, this is a simplified case where we assume the null hypothesis to be randomness. This is the most common null hypothesis by a margin. However, it’s true that the null hypothesis could be different. A more precise definition, I enjoy: “The P value means the probability, for a given statistical model that, when the null hypothesis is true, the statistical summary would be equal to or more extreme than the actual observed results” ([Source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5665734/#B2)). It’s just not very 5-year old friendly….

You want to check if Superman is better than Batman at beating the bad guys, so you read one hundred comics of each and take note of how often they win. You note that Superman wins 75% of the time and Batman 70%. However, that’s not a big difference, and you haven’t read every single comic or watched every single film; perhaps if you read another one hundred comics, the results would be different. So we need to check how sure we can be that the difference isn’t due to chance.

You need to run some kind of test to know that. First you make a statement that you’re trying to reject (Superman and Batman are equally strong). This is your null hypothesis and, for the moment, we will assume it is true because we don’t know any better. Then you make a statement that contradicts the null hypothesis and is what you’re trying to prove (Superman is stronger). This is your alternative hypothesis, and it’s called alternative because, as I said, we assume the null hypothesis is true until we know better.

Then you run a statistical test (in this case a t-test, but that’s not that relevant right now). This test gives you a p-value, which is the odds of getting such a result, or a more extreme one, if the null hypothesis is true. In our case, it tells you the odds of Superman beating the bad guys at least 5 percentage points more often than Batman, assuming that they are both equally likely to do so. So, if your p-value is 0.04, it means there are 4-in-100 odds of a gap this big arising by chance, which makes the alternative hypothesis look a lot more plausible. If the p-value is larger than a chosen threshold (usually somewhere between 0.01 and 0.1), you decide that the null hypothesis still stands a good chance of being true, and you “fail to reject the null hypothesis”, which means “maybe the alternative hypothesis is true but I can’t tell for sure”.
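Instead of a t-test you can also get at this p-value by brute-force simulation, which is maybe more ELI5. The sketch below assumes (as the null hypothesis) that both heroes share the pooled win rate of 145 wins in 200 comics, and counts how often chance alone hands Superman a lead of 5 or more wins per hundred.

```python
import random

random.seed(42)

# Observed: Superman wins 75/100 comics, Batman 70/100.
# Null hypothesis: same true win rate. Pool: 145/200 = 72.5%.
pooled_rate, n_comics = 145 / 200, 100

def wins(rate, n):
    """Simulate n comics and count how many the hero wins."""
    return sum(random.random() < rate for _ in range(n))

# How often does pure chance give "Superman" a lead of 5+ wins?
trials = 10_000
n_extreme = sum(
    wins(pooled_rate, n_comics) - wins(pooled_rate, n_comics) >= 5
    for _ in range(trials)
)
p_value = n_extreme / trials
print(f"p-value: {p_value:.3f}")
```

The simulated p-value lands well above 0.05: a 75-vs-70 split out of a hundred comics each is the kind of gap chance produces all the time, so you’d fail to reject the null hypothesis here.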

I hope that helped.

Seems like most of these answers lost the ELI5 aspect of this:

Let’s say we’re playing a game- flip a coin, heads I win, tails you win. We play 10 times and I win 9 and you win 1. You think I’m cheating. But what’s the chance (probability) that we’d get those results or better (for me) if the coin was fair?

That probability is the p-value. If it’s a low enough chance, then you would assume the coin wasn’t fair (and “reject the null hypothesis”)
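The coin example is simple enough to compute exactly rather than estimate. A minimal sketch using only the standard library:

```python
from math import comb

def pmf(k):
    """Chance of getting exactly k heads in 10 flips of a fair coin."""
    return comb(10, k) / 2**10

# P(9 or more heads out of 10) -- the observed result "or better for me".
p_value = pmf(9) + pmf(10)  # = 11/1024
print(f"p-value: {p_value:.4f}")
```

That’s about a 1% chance, well under the usual 0.05 threshold, so you’d reject the “fair coin” null hypothesis and conclude the game was probably rigged.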