Imagine you have a virus that infects 50% of the test mouse population. The first important thing to understand is that there is always variance. This means that with this virus, if you have 10 mice, you expect about 5 to be infected, but in reality some days the mice are lucky and only 4 are infected, or, very rarely, only 3. Other days 6, or rarely 7, will catch it.
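You can see this variance for yourself with a tiny simulation. This is just an illustrative sketch (the cage size, infection rate, and number of simulated cages are the ones from the story, not from any real experiment):

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Simulate 1000 cages of 10 mice; each mouse is infected with probability 0.5.
counts = Counter(
    sum(random.random() < 0.5 for _ in range(10))  # infected mice in one cage
    for _ in range(1000)
)

# How many cages ended up with 0, 1, ..., 10 infected mice?
for infected in sorted(counts):
    print(f"{infected} infected: {counts[infected]} cages")
```

Exactly 5 infected turns out to be the most common outcome, but cages with 3, 4, 6, or 7 infected show up all the time, just as described above.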
Now let’s assume you have a cure that prevents the infection. But the cure is not 100% effective; it only saves *some* of the mice. The same principle applies here. Let’s say your cure saves 2 of the 5 infected mice, so now the virus will infect 3 out of 10. But it can be exactly 3, or sometimes just 2 or 4. Variance applies here too.
Now you try out your cure: you treat 10 mice with it and leave 10 untreated. Let’s say the treated group ends up with 3 infected and the untreated group with 6 infected.
Now the question is the following: how sure can you be that the cure *really* works? Even without the cure, sometimes you get only 3 infected mice. Sometimes you get one lucky cage and one unlucky cage.
The answer is that you never really know for sure that the cure worked. However, you have mathematical tools to figure out how *unlikely* it is to get a cage of 3 versus a cage of 6 by pure chance. Let’s say your calculations tell you that if you did the experiment 100 times and your cure did nothing, then only 5 times would you get this 3-versus-6 result. That’s your p-value: the 5 out of 100, i.e. 0.05, or whatever number it comes out to be.
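One honest way to do that calculation is to just simulate the "cure does nothing" world many times and count how often chance alone produces a gap at least as big as 3 versus 6. (The 5-in-100 figure above is a made-up illustration; a real statistician would more likely use an exact test, and this particular setup actually comes out less extreme than 5%, but the logic is the same.)

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

def infected(n=10, p=0.5):
    """Number of infected mice in one cage of n, infection probability p."""
    return sum(random.random() < p for _ in range(n))

# Null hypothesis: the cure does nothing, so BOTH cages infect at 50%.
# Count how often the "treated" cage beats the "untreated" one by 3 or more.
trials = 100_000
extreme = sum(infected() - infected() <= -3 for _ in range(trials))

p_value = extreme / trials
print(f"estimated p-value: {p_value:.3f}")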
Long ago scientists agreed that 5% is an acceptable risk. Which means that although you don’t know for sure that your cure works, you accept the result anyway. Maybe you had a very lucky cage of mice and a very unlucky cage, but you dismiss this option because such a setup is very unlikely. So unlikely that your cure most probably works.
Now as you see, this means that about 5 out of every 100 cures accepted this way actually don’t work. We just don’t know which 5, because in those cases the mice were simply lucky, and that is impossible to tell apart from a working cure.
So what is p-hacking? Let’s assume you are a scientist, but now you have 100 candidate cures for the virus. You want to test which ones work, so you take 10 mice for each, and in some cases you get only 3 infected, or 2, etc. Let’s say you end up with 5 of the 100 candidates that look like they worked.
But did they really work? As you have seen, mice are simply lucky sometimes. In fact, if you used pure water as a cure, maybe 5 out of 100 cages would still *look like* a cure that works.
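You can watch this happen in simulation. Here all 100 "cures" are pure water, and a cage counts as a "success" if 2 or fewer mice get infected; that cutoff is my own illustrative choice, picked because it happens roughly 5% of the time by pure chance with a 50%-infectious virus:

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

def infected(n=10, p=0.5):
    """Number of infected mice in one cage of n, infection probability p."""
    return sum(random.random() < p for _ in range(n))

# 100 cages, every one treated with water: the virus still infects 50% of mice.
results = [infected() for _ in range(100)]

# Declare "the cure works!" whenever 2 or fewer mice got infected.
successes = sum(1 for r in results if r <= 2)
print(f"{successes} of 100 water 'cures' look like they work")
```

A handful of water cages will typically pass the cutoff, and from the data alone they are indistinguishable from cures that genuinely work.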
The problem is that a lot of scientists don’t understand that running a mass experiment and keeping the 5% that worked is a problem. They may genuinely believe that they captured 5 good cures out of 100, without realizing that this is statistically invalid: with 100 experiments you can always expect a random 5% that seem to work. That’s why, when you do a mass experiment, you don’t calculate a p-value for each test one by one as if they were independent experiments. Instead you have to combine them and do different math that accounts for how many tests you ran.
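To give a flavor of that "different math": one simple, standard fix (the Bonferroni correction, one of several options, not something the text above names) divides the 5% threshold by the number of tests. A quick calculation shows why that is needed:

```python
n_tests = 100
alpha = 0.05

# If every cure is really just water, the chance that AT LEAST ONE of the
# 100 tests passes the 5% threshold by luck alone is enormous:
p_any_fluke = 1 - (1 - alpha) ** n_tests
print(f"chance of >=1 fluke at 0.05 each: {p_any_fluke:.3f}")

# Bonferroni correction: shrink the per-test threshold to alpha / n_tests,
# so the overall chance of any fluke stays near 5%.
bonferroni = alpha / n_tests
p_any_fluke_corrected = 1 - (1 - bonferroni) ** n_tests
print(f"per-test threshold after correction: {bonferroni}")
print(f"chance of >=1 fluke after correction: {p_any_fluke_corrected:.3f}")
```

With 100 water cures tested at 0.05 each, you are almost guaranteed at least one fluke; after the correction, the overall fluke risk drops back to roughly 5%.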
If someone does not combine the calculations and does not mention the 95 failed experiments, reporting only the 5 successful-looking ones, that is p-hacking. As you can see, it is very difficult to tell whether someone kept something secret, so you never know whether a result is genuine or hacked.