Scientific findings are based on data, typically a small sample of all the data it would be possible to collect. The eternal anxiety in science is that if you had, by chance, collected a slightly different dataset, you might have found a different result, or no interesting result at all.
In rides Statistics to the rescue. It provides a way of quantifying how likely that scenario is. In particular, it can provide a “p value”: the probability of seeing a result at least as big as yours purely by chance, assuming there is no real effect at all. By convention, a result is “statistically significant” if this probability is less than 1 in 20 (p < 0.05).
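To make that 1-in-20 convention concrete, here is a minimal simulation sketch (in Python, assuming numpy and scipy are available; the numbers are invented, not from any real study). It runs thousands of fake experiments in which the two groups are genuinely identical, and counts how often a standard test still comes out “significant”:

```python
# Minimal sketch: simulate many experiments where nothing is really going on,
# and count how often p still falls below 0.05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000

false_positives = 0
for _ in range(n_experiments):
    # Two groups drawn from the SAME distribution, so any "effect" is luck.
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(group_a, group_b)
    if p < 0.05:
        false_positives += 1

print(f"'Significant' by luck: {false_positives / n_experiments:.1%}")  # ~5%
```

Roughly 5% of these no-effect experiments cross the threshold, which is exactly the rate the convention accepts.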
P-hacking is the practice of abusing this convention. The simplest way would be to run the same experiment 20 times (or more). Even if the hypothesis of the experiment is false, this gives you a decent chance that at least one of the experiments will turn up a statistically significant result purely by chance. More subtly, you can test 20 different hypotheses on the same data. Odds are at least one will come out statistically significant, and you can make that one the focus of the paper.
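The rough arithmetic behind “run it 20 times”, assuming the experiments are independent:

```python
# Each null experiment has a 95% chance of NOT producing a fluke "significant"
# result, so the chance that at least one of 20 does is:
p_at_least_one_fluke = 1 - 0.95 ** 20
print(f"{p_at_least_one_fluke:.0%}")  # about 64%
```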
In an era of big data and fast computers, p-hacking has become common and easy. There’s always another subset of the data, outcome, or covariate specification to test, and it’s quick to code and run all of those statistical tests at once. A red flag for p-hacking is when a paper’s results are “spotty”: only certain specifications show (often contradictory) effects, for no good reason, e.g. “Chemical X was found to induce hair growth in lactating mothers but hair loss in men over 60.” A good defense against p-hacking is asking researchers to “pre-register” which hypotheses they will test and why *before* they get the data, then only accepting papers that stick to those pre-determined plans.
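Here is a sketch of that “spotty results” pattern (hypothetical names like “Chemical X” and hair_change, pure noise, no real dataset): one trial with no true effect, sliced 20 different ways, where a few slices will usually dip below p = 0.05, sometimes in opposite directions:

```python
# Minimal sketch of the multiple-specifications trap: a null "Chemical X" trial
# sliced into 20 arbitrary subgroups; the slices that happen to dip below 0.05
# are the p-hacker's bait, and their "effects" can point in opposite directions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 4000
treated = rng.integers(0, 2, size=n).astype(bool)  # Chemical X vs placebo
hair_change = rng.normal(size=n)                    # outcome: pure noise
slice_id = rng.integers(0, 20, size=n)              # 20 arbitrary subgroups

for k in range(20):
    in_slice = slice_id == k
    x = hair_change[in_slice & treated]
    y = hair_change[in_slice & ~treated]
    _, p = stats.ttest_ind(x, y)
    direction = "growth" if x.mean() > y.mean() else "loss"
    flag = "  <-- 'significant'" if p < 0.05 else ""
    print(f"slice {k:2d}: hair {direction}, p = {p:.3f}{flag}")
```

Pre-registration blocks exactly this move: if those 20 slices weren’t in the plan written down beforehand, the lucky one can’t quietly become the headline finding.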