Hypothesis testing and p-values are really a simple concept at their core: if we assume X is true, then what is the probability of observing something like Y?
The assumed “X” is the null hypothesis. You can think of it as the “default” state; researchers often use a null like “no relationship exists between A and B” (although there are many other kinds of null hypotheses). The “probability of observing something like Y” is the p-value, where “something like Y” means a result at least as extreme as Y.
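To make this concrete, here is a minimal sketch with a made-up example: a coin is flipped 100 times and lands heads 60 times. The null hypothesis (X) is that the coin is fair, and “something like Y” is taken to mean any head count at least as far from 50 as the one observed. The numbers and the function name `two_sided_p_value` are illustrative assumptions, not part of the explanation above.

```python
from math import comb

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(head count at least as far from flips/2 as `heads`), assuming a fair coin."""
    observed_distance = abs(heads - flips / 2)
    # Under a fair coin every flip sequence is equally likely, so we count
    # the sequences whose head count is at least as extreme as the observed one.
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    return extreme_sequences / 2 ** flips

# Hypothetical observation: 60 heads in 100 flips.
p_value = two_sided_p_value(heads=60, flips=100)
print(f"p-value for 60 heads in 100 flips: {p_value:.4f}")
```

For this particular example the p-value comes out at roughly 0.057, so a fair coin would produce a result this lopsided a little under 6% of the time.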
Now, if the probability of observing something like Y is very low under the assumption of X, then we generally infer that X is “unlikely” to be true. After all, if X were true, then Y probably wouldn’t have happened. This is called “rejecting the null hypothesis”, where we state with some reasonable confidence that X isn’t true. Of course, we could be wrong about this; however, the *brilliance* of hypothesis testing is that we control *how often* this mistake occurs. There is a bit of math involved here, but suffice it to say: by rejecting the null only when the p-value is below 5%, we make this mistake in **at most** 5% of the experiments where the null is actually true (in the long run). This bolsters confidence in our conclusions, since the procedure comes with a guaranteed cap on its error rate (otherwise, how could we trust results from a process without guarantees?).
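The long-run guarantee can be checked with a small simulation. The sketch below assumes the same made-up coin example: it runs many experiments in which the null really is true (the coin is fair), rejects whenever the p-value falls below 5%, and counts how often that rejection happens. The constants, the fixed seed, and the helper name `two_sided_p_value` are all illustrative choices.

```python
import random
from math import comb

FLIPS = 100          # flips per experiment (illustrative choice)
EXPERIMENTS = 5000   # number of simulated experiments (illustrative choice)
ALPHA = 0.05         # reject the null when the p-value is below this

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(head count at least as far from flips/2 as `heads`), assuming a fair coin."""
    observed_distance = abs(heads - flips / 2)
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    return extreme_sequences / 2 ** flips

# Precompute the p-value for every possible head count.
p_values = [two_sided_p_value(k, FLIPS) for k in range(FLIPS + 1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
false_rejections = 0
for _ in range(EXPERIMENTS):
    # The null is true here: every flip comes from a fair coin.
    heads = sum(rng.random() < 0.5 for _ in range(FLIPS))
    if p_values[heads] < ALPHA:
        false_rejections += 1  # rejected a true null: a mistaken rejection

type_1_rate = false_rejections / EXPERIMENTS
print(f"Share of experiments that wrongly rejected the null: {type_1_rate:.3f}")
```

Because head counts are whole numbers, the long-run rate in this setup sits somewhat below 5% rather than exactly at it, which is consistent with the “at most” wording.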
Some notes:
– There is another kind of error, where we don’t reject the null hypothesis when we should. The current hypothesis testing paradigm doesn’t explicitly provide guarantees on this type of error, although it can be measured and quantified. For reference, these are called “type 2 errors”, whereas mistakenly rejecting the null is called a “type 1 error”.
– When we calculate p-values, those calculations *assume that the null hypothesis is true*. This is why a p-value cannot be interpreted as “the probability that the null hypothesis is true”: the calculation already took that as given. Remember, the p-value only measures “the probability of observing something like Y (assuming that X is true)”. It says nothing about how likely realities other than X are, nor about how likely Y would be under those alternate realities.
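Both notes can be illustrated with one more sketch. Quantifying the second kind of error requires picking a specific alternative reality, which is exactly the information the p-value leaves out. Here the alternative is assumed, purely for illustration, to be a coin that lands heads 60% of the time; `TRUE_HEADS_PROB`, `binomial_pmf`, and the other names are hypothetical.

```python
from math import comb

FLIPS = 100
ALPHA = 0.05
TRUE_HEADS_PROB = 0.6  # hypothetical alternative: the coin is actually biased

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(head count at least as far from flips/2 as `heads`), assuming a fair coin."""
    observed_distance = abs(heads - flips / 2)
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    return extreme_sequences / 2 ** flips

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when each flip lands heads with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Head counts for which the test does NOT reject the null of a fair coin.
not_rejected = [
    k for k in range(FLIPS + 1) if two_sided_p_value(k, FLIPS) >= ALPHA
]

# Type 2 error rate: chance of landing in that region when the coin is biased.
type_2_rate = sum(binomial_pmf(k, FLIPS, TRUE_HEADS_PROB) for k in not_rejected)
print(f"Type 2 error rate against a 60% coin: {type_2_rate:.3f}")
print(f"Power (chance of correctly rejecting): {1 - type_2_rate:.3f}")
```

Under these assumptions the test misses the bias in a bit over half of experiments, even though its type 1 error rate is capped at 5%. A different assumed bias or a different number of flips would give a different type 2 error rate.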