In statistical analysis we use a system of 2 hypotheses and probability values. The hypotheses are known as the null and alternative hypotheses. The null hypothesis is simply the statement that there is no pattern, and anything that looks like one is just chance. The alternative says there is a pattern. The p-value is simply the probability of getting data at least as extreme as what we recorded if the null hypothesis were true. In theory the results we'd get from pure chance follow a bell curve, meaning very low values and very high values occur less frequently. Our cutoff for the p-value is the line we draw saying “if our result lands this far out in the tails, it's too unlikely to be just noise, so we'll call it a pattern.”
Typically we set the cutoff for the p-value at .05, meaning that if a result as extreme as ours would turn up less than 5% of the time by chance alone, we reject the null hypothesis and say there may be a pattern. If the p-value comes out greater than .05, we don't have enough evidence to claim a pattern.
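To make the .05 cutoff concrete, here is a minimal sketch using a coin flip test. The setup is my own illustration, not part of any standard recipe: the null hypothesis is "the coin is fair", and the function name `two_sided_p_value` and the example counts are made up for the example.

```python
from math import comb

ALPHA = 0.05  # cutoff chosen BEFORE looking at the data


def two_sided_p_value(heads: int, flips: int) -> float:
    """Chance of a result at least as far from flips/2 as `heads`, if the coin is fair."""
    distance = abs(heads - flips / 2)
    # Count every outcome that is at least as extreme as the one we saw.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips


for heads in (60, 61):
    p = two_sided_p_value(heads, 100)
    print(f"{heads} heads in 100 flips: p = {p:.4f}, reject null: {p < ALPHA}")
```

With these numbers, 60 heads lands just above the cutoff and 61 lands below it, which is exactly the kind of near miss that tempts people to fiddle with things.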
P-hacking is setting an inappropriately lenient cutoff for your p-value or changing that cutoff after the fact so you can reject your null hypothesis. Or it’s stopping the collection of data at a convenient moment because your p-value has just dipped below the cutoff.
It’s academically dishonest and lets you make claims about patterns that may not be present.
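If you want to see why stopping early is a problem, here is a rough simulation under assumptions I picked for illustration: a perfectly fair coin (so the null is true and every "pattern" is a false alarm), up to 200 flips, a peek at the p-value every 10 flips, and a normal approximation for the p-value. The function names are hypothetical.

```python
import math
import random

ALPHA = 0.05


def approx_p_value(heads: int, flips: int) -> float:
    """Two-sided p-value for a fair coin, using the normal approximation."""
    z = (heads - flips / 2) / math.sqrt(flips / 4)
    return math.erfc(abs(z) / math.sqrt(2))


def run_experiment(rng, max_flips=200, peek_every=10):
    """Flip a fair coin; return (significant at any peek, significant at the end)."""
    heads = 0
    hit_while_peeking = False
    for n in range(1, max_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        # A p-hacker would stop here and publish the first time this is true.
        if n % peek_every == 0 and approx_p_value(heads, n) < ALPHA:
            hit_while_peeking = True
    return hit_while_peeking, approx_p_value(heads, max_flips) < ALPHA


def false_positive_rates(trials=2000, seed=0):
    rng = random.Random(seed)
    results = [run_experiment(rng) for _ in range(trials)]
    peeking = sum(r[0] for r in results) / trials
    honest = sum(r[1] for r in results) / trials
    return peeking, honest


peeking, honest = false_positive_rates()
print(f"stop as soon as it's significant: {peeking:.1%} false positives")
print(f"test once at the planned end:     {honest:.1%} false positives")
```

Testing once at the end should give a false positive rate close to the 5% you signed up for, while stopping the first time the p-value dips under .05 gives a rate several times higher, even though nothing about the coin changed.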