what “statistical power” actually means, why one statistical test would be more powerful than another when applied to the same data, and when one might want to use a more or less powerful test.


I have a science background, but stats has always been a weak point for me. The tests I’m thinking of specifically are Fisher’s Exact Test, Barnard’s Test, and Boschloo’s test, but I’d like to understand the concept generally. Fisher’s is generally the go-to standard for my field, but from my understanding, Barnard’s is *sometimes* more powerful than Fisher’s, and Boschloo’s is *“uniformly”* more powerful than Fisher’s. To my not-understanding brain, that sounds like Boschloo’s should have long since made Fisher’s obsolete, so I’m looking for clarification on what “power” actually means, as well as why something like Barnard’s could be more powerful in some cases but not in others.


2 Answers

Anonymous

To quote [Wikipedia](https://en.wikipedia.org/wiki/Power_of_a_test):

> The power of a binary hypothesis test … is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true.

The tests you mention are for contingency tables, so they check whether there is a link between the categories of data. Your null hypothesis would be that there is no link, your alternative hypothesis would be that there is some link. When you do the test you get a p-value, which is the probability of getting data at least as extreme as the data you have if the null hypothesis is true.
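For a concrete feel of what a p-value is, here is a minimal Python sketch of a two-sided Fisher-style calculation for a 2x2 table. It adds up the probability of the observed table and of every other table with the same row and column totals that is no more likely than the observed one. The function name and the example counts are made up for illustration, and "no more likely than observed" is only one common way of defining "extreme".

```python
from math import comb

def fisher_two_sided_p(a, b, c, d):
    """Two-sided Fisher-style p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    # With all row/column totals held fixed, the top-left cell k determines the table.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Number of ways each possible table can arise (hypergeometric weights).
    ways = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = ways[a]
    # "Extreme" here means "no more probable than the observed table".
    extreme = sum(w for w in ways.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)

# Illustrative counts only: 3 of 4 "successes" in one group, 1 of 4 in the other.
p_value = fisher_two_sided_p(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

A p-value this large means a table like this would turn up quite often even with no link at all, so the test would not reject the null hypothesis.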

So there are four outcomes:

1. there is a link, and the test says there is a link,
2. there is a link but the test says there isn’t a link (Type II error / false negative),
3. there isn’t a link but the test says there is a link (Type I error / false positive),
4. there isn’t a link, and the test says there isn’t a link.

The power of a test tells you about outcome 1. It is the chance – if there *is* a link between the categories – that the test will detect it.
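To put numbers on the four outcomes above, here is a rough Python sketch that works out how often a Fisher-style test would say "there is a link" in a made-up experiment with two groups of 20. Everything specific here is an assumption for illustration: the group sizes, the true success rates (0.2 vs 0.7 when there is a link, 0.4 vs 0.4 when there isn't), the 0.05 cut-off, and the function names.

```python
from math import comb

def fisher_p(x1, n1, x2, n2):
    """Two-sided Fisher-style p-value for x1/n1 versus x2/n2 successes."""
    col1 = x1 + x2
    lo, hi = max(0, col1 - n2), min(n1, col1)
    ways = {k: comb(n1, k) * comb(n2, col1 - k) for k in range(lo, hi + 1)}
    extreme = sum(w for w in ways.values() if w <= ways[x1])
    return extreme / comb(n1 + n2, col1)

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n tries with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_probability(n1, n2, p1, p2, alpha=0.05):
    """Chance the test reports a link when the true success rates are p1 and p2."""
    total = 0.0
    # Walk through every possible result of the experiment.
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if fisher_p(x1, n1, x2, n2) <= alpha:
                total += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return total

power = rejection_probability(20, 20, 0.2, 0.7)         # outcome 1: link, and detected
type_2_rate = 1 - power                                 # outcome 2: link, but missed
type_1_rate = rejection_probability(20, 20, 0.4, 0.4)   # outcome 3: no link, false alarm
print(f"power = {power:.3f}, Type II = {type_2_rate:.3f}, Type I = {type_1_rate:.3f}")
```

Note that power depends on how strong the real link is and how much data you have, so it is always "power against a specific alternative", not a single number attached to the test.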

This isn’t always the most important thing to care about, and there are different measures of how useful a statistical test is, but there will be situations where it matters.

So the idea of tests being more powerful is that a more powerful test will do a better job of correctly saying there is a link (or correctly rejecting a null hypothesis). Or to put it another way, a less powerful test is more likely to give you a Type II error (a false negative).

With Fisher’s and Barnard’s, there will be some situations (from reading Wikipedia, this seems to be when the row and column totals are not both fixed in advance, so the underlying sampling distribution is *not* hypergeometric) where Barnard’s test will be more powerful – i.e. if you apply both tests to data from that kind of experiment you are less likely to get a Type II error from Barnard’s test. But other times Fisher’s test will be more powerful.
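As a hedged illustration of what "more powerful on the same data" looks like, here is a small pure-Python sketch comparing a Fisher-style test with a simplified Boschloo-style test for two groups of 10. Boschloo's idea is to use Fisher's p-value as the measure of extremeness, then take the worst case over the unknown common success rate. The sketch approximates that worst case with a coarse grid, so it is not a faithful implementation of any library's routine, and the group sizes, success rates and function names are all assumptions. In practice you would more likely reach for `scipy.stats.fisher_exact`, `scipy.stats.barnard_exact` and `scipy.stats.boschloo_exact`.

```python
from fractions import Fraction
from math import comb

def fisher_p(x1, n1, x2, n2):
    """Exact two-sided Fisher-style p-value for x1/n1 versus x2/n2 successes."""
    col1 = x1 + x2
    lo, hi = max(0, col1 - n2), min(n1, col1)
    ways = {k: comb(n1, k) * comb(n2, col1 - k) for k in range(lo, hi + 1)}
    extreme = sum(w for w in ways.values() if w <= ways[x1])
    return Fraction(extreme, comb(n1 + n2, col1))

def binom_pmf(n, p):
    """Binomial probabilities for 0..n successes."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def fisher_and_boschloo(n1, n2, grid=99):
    """Fisher and grid-approximated Boschloo p-values for every possible table."""
    tables = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)]
    fisher = {t: fisher_p(t[0], n1, t[1], n2) for t in tables}
    # A table is "at least as extreme" if its Fisher p-value is no larger.
    extreme = {t: [u for u in tables if fisher[u] <= fisher[t]] for t in tables}
    boschloo = dict.fromkeys(tables, 0.0)
    # Worst case over the unknown common success rate (coarse grid, an approximation).
    for i in range(1, grid + 1):
        pi = i / (grid + 1)
        pmf1, pmf2 = binom_pmf(n1, pi), binom_pmf(n2, pi)
        for t in tables:
            prob = sum(pmf1[a] * pmf2[b] for a, b in extreme[t])
            boschloo[t] = max(boschloo[t], prob)
    return fisher, boschloo

def power(p_values, n1, n2, p1, p2, alpha=0.05):
    """Chance of p <= alpha when the true success rates are p1 and p2."""
    pmf1, pmf2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    return sum(pmf1[a] * pmf2[b] for (a, b), p in p_values.items() if p <= alpha)

fisher, boschloo = fisher_and_boschloo(10, 10)
fisher_power = power(fisher, 10, 10, 0.2, 0.8)
boschloo_power = power(boschloo, 10, 10, 0.2, 0.8)
print(f"Fisher power:   {fisher_power:.3f}")
print(f"Boschloo power: {boschloo_power:.3f}")
```

In this sketch the Boschloo-style p-value is never larger than the Fisher one for the same table, so it rejects in every case Fisher does and possibly a few more. That is the sense in which one test can be "uniformly" more powerful than another.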

As to why you would use a less powerful test: I haven’t studied any of these tests in detail, but I can think of a couple of possibilities. Firstly, power isn’t necessarily the most important factor – it might be more important to minimise Type I errors than Type II errors, or focus on things like sensitivity and specificity. Secondly, it might be that the more powerful test is harder or more time-consuming to run.
