I always have trouble making sense of noninferiority studies. If treatment B is noninferior to A, I can understand that. But when it says B is NOT noninferior to A, why can’t they just say B is inferior to A? What’s the difference?

Example of such a trial: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(22)00537-2/fulltext


With any trial of a drug or placebo, you get an effect estimate and a confidence interval around that estimate. If another drug is tested and its effect falls clearly above a pre-set margin below the first estimate, then all you can say is that it was not worse, since the trial was not set up to compare them to see which is better. People sometimes define superiority margins too, but the way a study is conducted often makes that kind of assertion problematic.

It’s complicated, but non-inferiority trials are done when it would be unethical to give people placebos. For example, consider a promising new cancer drug. In a standard superiority test, you’d compare the efficacy of the drug to a placebo. It would be unethical, however, to give sugar pills to cancer patients. So you do a study (often a meta study) to show that the new drug is *at least* not inferior to existing treatments.

The study was designed from the start as a non-inferiority study. That means there is a single yes/no question at the heart of the statistical analysis: “Is B non-inferior to A?” For this study, “non-inferior” means “B is at least 88% as good as A.” It’s actually *less* confusing to consistently use the terminology “non-inferior / not non-inferior” especially because we’ll also be talking about p-values and confidence intervals.

When we measure the value of anything, there’s always some uncertainty. We can’t ever say a drug is exactly 84.0% effective; we can only say that we’re 95% confident that it’s between 80% and 88%, say. In a study like this, we’ve measured two numbers – the effect for A and the effect for B – both with some degree of uncertainty. Let’s say we think A is between 80% and 90% effective, and we think B is between 85% and 95% effective. Is B *better* than A? We can’t say for sure; their confidence intervals overlap too much. The p-value for the test of the null hypothesis that the difference in means is exactly zero would be around 0.17 under a normal approximation, higher than the 0.05 usually used in scientific papers, so we wouldn’t be able to reject the null hypothesis. Or, in other words, even though B *seems* better than A, the difference is too small to be statistically significant. However, if we ask the question “Is B *non-inferior* to A?”, then we can get a clear-cut “yes” with a p-value less than 0.05.
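To make the arithmetic concrete, here is a rough sketch of the two tests using the made-up intervals above. It assumes the intervals are symmetric 95% intervals from a normal approximation, that the two arms are independent, and that “non-inferior” means “B is at least 88% as good as A”; none of this is taken from the actual trial's analysis plan.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96

def mean_and_se(ci_low, ci_high):
    """Recover the point estimate and standard error from a symmetric 95% CI."""
    return (ci_low + ci_high) / 2, (ci_high - ci_low) / (2 * Z95)

# Hypothetical intervals from the example: A is 80-90%, B is 85-95%.
mean_a, se_a = mean_and_se(80.0, 90.0)
mean_b, se_b = mean_and_se(85.0, 95.0)

# Question 1: is B different from A? Two-sided test of H0: B - A = 0.
z_diff = (mean_b - mean_a) / sqrt(se_a**2 + se_b**2)
p_diff = 2 * (1 - STD_NORMAL.cdf(abs(z_diff)))

# Question 2: is B non-inferior to A? One-sided test of H0: B <= 0.88 * A.
RATIO = 0.88  # assumed reading of "at least 88% as good"
z_ni = (mean_b - RATIO * mean_a) / sqrt(se_b**2 + (RATIO * se_a) ** 2)
p_ni = 1 - STD_NORMAL.cdf(z_ni)

print(f"difference test p-value:      {p_diff:.3f}")
print(f"non-inferiority test p-value: {p_ni:.2g}")
```

Same data, two different questions: the first p-value stays above 0.05, while the second falls far below it.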

So when they say “B is NOT non-inferior to A”, they’re not saying B is *definitely* inferior to A, they’re saying they weren’t able to prove beyond a reasonable doubt that B is non-inferior to A because the difference wasn’t statistically significant. Maybe that means B is inferior to A, but maybe it just means the study was underpowered and needs to be repeated with a larger sample size.

The most common reason to use a non-inferiority study design is that you suspect both interventions have roughly the same effect, but one has other desirable qualities, such as being a lot cheaper, having fewer side effects, or being tolerable to patients who have an adverse reaction to the other. In these cases, it’s not necessary to prove beyond a reasonable doubt that B is statistically significantly better than A, or even as good as – if it’s even in the same ballpark, it’s good enough. When the margin is generous, the non-inferiority design can require a considerably smaller sample size than a trial powered to prove a small superiority, and is therefore cheaper and faster to conduct.

This is a great discussion:

http://www.nephjc.com/news/2019/7/8/understanding-the-vortex-of-non-inferiority-trials

Choice quote: “Inability to prove non-inferiority does not conclude that the test intervention is inferior, it only means that it is “not non-inferior”. This outcome could be a result of an underpowered trial.”

So, statistical power is key. If you don’t get enough participants to reduce the influence of random noise, you can’t trust your results. If both study arms only had 3 patients complete the trial, then who cares if you didn’t show noninferiority? Besides “showed inferiority” and “showed noninferiority,” the third possibility is “didn’t show a damn thing.”
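Those three outcomes can be sketched as a check of the confidence interval for the difference (B minus A) against the margin. This is only an illustration: the patient counts are invented, the Wald interval used here is crude for tiny samples, and conventions for when to call something “inferior” vary between trials.

```python
from math import sqrt
from statistics import NormalDist

def classify(cured_a, n_a, cured_b, n_b, margin, alpha=0.05):
    """Label B vs A using a Wald CI on the difference in cure rates (B - A).

    margin is the largest drop in cure rate we are willing to tolerate,
    e.g. 0.10 for ten percentage points (a judgment call, not a given).
    """
    p_a, p_b = cured_a / n_a, cured_b / n_b
    diff = p_b - p_a
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    low, high = diff - z * se, diff + z * se
    if low > -margin:
        return "non-inferior"    # whole interval sits above the margin
    if high < -margin:
        return "inferior"        # whole interval sits below the margin
    return "inconclusive"        # interval straddles the margin

# Hypothetical trials, all with a 10-point margin.
print(classify(2, 3, 2, 3, margin=0.10))             # 3 patients per arm
print(classify(850, 1000, 840, 1000, margin=0.10))   # large trial, similar rates
print(classify(850, 1000, 700, 1000, margin=0.10))   # large trial, B clearly worse
```

With 3 patients per arm the interval is so wide that it straddles the margin no matter what, which is the “didn’t show a damn thing” case.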

This is relevant to noninferiority trials, which often don’t need as many participants because you’re only trying to show “eh, the new thing isn’t all that bad,” which can take fewer people to prove than “the new thing is this many points better than the old thing.”
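How many people you need depends heavily on the margin. Here is a back-of-the-envelope sketch using the usual normal-approximation sample-size formula for two proportions; the cure rates, margins, one-sided alpha of 0.025, and 80% power are all assumptions picked for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_a, p_b, margin, alpha=0.025, power=0.80):
    """Approximate patients per arm to reject H0: p_b - p_a <= -margin.

    margin > 0 gives a non-inferiority test; margin = 0 gives a plain
    superiority test (which then needs p_b > p_a to be meaningful).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_b - p_a + margin) ** 2)

# Hypothetical scenarios with an 85% cure rate for the old treatment.
n_ni_wide = n_per_arm(0.85, 0.85, margin=0.10)    # "not more than 10 points worse"
n_superior = n_per_arm(0.85, 0.90, margin=0.0)    # "5 points better"
n_ni_tight = n_per_arm(0.85, 0.85, margin=0.03)   # "not more than 3 points worse"

print(n_ni_wide, n_superior, n_ni_tight)
```

Under these assumptions the wide-margin noninferiority trial needs the fewest patients, but the tight-margin one needs more than the superiority trial, so “fewer people” only holds when the margin is larger than the improvement you would otherwise be trying to detect.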

The other key thing is that the margin of noninferiority is usually arbitrary/a judgment call. A study can fail to show noninferiority to the margin they used, but the new treatment can still look good enough to be worth talking about. One great example:

https://www.nejm.org/doi/full/10.1056/NEJMoa1502599#:~:text=For%20the%20treatment%20of%20chlamydia%20infection%2C%20the%20Centers%20for%20Disease,twice%20daily%20for%207%20days.

It compares an antibiotic that cured about 97% of chlamydia with a single dose, versus one taken twice daily for seven days that cured 100%. It narrowly missed the preset margin for noninferiority, yet which would you choose for a patient who never remembers to take medicine?

TL;DR: Failing to show noninferiority doesn’t show inferiority, because the study might just not have shown a damn thing.
