ELI5: What is Simpson’s paradox in statistics?

Can someone explain its significance and maybe a simple example as well?

14 Answers

Anonymous 0 Comments

Say we want to see whether a medicine is effective at preventing heart attacks in elderly populations. We see that among those taking the medicine, 5% suffer heart attacks, compared to 3% of those who don’t. Seems like the medicine is counterproductive, right?

Say you look deeper into the data and find that among those with high risk factors, 20% of those without the medicine suffer heart attacks, compared with 6% of those who do take it. Meanwhile, among those without high risk factors, 2% of those who don’t take the medicine suffer heart attacks, while 0.2% of those who take it do. That means the medicine reduced the rate of heart attacks for both high-risk and low-risk people! However, an overwhelming majority of high-risk people take the medicine, compared with maybe half or so of the low-risk people. And since high-risk people have a much higher baseline risk, the group taking the medicine is mostly high-risk people, so it shows a higher overall heart attack rate than the group that doesn’t, even though the medicine itself makes each person less likely to have one.
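If it helps to see the numbers add up, here is a rough Python sketch. The head counts are made up (hypothetical), chosen only so that the rates match the percentages quoted above; the names `data` and `rate` are just for this illustration.

```python
# Hypothetical head counts, picked only to reproduce the percentages in the
# answer above. (risk group, treatment) -> (heart attacks, people)
data = {
    ("high risk", "medicine"): (288, 4800),     # 6%
    ("high risk", "no medicine"): (12, 60),     # 20%
    ("low risk", "medicine"): (2, 1000),        # 0.2%
    ("low risk", "no medicine"): (20, 1000),    # 2%
}

def rate(treatment, group=None):
    """Heart attack rate for a treatment, optionally within one risk group."""
    rows = [counts for (g, t), counts in data.items()
            if t == treatment and (group is None or g == group)]
    attacks = sum(a for a, _ in rows)
    people = sum(n for _, n in rows)
    return attacks / people

# Within each risk group the medicine looks good...
for group in ("high risk", "low risk"):
    print(group, rate("medicine", group), rate("no medicine", group))

# ...but pooled together it looks bad, because the medicine group is
# almost entirely high-risk people.
print("overall", rate("medicine"), rate("no medicine"))
```

With these counts, nearly all high-risk people are in the medicine group, which is what drags its pooled rate up to about 5% versus about 3% for the no-medicine group.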

TL;DR: Simpson’s paradox is when a correlation reverses itself once you control for another variable.

Anonymous 0 Comments

A prime example is that the widespread adoption of metal helmets by soldiers during WW1 led to a huge increase in the number of soldiers hospitalized with head injuries. At first blush it would seem that the helmets caused more head injuries, but the number of soldiers dying of head trauma on the battlefield significantly decreased.

Anonymous 0 Comments

A good example is wage statistics. Overall, the US population makes 1% more now than it did in 2000. However, when you look at every category of education level (high school dropout, high school diploma only, some college, Bachelor’s degree or higher), every category had its wages decrease. Despite everyone making 1% more overall, each individual category decreased. How is this possible, you might ask? Simpson’s paradox is the explanation.

The answer lies within the data itself. A much larger share of people now have a Bachelor’s or higher, and that group earns more on average. People moved from a lower-earning group, such as high school diploma only, to college graduate, where the average income is higher. This is despite the fact that the average income for Bachelor’s or higher still went down; there are just more people in that category now.

It is significant because you can draw opposite conclusions from the exact same set of data. One person can say wages went up overall (which they did) while another can say that they went down in every category (which they also did). Both statements are correct; Simpson’s paradox is why they seem to contradict each other.
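To make the wage example concrete, here is a small Python sketch. All of the wages and population shares below are invented for illustration (they are not real US figures); they are just one set of numbers where every category drops 5% while the overall average rises by about 1%.

```python
# Hypothetical numbers, NOT real statistics.
# category -> (share of workers in percent, average wage in dollars)
year_2000 = {
    "dropout":      (15, 30_000),
    "high school":  (35, 40_000),
    "some college": (25, 45_000),
    "bachelor's+":  (25, 70_000),
}
now = {
    "dropout":      (10, 28_500),   # every wage is 5% lower than in 2000
    "high school":  (30, 38_000),
    "some college": (27, 42_750),
    "bachelor's+":  (33, 66_500),   # but many more people are in this group
}

def overall_average(groups):
    """Population-weighted average wage across all education categories."""
    return sum(share * wage for share, wage in groups.values()) / 100

avg_2000 = overall_average(year_2000)
avg_now = overall_average(now)
change = avg_now / avg_2000 - 1

print(avg_2000, avg_now, round(change * 100, 1))
```

The only thing that changed in the right direction is the mix of people: more of them sit in the higher-paid categories, and that shift outweighs the pay cut inside each category.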

Anonymous 0 Comments

Here’s an example just about anyone asking this question is familiar with:

Suppose both your math and your science classes assign grades based (only) on a combination of test scores and homework grades.

In your math class, test scores are weighted 80% and homework is weighted 20%. Your test average was a 90% and homework average was 50%, so your overall grade is .9*80% + .50*20% = 82%.

In your science class, test scores and homework scores are both weighted 50%. Your test average was 94% and homework average was 60%, so your overall grade is .94*50% + .60*50% = 77%.

So even though you scored better in science in both test taking (90%->94%) and homework (50%->60%), the change of weights made your overall science grade lower than your math grade (82%->77%).
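Here is the same arithmetic as a quick Python check. The helper name `weighted_grade` is made up for this sketch; scores and weights are given in percent.

```python
def weighted_grade(test_avg, homework_avg, test_weight):
    """Overall grade in percent; homework gets whatever weight tests don't."""
    homework_weight = 100 - test_weight
    return (test_avg * test_weight + homework_avg * homework_weight) / 100

math_grade = weighted_grade(test_avg=90, homework_avg=50, test_weight=80)
science_grade = weighted_grade(test_avg=94, homework_avg=60, test_weight=50)

print(math_grade)     # 82.0
print(science_grade)  # 77.0
```

Science beats math on both components, but science puts half its weight on the weaker component (homework), while math puts only a fifth there.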

Anonymous 0 Comments

I feel like this paradox is what makes some people (especially old people) hesitant to go to the hospital. They claim that their friends all died at the hospital, so in theory, if they don’t go to the hospital, they won’t die. The problem is that the old person AND the friends who died already had a higher chance of death due to old age, chronic illnesses, etc. So while it may look like going to the hospital gets you killed, their chances outside the hospital are almost certainly worse. This is just one way this paradox plays out.

Anonymous 0 Comments

Let’s say that you want to know whether Steph Curry is a better shooter than Shaq. Curry makes 3pt shots at a better rate than Shaq, and (let’s say) Curry also makes layups at a better rate than Shaq.

The paradox is that while Curry is a better shooter than Shaq in both categories, Shaq has a better combined shooting rate than Curry. The explanation is that Shaq takes way more layups than 3pt shots, and layups overall are a higher-percentage shot than 3’s.

In other words, Simpson’s paradox is when something looks better in both Group A and Group B individually, but looks worse when the groups are combined. It happens because the treatments being compared have different proportions of each group.

So it’s an example of confounding variables where you need to identify the groups that are secretly influencing your comparisons between treatment and control.

Anonymous 0 Comments

Recent example: the Covid vaccine in Israel. The majority of people were vaccinated and a small portion of the population was unvaccinated. Antivaxxers pointed out that people in hospitals were mostly vaxxed, and therefore the vaccine doesn’t work, right? Well, DUH, of course when almost everybody is vaxxed, they are the ones who end up in the hospitals. The vaccine was still helping save lives. It’s like saying 100% of humans who breathe air DIE, so air is poison! That’s the paradox: you need to look at the data the right way.

Anonymous 0 Comments

This video parody that my professor made explains it in a very easy and funny way: https://youtu.be/nGqzoqXZch0