When study results are reported, what does it mean when authors say “results upon controlling for XYZ factors”?

I don’t fully understand what controlling for a factor in an experiment means, especially when it comes to real-world studies with a large number of people in the trials. For example: “Yogurt consumers had a higher DGAI score (ie, better diet quality) than nonconsumers. *Adjusted for demographic and lifestyle factors and DGAI*, yogurt consumers, compared with nonconsumers”
I’m looking for an intuitive way to understand what controlling for factors means.
Thank you in advance!

7 Answers

Anonymous 0 Comments

There are a lot of ways to do it, but ultimately it means doing what you can so that those differences don’t matter.

One of the cheapest/easiest ways to do that is to make sure both groups have a roughly equal makeup of people for the factors you are trying to control for. So you mention:

>Adjusted for demographic and lifestyle factors

This likely means that both groups, the yogurt consumers and the yogurt nonconsumers, had a similar makeup: the same percentage of low-income earners, high-income earners, old people, young people, etc. For lifestyle, it’s probably talking about how often people work out and things like that, so you get the same mix of sedentary and active people in each group.

So in this case you want basically the only difference between each group to be whether they eat yogurt or not.

This means that any differences in the data are unlikely to be the result of demographics or lifestyle, because each group had basically the same demographic and lifestyle makeup.

Obviously, this isn’t perfect and you rarely get the *exact* same makeup. But a good research team will do their best to get as close as possible to that.
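If it helps to see it concretely, here is a minimal Python sketch of what checking for a "similar makeup" could look like. Everything in it is made up for illustration: the participants, the `income` and `active` fields, and the helper name `makeup` are assumptions, not anything from the yogurt study.

```python
from collections import Counter

def makeup(group, factor):
    """Return the share of the group that falls into each category of a factor."""
    counts = Counter(person[factor] for person in group)
    return {category: n / len(group) for category, n in counts.items()}

# Hypothetical participants; every value here is invented for illustration.
yogurt_eaters = [
    {"income": "low", "active": True},
    {"income": "low", "active": False},
    {"income": "high", "active": True},
    {"income": "high", "active": False},
]
non_eaters = [
    {"income": "high", "active": False},
    {"income": "low", "active": True},
    {"income": "high", "active": True},
    {"income": "low", "active": False},
]

# If the shares match for every factor, the groups are balanced on those factors.
for factor in ("income", "active"):
    same = makeup(yogurt_eaters, factor) == makeup(non_eaters, factor)
    print(factor, makeup(yogurt_eaters, factor), "balanced" if same else "unbalanced")
```

With real data the shares would only be roughly equal, so you would compare them with some tolerance instead of exact equality.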

Anonymous 0 Comments

Hopefully someone can speak to the details of HOW they do it, but I can at least answer the WHAT part of the question. A study is trying to look for how thing A relates to thing B, but there are usually also things C-Z that relate to A and/or B. Controlling for those things is trying to take them out of the answer as much as possible to isolate just A and B as best they can.

For example, say I did a study to see if owning a dog made you healthier. I might find that people who own dogs take more walks. To see if it’s really the dog making them healthier, I also want to look at people who take those walks and don’t have a dog. Correcting for that variable would mean checking whether people who take as many walks as dog owners, but don’t have a dog, are just as healthy. The more variables that you can make the same, the better the chance that it’s your studied variable tied to the studied effect.
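A rough sketch of the dog example in Python, assuming a tiny invented dataset (the tuples, the health scores, and the helper `avg_health` are all hypothetical). It compares owners and non-owners overall, and then only among people who take the same number of walks:

```python
from statistics import mean

# Hypothetical people: (owns_dog, walks_per_week, health_score). All invented.
people = [
    (True, 7, 80), (True, 7, 82), (True, 5, 74),
    (False, 7, 81), (False, 5, 74), (False, 1, 60), (False, 0, 58),
]

def avg_health(owns_dog, walks=None):
    """Average health score for owners or non-owners, optionally at one walk level."""
    scores = [h for d, w, h in people
              if d == owns_dog and (walks is None or w == walks)]
    return mean(scores)

# Naive comparison: all owners versus all non-owners.
naive_gap = avg_health(True) - avg_health(False)

# Controlled comparison: owners versus non-owners who walk exactly as often.
# (Assumes at least one non-owner exists at every walk level that owners have.)
owner_walk_levels = sorted({w for d, w, h in people if d})
gaps_by_walks = {w: avg_health(True, w) - avg_health(False, w)
                 for w in owner_walk_levels}

print("naive gap:", round(naive_gap, 1))
print("gap at equal walks:", gaps_by_walks)
```

In this made-up data the naive gap is large, but it disappears once walkers are compared with walkers, which would point at the walking and not the dog.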

Anonymous 0 Comments

You simply include/exclude people of particular categories from the dataset and/or measure the effect on just that group of people.

If you identify a statistically significant difference in outcome between groups, you can mathematically subtract this influence from the original result in proportion to the relative size of the smaller group within the whole dataset.

Anonymous 0 Comments

Sometimes you know that a group has a certain advantage, so you can correct for that to “get it out of the way”.

**Dumb example: God decides to bless Alabama by letting everyone in the state live five years longer.** (I’m using numbers from [https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_life_expectancy](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_life_expectancy) for this.) Alabama’s life expectancy jumps to 80.5.

New Jersey’s life expectancy is also 80.5. An AL person says to a NJ person, “looks like our health care’s just as good as yours”. Can you see what the NJ person’s going to say?

(Quick pause for all the New Jersey jokes.) Probably something like “without those extra 5 years God gave you, you’d be at 75.5, so no, your health care is nowhere near as good”. Subtracting those 5 years off would be called “controlling for God’s favor”, and it means we know God favors you, but we want to look at factors other than that.

This matters in cases like the following (totally making up the numbers here): you know that a minority usually does 6% worse on some academic test, and you find a district where they only do 2% worse. If we “control for their minority-ness”, we’d say “wow, that’s 4% better than we’d expect to see, we should check out what that district is doing”.
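Both examples are just subtraction of a known effect, which a few lines of Python can spell out. The variable names are mine, and the numbers are the ones used in this answer (the test-score ones being invented, as noted):

```python
# Alabama example: remove the known 5-year bonus before comparing.
al_observed = 80.5   # life expectancy after the hypothetical blessing
nj_observed = 80.5
known_bonus = 5.0    # the part we already know has nothing to do with health care

al_adjusted = al_observed - known_bonus
gap_after_adjusting = nj_observed - al_adjusted
print(al_adjusted, gap_after_adjusting)

# Test-score example: compare the observed gap with the gap we would expect.
expected_gap = -6    # the usual gap, in the answer's made-up numbers
observed_gap = -2    # the gap found in the district
better_than_expected = observed_gap - expected_gap
print(better_than_expected)
```

The raw numbers look equal (or look bad), and the adjusted numbers show what is left once the known factor is taken out.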

Anonymous 0 Comments

Let’s pretend that we think owning a super-fancy-super-speedy sportscar is somehow “good for your health” so we do a study of a million people’s car purchases and compare that to their ages at death.

Out of 1,000,000 folks only 10 had super-fancy-super-speedy sportscars, and on average they lived to be 80 years old. The other 999,990 people did not have super-fancy-super-speedy sportscars, and they lived to be 75 years old on average. Wow! Super-fancy-super-speedy sportscar “owners” live 5 years longer on average than “non-owners”! We have confirmed our hypothesis!

Or have we? What else could have caused the difference? Well, for one thing, super-fancy-super-speedy sportscars are super expensive… so what happens if, instead of just “owner” versus “non-owner”, we also study income tax data so that we can take “rich” versus “not-rich” into account?

In that case we might find that there are 10 “rich owners”, 10 “rich non-owners”, 999,980 “non-rich non-owners”, and 0 “non-rich owners” of super-fancy-super-speedy sportscars. So what were the average lifetimes for each of those sub-groups? The “non-rich non-owners” are still 75 on average, there are no “non-rich owners”, the 10 “rich owners” lived to 80 on average, and the 10 “rich non-owners” lived to 90 on average… so what does it all mean now?

Well, when wealth is taken into account… it looks (on average) like the “rich” live longer than the “non-rich” regardless of whether they bought a super-fancy-super-speedy sportscar or not. Furthermore, if we compare only “rich non-owners” versus “rich owners” we see that (adjusted for “rich”) the “owners” actually live 10 years shorter on average than the “non-owners”!

So, “controlling for XYZ… we find ABC” (as a summary statement) essentially means that the researchers separated the groups into smaller subgroups (based on XYZ) to check whether the statistical effect (ABC) still persists. It doesn’t mean the study accounted for any and every possibility (maybe their critics believe they should have considered IJK, too), but it does mean that they did the work to show that, at the very least, ABC isn’t just XYZ in disguise (the way “owners” was actually “rich” in disguise in the silly sportscar example).
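For anyone who wants to check the arithmetic, here is a small Python sketch using the subgroup sizes and average ages from the rich/not-rich breakdown above. The data layout and the helper `average_age` are just my own way of writing it down, not a standard method:

```python
# (rich, owner): (number of people, average age at death)
groups = {
    (True, True): (10, 80),         # rich owners
    (True, False): (10, 90),        # rich non-owners
    (False, False): (999_980, 75),  # non-rich non-owners
    # There are no non-rich owners, so (False, True) is left out.
}

def average_age(keep):
    """Size-weighted average age over the subgroups selected by keep(rich, owner)."""
    selected = [(n, age) for (rich, owner), (n, age) in groups.items()
                if keep(rich, owner)]
    total = sum(n for n, _ in selected)
    return sum(n * age for n, age in selected) / total

# Crude comparison: owners versus non-owners, ignoring wealth.
crude_gap = (average_age(lambda rich, owner: owner)
             - average_age(lambda rich, owner: not owner))

# Controlled comparison: owners versus non-owners among the rich only.
rich_gap = (average_age(lambda rich, owner: rich and owner)
            - average_age(lambda rich, owner: rich and not owner))

print(round(crude_gap, 2), rich_gap)
```

The crude gap comes out in favor of the owners, while the gap among the rich comes out against them, which is the reversal described above.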

—————————–

As for what “…demographic and lifestyle factors…” specifically means, you’d have to dive into the methodological details of the specific article. Maybe they were really, really thorough and broke things down into a multitude of sub-sub-sub-groups based on gender, age, wealth, ethnicity, sexual orientation, left/right-handedness, favorite color, etc. Or maybe they were lazy, split it only by gender, and called it a day, hoping nobody would dig into the specifics. That same “adjusted for demographic and lifestyle factors…” line might be an understatement or an overreach, but the only way to know for sure is to check the specifics of the methodology.

Anonymous 0 Comments

Here is a very simple example. Suppose you conduct a survey and you find that taller people are more likely to have a certain opinion. But someone points out that, on average, men are taller than women, so maybe what’s really going on is that men are more likely to have that opinion, and height really has nothing to do with it. There are numerous different ways we might be able to check this. We can simply look through our results and see whether it is the case that men are more likely to have that opinion. If the answer is no, then that would seem to settle it. We could also look through our results to see whether taller men are more likely to hold the opinion than shorter men, and the same with women.
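Here is a sketch of that check in Python with invented survey counts (the numbers and the function names are hypothetical). It computes the crude rate for tall and short people, and then a sex-adjusted rate by averaging the within-sex rates using the same weights for both height groups, an approach sometimes called direct standardization:

```python
# counts[(sex, height)] = (respondents, number holding the opinion). All invented.
counts = {
    ("men", "tall"): (80, 48),
    ("men", "short"): (20, 12),
    ("women", "tall"): (20, 6),
    ("women", "short"): (80, 24),
}

def crude_rate(height):
    """Share holding the opinion among everyone of this height, ignoring sex."""
    n = sum(c[0] for (s, h), c in counts.items() if h == height)
    yes = sum(c[1] for (s, h), c in counts.items() if h == height)
    return yes / n

def adjusted_rate(height):
    """Average the within-sex rates, weighting each sex by its share of the whole sample."""
    total = sum(n for n, _ in counts.values())
    rate = 0.0
    for sex in ("men", "women"):
        sex_total = sum(c[0] for (s, h), c in counts.items() if s == sex)
        n, yes = counts[(sex, height)]
        rate += (sex_total / total) * (yes / n)
    return rate

for height in ("tall", "short"):
    print(height, round(crude_rate(height), 2), round(adjusted_rate(height), 2))
```

In this made-up data tall people look more likely to hold the opinion, but within each sex the rate is identical for tall and short people, so the adjusted rates come out equal.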

This can get much more complicated, because sometimes there are many plausible confounding factors, and sometimes only limited data are available on them.

> Adjusted for demographic and lifestyle factors

Vague language like this can sometimes be a bit of a smokescreen. Did they really control for *all* demographic and lifestyle factors, or even all the ones that are reasonably likely to be relevant? In most cases, this isn’t even feasible.

There can also be questions about whether it really makes sense to control for a certain factor, or whether it’s part of the effect you’re looking for. For example, suppose someone does a study demonstrating that people from rich families are more likely to be hired to a certain position. Someone might object that educational levels were not controlled for. But the study author might argue that educational levels are essentially a proxy for wealth, so it wouldn’t make sense to control for them.

Anonymous 0 Comments

The motivation there is a good faith effort to try to understand what is “really going on” because things and systems can be really complicated and lots of variables can come into play in unexpected ways.

An easy example is jobs. Let’s say you are trying to understand whether more or fewer people are employed in a city compared to two years ago. So you take some data and look at various statistics. However, you recognize that jobs change throughout the year. In November and December the number of jobs increases because stores hire more people to deal with the holiday shopping season. Likewise, when summer break hits and the college kids leave town and go home, jobs decrease at the school and at the places near the school.

So if you are going to make a good faith effort to determine whether there are more or fewer jobs in your city now compared to two years ago, you can’t ignore these facts. If you compare this year’s July jobs with the December jobs from two years ago, then you are doing it wrong. You have to find ways to account for or “control for” the typical fluctuations in jobs throughout the year.
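One simple way to do that is to only compare the same month across years. A minimal Python sketch, assuming invented job counts (the years, the figures, and the `change` helper are all hypothetical):

```python
# Hypothetical job counts keyed by (year, month). All numbers invented.
jobs = {
    (2021, 7): 48_000, (2021, 12): 55_000,
    (2023, 7): 50_000, (2023, 12): 57_500,
}

def change(later, earlier):
    """Percentage change in jobs from one (year, month) to another."""
    return 100 * (jobs[later] - jobs[earlier]) / jobs[earlier]

# Wrong: July now versus December two years ago mixes in the seasonal swing.
wrong = change((2023, 7), (2021, 12))

# Better: compare like with like, so the season is the same on both sides.
july = change((2023, 7), (2021, 7))
december = change((2023, 12), (2021, 12))

print(round(wrong, 1), round(july, 1), round(december, 1))
```

With these made-up numbers the mismatched comparison shows jobs falling, while both same-month comparisons show them growing.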

This is a relatively simple example. Things that are more complex may require a lot of variables to control for.