statistically control for a variable


I know some statistics, but I still have a hard time grasping what “controlling for a variable” means. To me, it means that you want to isolate the variance explained by a particular variable by controlling for other variables that contribute confounding variance.

E.g., I want to predict ice cream sales. As a predictor, I choose outside temperature. Let’s say that this explains 25% of the variance in ice cream sales. Now, let’s say that I want to control for the time of day. People might buy more ice cream around lunch than in the morning. This is confounding, since I only want to know how much variance outside temperature contributes. So, I control for time of day. Now, when I do this, the variance explained by temperature should decrease, right?

Or does “controlling for” simply mean including time of day as a predictor, just like outside temperature?


7 Answers

Anonymous

With statistics, there are often 2 (or more) different ways of doing things: via study design on the front end, or via modeling method on the back end.

“Controlling for” a variable just means “we have taken steps to make sure this variable probably isn’t affecting things (much)”. There are multiple ways to do this.

Generally, the more variables you can “control for” with study design, the better, because this makes the data you gather cleaner. In this case, that would look something like only sampling data points at the same time of day, say 2pm each day. Then, when you’re examining sales vs temperatures, you can be reasonably confident that time of day isn’t confounding things.

Of course, this might make data collection a lot harder, so we can also use statistical methods to “control for” this effect in our model after the fact. Like you mentioned, this generally looks like including that variable as a predictor, so that its effect will be largely captured in its own coefficient, and then the coefficient of the temperature is less affected by time of day.
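
To make the "include it as a predictor" idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made up for illustration: the numbers, the `lunch` indicator standing in for time of day, and the `ols` helper. It only shows the general pattern, where the temperature coefficient moves toward its true value once time of day gets its own coefficient.

```python
import numpy as np

# Simulated (made-up) data: lunch hours are both warmer and busier,
# so time of day confounds the temperature/sales relationship.
rng = np.random.default_rng(0)
n = 5000
lunch = rng.integers(0, 2, size=n)                # 1 = around lunch, 0 = morning
temp = 15 + 8 * lunch + rng.normal(0, 3, size=n)  # warmer at lunch
sales = 50 + 2.0 * temp + 30 * lunch + rng.normal(0, 5, size=n)  # true temp effect = 2.0

def ols(predictors, y):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive = ols([temp], sales)            # temperature only
adjusted = ols([temp, lunch], sales)  # temperature, "controlling for" time of day

print("temperature coefficient, not controlling:", naive[1])
print("temperature coefficient, controlling:    ", adjusted[1])
```

In this setup the uncontrolled coefficient absorbs part of the lunch effect and comes out well above 2, while the controlled one should land close to the true value of 2.0.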

If you’ve “controlled for” a potentially confounding variable, it essentially means you have removed the variance in your data that you think is attributable to that variable. If you did this via study design, the explained variance % of your remaining variables should *increase*, because now hopefully more of the remaining variance is truly due to the effect you’re measuring. If you did it via stats, it will depend on the exact method you use (simple linear regression, a mixed-effects model, etc.).
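
One way to see the "removed the variance attributable to that variable" idea in a plain linear regression is to partial the confounder out by hand. The sketch below uses simulated data again, with made-up numbers and hypothetical helper names (`fit`, `residualize`). It strips the part of both temperature and sales that time of day can explain, then regresses what is left. For ordinary least squares this gives the same temperature slope as simply including time of day as a predictor.

```python
import numpy as np

# Simulated (made-up) data with time of day as a confounder.
rng = np.random.default_rng(1)
n = 5000
lunch = rng.integers(0, 2, size=n).astype(float)
temp = 15 + 8 * lunch + rng.normal(0, 3, size=n)
sales = 50 + 2.0 * temp + 30 * lunch + rng.normal(0, 5, size=n)

def fit(X, y):
    """Least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def residualize(y, z):
    """Return the part of y that a linear fit on z (plus intercept) cannot explain."""
    Z = np.column_stack([np.ones(len(z)), z])
    return y - Z @ fit(Z, y)

# Step 1: remove what time of day explains from both variables.
temp_resid = residualize(temp, lunch)
sales_resid = residualize(sales, lunch)

# Step 2: regress the leftover sales on the leftover temperature.
slope_partial = fit(temp_resid[:, None], sales_resid)[0]

# Comparison: the usual model with both predictors included.
X_full = np.column_stack([np.ones(n), temp, lunch])
slope_full = fit(X_full, sales)[1]

print("slope after partialling out time of day:", slope_partial)
print("slope with time of day as a predictor:  ", slope_full)
```

The two slopes agree up to floating-point error, which is why "controlling for" and "including as a predictor" amount to the same thing in this kind of model. Other methods, such as mixed-effects models, handle the adjustment differently.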
