I know some statistics, but I still have a hard time grasping what “controlling for a variable” means. To me, it means isolating the variance explained by a particular variable by controlling for other variables that contribute confounding variance.
E.g., I want to predict ice cream sales. As a predictor, I choose outside temperature. Let’s say that this explains 25% of the variance in ice cream sales. Now, let’s say that I want to control for the time of day. People might buy more ice cream around lunch than in the morning. This is a confound, since I only want to know how much variance outside temperature accounts for. So, I control for time of day. Now, when I do this, the variance explained by temperature should decrease, right?
Or does “controlling for” simply mean including time of day as a predictor, just like outside temperature?
“Controlling for” a variable means doing something to make sure it doesn’t vary.
So, if you’re looking at how ice cream sales vary with temperature, and you want to control for time of day, you could just make sure that all your measurements are taken at the same time of day. If you look at ice cream sales between 1 and 2 PM every day, you have successfully controlled for time of day: since it’s the same for all of them, any differences in the data must be due to some other factor.
In short, you want to make sure that everything except the thing you’re trying to measure is consistent.
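To make the "same time of day" idea concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not real sales data), and `slope` is a small helper written for this example. It compares the temperature/sales relationship across all records with the relationship among the 1 PM records only.

```python
# Hypothetical records: (temperature_C, hour_of_day, sales). Made up for illustration.
records = [
    (15, 9, 30), (17, 9, 34), (19, 9, 38), (21, 9, 42),
    (20, 13, 70), (22, 13, 74), (24, 13, 78), (26, 13, 82),
]

def slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Uncontrolled: pool every record, so time of day varies along with temperature.
pooled_slope = slope([t for t, _, _ in records], [s for _, _, s in records])

# Controlled: keep only the 1 PM records, so time of day cannot vary.
lunch = [(t, s) for t, h, s in records if h == 13]
controlled_slope = slope([t for t, _ in lunch], [s for _, s in lunch])

print(f"pooled: {pooled_slope:.2f}, 1 PM only: {controlled_slope:.2f}")
# prints "pooled: 5.33, 1 PM only: 2.00"
```

In this toy data, lunchtime happens to be both warmer and busier, so the pooled slope gives temperature credit for the lunch rush. Once the hour is held fixed, only the temperature effect is left.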
There are statistical methods for this when you can’t control how the data is collected, such as including the variable as an additional predictor in a regression. Either way, the idea is to rule out that variable as a possible explanation.
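One common statistical way to do this is to include the variable as an extra predictor in a regression. The sketch below assumes made-up data and a hand-rolled least-squares helper (`ols` and `solve` are names invented for this example, not a library API); in practice you would probably use a statistics package instead. It fits sales on temperature alone, then on temperature plus a lunchtime indicator.

```python
# Hypothetical data: (temperature_C, is_lunchtime, sales). Made up for illustration.
rows = [
    (15, 0, 30), (17, 0, 34), (19, 0, 38), (21, 0, 42),
    (20, 1, 70), (22, 1, 74), (24, 1, 78), (26, 1, 82),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

y = [s for _, _, s in rows]

# Temperature only: the leading 1.0 in each row is the intercept column.
_, slope_unadjusted = ols([[1.0, t] for t, _, _ in rows], y)

# Temperature plus the lunchtime indicator: this "controls for" time of day.
_, slope_adjusted, lunch_effect = ols([[1.0, t, l] for t, l, _ in rows], y)

print(f"unadjusted: {slope_unadjusted:.2f}, adjusted: {slope_adjusted:.2f}")
```

With these invented numbers, the temperature coefficient drops from about 5.3 to 2 once the lunchtime indicator is in the model, because temperature no longer gets credit for the lunch rush. With real data the coefficient could move in either direction, or barely at all, depending on how the variables are related.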