I know some statistics, but I still have a hard time grasping what “controlling for a variable” means. To me, it means that you want to isolate the variance explained by a particular variable by controlling for other variables that contribute confounding variance.
E.g., I want to predict ice cream sales. As a predictor, I choose outside temperature. Let’s say that this explains 25% of the variance in ice cream sales. Now, let’s say that I want to control for the time of day. People might buy more ice cream around lunch than in the morning. This is a confound, since I only want to know how much variance outside temperature contributes. So, I control for time of day. Now, when I do this, the variance explained by temperature should decrease, right?
Or does “controlling for” simply mean including time of day as a predictor, just like outside temperature?
You control for the time of day by binning your data according to the time of day. You then check whether ice cream sales correlate with outside temperature within each time-of-day slot separately. You don’t care whether time of day also predicts ice cream sales, because you already know that it (very likely) does, and that is not the question you want to explore.
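To make the binning idea concrete, here is a minimal Python sketch on made-up data. Everything in it is an assumption for illustration: the three slots, the baselines, the slope of 2 sales per degree, and the noise levels are invented, not taken from real ice cream sales. It compares the pooled temperature/sales correlation with the correlation inside each time-of-day slot.

```python
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)

# Hypothetical numbers: lunch is both warmer and busier than the morning,
# so time of day is tangled up with temperature.
slot_baseline = {"morning": 10.0, "lunch": 60.0, "evening": 30.0}
slot_mean_temp = {"morning": 15.0, "lunch": 25.0, "evening": 20.0}

rows = []
for slot in slot_baseline:
    for _ in range(200):
        temp = random.gauss(slot_mean_temp[slot], 3.0)
        # Assumed true effect: 2 extra sales per degree, plus noise.
        sales = slot_baseline[slot] + 2.0 * temp + random.gauss(0.0, 5.0)
        rows.append((slot, temp, sales))

# Ignoring time of day: one correlation over all observations.
pooled_r = pearson([t for _, t, _ in rows], [s for _, _, s in rows])

# Controlling for time of day: bin by slot, then correlate within each bin.
by_slot = defaultdict(list)
for slot, temp, sales in rows:
    by_slot[slot].append((temp, sales))

within_r = {
    slot: pearson([t for t, _ in pairs], [s for _, s in pairs])
    for slot, pairs in by_slot.items()
}

print(f"pooled r = {pooled_r:.2f}")
for slot, r in within_r.items():
    print(f"{slot:>8} r = {r:.2f}")
```

In this particular setup the pooled correlation comes out larger than the within-slot ones, because part of it is really the lunch-time effect. That is not guaranteed in general: depending on how the confounder relates to the other two variables, controlling for it can shrink, enlarge, or even flip the apparent relationship. Including time of day as an additional predictor in a regression is the model-based counterpart of this binning.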