I know some statistics, but I still have a hard time grasping what “controlling for a variable” means. To me, it means that you isolate the variance explained by a particular variable by accounting for other variables that contribute confounding variance.
E.g., I want to predict ice-cream sales. As a predictor, I choose outside temperature. Let’s say that this explains 25% of the variance in ice-cream sales. Now, let’s say that I want to control for the time of day. People might buy more ice cream around lunch than in the morning. This is confounding, since I only want to know how much variance outside temperature contributes. So, I control for time of day. Now, when I do this, the variance explained by temperature should decrease – right?
Or does “controlling for” simply mean including time of day as a predictor, just like outside temperature?
Yes, it means including time of day as a predictor. When you compute the best fit line, with one predictor the equation will look like this:
sales = x(temp) + C
with C being a constant, or what your sales would be if temp were 0. But if you were to compute the best fit line with time of day as well, the equation would look like this:
sales = x(temp) + z(hour) + C
Note: I’m only using “z” to represent the second coefficient because I’m on mobile and can’t do subscripts. Normally you would denote all the coefficients as “x” with subscripts on them (x0, x1, x2, …). The model here still has only two predictor dimensions (temp and hour), so the fit is a plane rather than a line.
When you compute the best fit line this way, you can be more certain that the time of day is not influencing the coefficient for your temperature variable, because it’s already factored into the model with its own coefficient. I hope this helps. I used some statistical jargon because you seem to have already been introduced to single-variable analysis.
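If it helps to see this with numbers, here is a rough sketch in Python using NumPy. Everything in it is made up for illustration: the data are simulated, the "true" coefficients (2 for temp, 3 for hour) are assumptions I picked, and `fit_ols` is just a hypothetical helper name. The point is only to show how the temp coefficient changes once hour gets its own term.

```python
import numpy as np

# Simulated data (all numbers are assumptions for illustration only).
rng = np.random.default_rng(0)
n = 2000
hour = rng.uniform(8, 14, n)                          # morning to early afternoon
temp = 15 + 1.0 * (hour - 8) + rng.normal(0, 2, n)    # it gets warmer as the day goes on
sales = 2.0 * temp + 3.0 * hour + rng.normal(0, 5, n) # both temp and hour drive sales


def fit_ols(predictors, y):
    """Least-squares fit; returns [C, coefficient_1, coefficient_2, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# sales = x(temp) + C
coef_temp_only = fit_ols([temp], sales)

# sales = x(temp) + z(hour) + C
coef_both = fit_ols([temp, hour], sales)

print("temp coefficient, temp only:      ", coef_temp_only[1])
print("temp coefficient, hour controlled:", coef_both[1])
print("hour coefficient:                 ", coef_both[2])
```

In this simulated setup the temp-only model gives temperature too much credit, because temperature and time of day rise together. Once hour is in the model, the temp coefficient lands close to the value of 2 that was used to generate the data.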