What is a random effect in a mixed-effects model?




I’ll assume you are familiar with OLS and that we don’t need to go too hard into the specific issues of data structure, measurement, causal inference, or other complications. Here’s a basic rundown:

A random effect is an effect that you explicitly model as a random variable in its own right, rather than as a single fixed number. Say we have a list of patients with a headache, their doctors, how much pain medicine the patients receive from said doctors, and how much better or worse their headache is after taking the medicine. We can model the effect of pain medicine with something like

Y = A + B * x + e

Where Y is the pain measured at the end, x is the amount of medicine received, A is a constant (the model intercept, i.e. the expected pain at x = 0), and e is just residual variation. Here x and Y are both treated as random variables. The estimate of B we get from this is the fixed effect of a unit change in the dosage of medicine (e.g. if we measure that in pills, how much one pill changes Y, or in milligrams, how much one milligram changes Y).
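To make the baseline concrete, here is a minimal sketch of fitting Y = A + B * x + e by ordinary least squares in plain Python. The numbers are made up for illustration (a true A of 10 and a true B of -2 per pill are assumptions, not from any real data), and `fit_ols` is just a hypothetical helper name.

```python
import random

def fit_ols(xs, ys):
    """Closed-form least squares for Y = A + B * x + e; returns (A_hat, B_hat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_hat = sxy / sxx                 # estimated change in pain per unit dose
    a_hat = mean_y - b_hat * mean_x   # estimated pain at dose 0
    return a_hat, b_hat

# Hypothetical data: true A = 10 (pain with no medicine), true B = -2 per pill.
rng = random.Random(0)
doses = [rng.uniform(0, 4) for _ in range(500)]
pain = [10.0 - 2.0 * x + rng.gauss(0, 1) for x in doses]

a_hat, b_hat = fit_ols(doses, pain)
print(f"A_hat = {a_hat:.2f}, B_hat = {b_hat:.2f}")  # should land near 10 and -2
```

Here B_hat is one number shared by everybody, which is exactly what "fixed effect" means in this setup.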

But maybe each patient responds to the same dose of pain medication differently, or maybe the outcome depends on which doctor they see. Either way, some piece of the model is now itself a random variable instead of a single constant. Suppose first that the doctors have something to do with it. We can account for this by altering our model:

Y = A_i + B * x + e

A_i = A + u_i

Where A is the mean intercept across all doctors, u_i is a random variable (typically assumed to have mean zero) giving doctor i's deviation from that mean, and each A_i corresponds to a specific doctor i. Now, on a doctor-by-doctor basis, the model will estimate a different intercept, which corresponds to random changes in the outcome based on which doctor someone sees. This is a random-intercept model.
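A rough way to see what a random intercept does is to simulate it. The sketch below writes the random-intercept model as Y = (A + u_i) + B * x + e, where u_i is the shift for doctor i, draws fake data from it, and then recovers one intercept per doctor with a simple two-step estimate (a shared slope from doctor-centered data, then each doctor's mean). This is not how mixed-model software actually fits the model (that is usually done by maximum likelihood or REML); it is only meant to show the structure. All parameter values and function names are assumptions for illustration.

```python
import random

def simulate(n_doctors=6, n_patients=100, A=10.0, B=-2.0,
             sd_u=3.0, sd_e=1.0, seed=1):
    """Draw (doctor, dose, pain) rows from Y = (A + u_i) + B * x + e."""
    rng = random.Random(seed)
    u = [rng.gauss(0, sd_u) for _ in range(n_doctors)]  # one shift per doctor
    rows = []
    for i in range(n_doctors):
        for _ in range(n_patients):
            x = rng.uniform(0, 4)
            rows.append((i, x, A + u[i] + B * x + rng.gauss(0, sd_e)))
    return u, rows

def group_means(rows):
    """Per-doctor means of dose and pain."""
    sums = {}
    for i, x, y in rows:
        n, sx, sy = sums.get(i, (0, 0.0, 0.0))
        sums[i] = (n + 1, sx + x, sy + y)
    return {i: (sx / n, sy / n) for i, (n, sx, sy) in sums.items()}

def fit_intercepts(rows):
    """Shared slope from doctor-centered data, then one intercept per doctor."""
    means = group_means(rows)
    sxy = sum((x - means[i][0]) * (y - means[i][1]) for i, x, y in rows)
    sxx = sum((x - means[i][0]) ** 2 for i, x, y in rows)
    slope = sxy / sxx
    intercepts = {i: my - slope * mx for i, (mx, my) in means.items()}
    return slope, intercepts

u, rows = simulate()
slope, intercepts = fit_intercepts(rows)
print(f"shared slope estimate: {slope:.2f}")
for i in sorted(intercepts):
    print(f"doctor {i}: true intercept {10.0 + u[i]:.2f}, "
          f"estimated {intercepts[i]:.2f}")
```

Every doctor gets the same slope but a different starting level, which is the picture to keep in mind for a random-intercept model.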

Maybe the doctors have nothing to do with it and it is, in fact, the patients’ variation in response that we need to consider. We can do something like what we did above:

Y = A + (B + u_j) * x + e

Here the random term is specific to each patient instead of each doctor, and it changes the estimate of B for each patient, giving a different slope. Then we would say that “(B + u_j) is the effect of a unit dose change of the pain medicine on patient j”. This is a random-slope model. There is still some overall mean response B (as in the original model), but this specifically gets at how individuals vary around it, just as the random-intercept model got at how outcomes varied by doctor.
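The same kind of simulation works for random slopes. The sketch below assumes each patient is measured at several different doses (the example above does not say so, but you need repeated measurements to see a patient-specific slope at all) and fits a separate least-squares slope per patient. That is a "no pooling" shortcut, not a real mixed-model fit: a mixed model would estimate B and the spread of the u_j together, and would generally pull each patient's slope toward the overall mean. Parameter values and names are again made up.

```python
import random

def simulate_slopes(n_patients=8, n_visits=50, A=10.0, B=-2.0,
                    sd_u=0.5, sd_e=1.0, seed=2):
    """Draw (patient, dose, pain) rows from Y = A + (B + u_j) * x + e."""
    rng = random.Random(seed)
    u = [rng.gauss(0, sd_u) for _ in range(n_patients)]  # one slope shift per patient
    rows = []
    for j in range(n_patients):
        for _ in range(n_visits):
            x = rng.uniform(0, 4)
            rows.append((j, x, A + (B + u[j]) * x + rng.gauss(0, sd_e)))
    return u, rows

def patient_slopes(rows):
    """Separate least-squares slope for each patient (no pooling)."""
    by_patient = {}
    for j, x, y in rows:
        by_patient.setdefault(j, []).append((x, y))
    slopes = {}
    for j, pts in by_patient.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        slopes[j] = sxy / sxx
    return slopes

u, rows = simulate_slopes()
slopes = patient_slopes(rows)
for j in sorted(slopes):
    print(f"patient {j}: true slope {-2.0 + u[j]:.2f}, estimated {slopes[j]:.2f}")
print(f"mean of estimated slopes: {sum(slopes.values()) / len(slopes):.2f}")
```

The per-patient slopes scatter around the overall B, and the size of that scatter is what the variance of u_j describes.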

These explicit random effects for different groupings of the data allow us to ask more specific questions, and can greatly change the effect estimates for our variables of interest. The tradeoff is that these models are more computationally expensive to fit and need enough data at each level (enough doctors, and enough patients per doctor, or enough measurements per patient) to preserve adequate degrees of freedom for the estimates.