Regularization with linear regression


I’m taking a class that is covering this topic right now, and I’m having trouble finding a simplified explanation. I get that it helps prevent overfitting, but what is it really doing?


It usually works by shrinking the coefficients you end up with. The coefficients get shrunk because a penalty is added to the quantity being minimized. Normally you find the coefficients that minimize the sum of squared errors. With regularization, you also add a penalty that grows with the size of the coefficients (for example, the sum of their squares in ridge regression, or the sum of their absolute values in the lasso). So unless a coefficient explains enough variance to outweigh its penalty, it gets shrunk toward 0.
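To make the penalty concrete, here is a minimal sketch of one common form, ridge regression, which minimizes the sum of squared errors plus lam times the sum of squared coefficients. It assumes the columns of X and y are already centered (so no intercept is fitted) and uses plain Python with a small hypothetical `solve` helper instead of a linear-algebra library. The names `ridge`, `solve` and `lam` and the toy data are made up for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge(X, y, lam):
    """Minimize sum((y - X b)^2) + lam * sum(b_j^2) via (X^T X + lam I) b = X^T y.
    Assumes centered X and y (no intercept); lam = 0 is ordinary least squares."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for i in range(p):
        XtX[i][i] += lam  # the penalty adds lam to the diagonal
    return solve(XtX, Xty)


# Hypothetical toy data: y = 3*x1 + 0.5*x2 exactly, with centered, uncorrelated columns.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-5.5, -3.5, 0.0, 2.5, 6.5]
for lam in (0, 2, 10):
    print(lam, [round(b, 3) for b in ridge(X, y, lam)])
# 0 [3.0, 0.5]
# 2 [2.5, 0.333]
# 10 [1.5, 0.143]
```

With lam = 0 the fit recovers 3.0 and 0.5. As lam grows, both coefficients shrink toward 0, but the second one, whose feature accounts for much less of the variation in y, loses a larger fraction of its value (about 71% versus 50% at lam = 10).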

This is useful because ordinary linear regression only minimizes error on the data you have, so with many coefficients it can end up fitting the noise in that particular sample rather than just the underlying pattern (overfitting). Under the usual assumptions its coefficient estimates are unbiased, but they can vary a lot from one sample to the next. The penalty deliberately introduces some bias (shrinkage toward 0) in exchange for less of that variance, aiming for the balance that predicts new data best, rather than just maximizing the variance explained in the sample you fit.
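The bias/variance trade-off can be seen in a small simulation, sketched here with made-up settings: one centered feature, no intercept, a true slope of 1.0 and Gaussian noise with standard deviation 3. Each trial draws fresh noise, fits both ordinary least squares (lam = 0) and ridge (lam = 9) to the same sample, and the estimates are then summarized. The function names and all parameter values are hypothetical.

```python
import random
import statistics


def fit_slope(x, y, lam):
    """Ridge slope for one centered feature, no intercept:
    minimizes sum((y_i - b*x_i)^2) + lam*b^2, so b = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def simulate(beta=1.0, sigma=3.0, lam=9.0, trials=5000, seed=0):
    """Refit on many fresh noisy samples from the same true model and report each
    estimator's mean, variance, and mean squared error around the true beta."""
    rng = random.Random(seed)
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ols, ridge = [], []
    for _ in range(trials):
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
        ols.append(fit_slope(x, y, 0.0))    # unpenalized: ordinary least squares
        ridge.append(fit_slope(x, y, lam))  # penalized: shrunk toward 0
    results = {}
    for name, est in (("ols", ols), ("ridge", ridge)):
        results[name] = {
            "mean": statistics.fmean(est),
            "variance": statistics.pvariance(est),
            "mse": statistics.fmean([(b - beta) ** 2 for b in est]),
        }
    return results


for name, summary in simulate().items():
    print(name, {k: round(v, 3) for k, v in summary.items()})
```

With these settings, theory predicts an OLS mean of about 1.0 with variance 0.9, and a ridge mean of about 0.53 (biased toward 0) with variance about 0.25, so the ridge estimates land closer to the true slope on average (mean squared error about 0.47 versus 0.9). The value lam = 9 is sigma^2 / beta^2, which minimizes the mean squared error in this one-feature setting; in practice beta and sigma are unknown, so lam is usually chosen by cross-validation.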