Others have explained that a regression is just fitting a line through some data points as best it can, and some have talked about the R^2 term and what it means, but I feel those answers either don’t explain the why or aren’t ELI5-level, so here’s my attempt.
When you draw a line through the data points, you need to have a way of deciding how “good” it is. What’s the best way to do that?
If you imagine it’s a 2-d plot, with an x and a y axis, then one thing you can do is measure the distance between each point and the line. You do that by subtracting the value of the line at the given point’s x coordinate from the point’s y coordinate. So if you had a point at (1, 1.5) and the line went through (1, 1), then the distance is 0.5. We don’t really care if the line is above or below the point, so we square that distance so that negative and positive distances always count the same. Squaring also has the benefit of giving larger distances a bigger impact. That’s called the “squared residual” for that point. If you add those up for _all_ the points, you get something called the “residual sum of squares.”
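To make that concrete, here’s a minimal Python sketch of that sum, using made-up points and a candidate line y = x (the specific numbers are just for illustration):

```python
# Made-up (x, y) data points, just for illustration
points = [(1, 1.5), (2, 2.1), (3, 2.9), (4, 4.2)]

# Candidate line: y = x (slope 1, intercept 0)
def line(x):
    return 1.0 * x + 0.0

# Residual sum of squares: square each point's vertical distance
# from the line, then add them all up
ss_res = sum((y - line(x)) ** 2 for x, y in points)
print(ss_res)  # smaller = the line passes closer to the points
```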
That is a pretty good way to determine how well a line fits, and you can actually use it directly to find the best line – you can just keep changing the line until the residual sum of squares is as low as it will go.
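As a toy illustration of “keep changing the line until the sum is as low as it will go,” here’s a brute-force search over candidate lines, reusing the made-up points from above. A real regression solves this exactly with a formula; the search is only there to show the idea:

```python
# Same made-up points as above
points = [(1, 1.5), (2, 2.1), (3, 2.9), (4, 4.2)]

def rss(slope, intercept):
    """Residual sum of squares for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Try lots of slope/intercept pairs and keep whichever gives the lowest RSS.
candidates = [i / 100 for i in range(-200, 201)]
best_slope, best_intercept = min(
    ((s, b) for s in candidates for b in candidates),
    key=lambda sb: rss(sb[0], sb[1]),
)
print(best_slope, best_intercept, rss(best_slope, best_intercept))
```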
However, that number can be anywhere from zero on up: it grows with the number of points and with the scale of your data. That makes it hard to compare linear regressions across different datasets.
So, we look at another measure as well. We take the _average_ y-value of all the points, called the _mean_, and we measure the distance of each point from that mean, using the same squaring trick to make sure it’s always positive. If we add all of these up, now we have a measure of the _variance_ of the data – we know how much it is spread out in general. We call that the total sum of squares.
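Continuing the sketch, the total sum of squares is the same squaring trick applied to each point’s distance from the mean (again with the made-up points from above):

```python
# Same made-up points as above
points = [(1, 1.5), (2, 2.1), (3, 2.9), (4, 4.2)]

# Mean of the y-values
ys = [y for _, y in points]
mean_y = sum(ys) / len(ys)

# Total sum of squares: squared distance of each point from the mean,
# all added up -- a measure of how spread out the data is overall
ss_tot = sum((y - mean_y) ** 2 for y in ys)
print(ss_tot)
```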
If you take 1 – SS_res / SS_tot, you get the R^2 coefficient. For the best-fitting line, this number is bounded between 0 and 1: if it’s zero, the line is just going through the average of all the points and isn’t predicting anything (aside from the average), and if it’s 1, the line goes exactly through every data point and predicts everything perfectly. Anything in between gives you a good measure of how well the line fits the data – and most importantly, it means the same thing across different datasets and regression lines.
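Putting the two sums together for the made-up points (the slope and intercept here are roughly the best-fit values for that data, since R^2 is usually quoted for the best-fitting line):

```python
# Same made-up points; slope/intercept roughly from a least-squares fit
points = [(1, 1.5), (2, 2.1), (3, 2.9), (4, 4.2)]
slope, intercept = 0.89, 0.45

# Residual sum of squares for the fitted line
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Total sum of squares around the mean
ys = [y for _, y in points]
mean_y = sum(ys) / len(ys)
ss_tot = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1 = tight fit, close to 0 = no better than the mean
```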
You can interpret it as telling you what fraction of the variance in the data is _explained_ by the regression. In other words, if I propose that a linear model fits my data, and R^2 is 1, that means there is nothing happening in that data that my line does not explain. If R^2 is zero, then my line explains exactly nothing in the data, and every single thing that is happening comes from _something else_ that isn’t captured by that line. If it’s 0.5, roughly half of the variance is explained by the line, and half comes from something else.
That’s why, if you are proposing that a model should be linear, you want R^2 to be as high as possible. If it’s 0.8, it means 20% of the variance you observe isn’t explained by that model. Depending on the application, that could matter more or less, but in general it means the scatter your model leaves unexplained amounts to about 20% of the data’s total variance.