Someone please explain R-squared regression to me!

Seriously. I will need it for work. I haven’t been able to understand that shit from any online resource so far. Please ELI5.

15 Answers

Anonymous 0 Comments

Regression is a method for fitting a line to some data. Think “best fit line” from school. You are trying to find a line that follows the pattern of your data points.

R-squared is one way we evaluate how well the line fits the data.
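If it helps to see the arithmetic, here is a minimal sketch in plain Python. The data points and the helper names `fit_line` and `r_squared` are made up for illustration; the formula used is the usual one, R^2 = 1 - (sum of squared residuals) / (total sum of squares around the mean).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """1 minus (squared error of the line) / (squared error of just guessing the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that lies close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f}")
```

Because these points sit almost exactly on a line, the R^2 here comes out close to 1. Data scattered far from any line would push it toward 0.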

Anonymous 0 Comments

R^2 is not a very useful metric for many reasons. First, it is not a test of any hypothesis. You may, arbitrarily, create experiments where R^2 is very, very low, but a clear relationship exists. Likewise, you can find setups where R^2 is very high, but there is no relationship whatever. You will receive many responses that will claim the R^2 provides a measure of “goodness of fit” or, worse, that R^2 tells you the “explanatory power” of your model. This is all metastatistical nonsense, snake oil, plain bs.

R^2 also tells us very little (and can even mislead) about nonlinear relationships. And this is hardly an exhaustive list of the problems that defenders of the R^2 should grapple with.
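One way to see the "low R^2 but clear relationship" point for yourself is a sketch like this, using made-up data where y is exactly x squared. The helper name `linear_r_squared` is just for illustration. The relationship is perfect, but a straight line cannot capture it, so the R^2 of the linear fit collapses.

```python
def linear_r_squared(xs, ys):
    """Fit y = slope * x + intercept by least squares and return (slope, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1 - ss_res / ss_tot


# y is completely determined by x, but the relationship is a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, r2 = linear_r_squared(xs, ys)
print(f"slope={slope:.3f} R^2={r2:.3f}")
```

On this symmetric data the best straight line is flat, so the linear R^2 is essentially zero even though knowing x tells you y exactly.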

There are many useful tools for regression, but R^2 isn’t one of them. I encourage you to read many viewpoints on the matter, those for and against, and then decide if you really believe in the R^2 biz.

Anonymous 0 Comments

Just curious, but did you study something non-technical like arts or linguistics?

Anonymous 0 Comments

This simple demonstration does a great job ELI5ing this one https://twitter.com/pinakilaskar/status/1329748899347767296?s=21&t=jk1WFO6mLziQvaQixyBl9A

Anonymous 0 Comments

Lots of people have explained you’re trying to draw a line with roughly the same number of points above and below that line (i.e. line of best fit). I don’t think anyone has explained the following though:

For every data point, the vertical distance between that point and the line you drew is a RESIDUAL. In simple linear regression, the model you’re fitting is attempting to minimize the SUM OF SQUARED RESIDUALS.

What does this look like? [Here’s a visualization.](https://ibb.co/kQ303jx) Don’t worry about what the data are. The image on the left shows what the residuals look like, the one on the right shows the SQUARED residuals. Add all the squares up, and that’s the sum of squares. You want that sum of squares to be as low as possible.
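The same idea in numbers, as a rough sketch with made-up data. The helper names `fit_line` and `sum_squared_residuals` are just for illustration. The script fits the least squares line, then nudges it a bit and shows that the sum of squares only gets worse.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_residuals(xs, ys, slope, intercept):
    """Square each vertical gap between point and line, then add them up."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data points.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

best_slope, best_intercept = fit_line(xs, ys)
best_ssr = sum_squared_residuals(xs, ys, best_slope, best_intercept)

# Two "wrong" lines: one tilted a bit steeper, one shifted upward.
steeper_ssr = sum_squared_residuals(xs, ys, best_slope + 0.3, best_intercept)
shifted_ssr = sum_squared_residuals(xs, ys, best_slope, best_intercept + 0.5)

print(f"best fit line: {best_ssr:.3f}")
print(f"steeper line:  {steeper_ssr:.3f}")
print(f"shifted line:  {shifted_ssr:.3f}")
```

Whichever way you move the line away from the fitted one, the total area of the squares grows.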

This demonstrates why a point that’s really far away from the line has a huge impact on the line’s trajectory – because the square gets a lot bigger.
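Here is a small sketch of that outlier effect, again with made-up numbers and an illustrative helper called `fit_line`. Four points sit exactly on the line y = 2x, and the fifth is either on the line too or dragged far above it.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [1, 2, 3, 4, 5]
ys_clean = [2, 4, 6, 8, 10]     # every point is exactly on y = 2x
ys_outlier = [2, 4, 6, 8, 30]   # last point moved far above the line

slope_clean, intercept_clean = fit_line(xs, ys_clean)
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

# How big is the outlier's square if we keep the original line?
outlier_residual = ys_outlier[-1] - (slope_clean * xs[-1] + intercept_clean)
outlier_square = outlier_residual ** 2

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
print(f"outlier's squared residual against the original line: {outlier_square:.0f}")
```

A single point that is 20 units off contributes a square of 400, so the fit tilts the whole line toward it rather than pay that cost.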