Multiple regressions



Also simple linear regressions…
I’m STRUGGLING through my data analysis unit and high school maths was a long long time ago.
Forever grateful if you can help me through this fresh hell!

In: Mathematics

Say you have a scatterplot: you measured the value of y at each known value of x to get corresponding pairs of points. You try to draw a line through the points and want to know which line fits best. To measure how good a candidate line is, you take each x value, plug it into the line's equation, and compare the predicted y value with the actual y value. You square that difference so that positive and negative errors can't cancel each other out. Do this for every point, sum all the squared errors, and you get the SSE (sum of squared errors). You now want to minimize this error, so you try a new line equation, repeat the same steps, and see if the SSE goes down. Maybe rather than using y = mx + b, you use an equation like y = ae^x or y = ax^2 + bx + c to find a better fit.
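Here's a minimal sketch of that SSE computation in Python. The data points and the two candidate slope/intercept guesses are made up purely for illustration:

```python
# Toy scatterplot data (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

def sse(m, b, xs, ys):
    # For each point: predict y with the candidate line y = m*x + b,
    # subtract the actual y, square it, then sum over all points.
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))

print(sse(2.0, 0.0, xs, ys))  # one candidate line
print(sse(1.5, 0.5, xs, ys))  # another candidate; whichever SSE is smaller fits better
```

You'd keep adjusting m and b (or switch to a different model entirely) and re-run the same comparison.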

Now, rather than trying a bunch of lines, you can use step-wise regression to drop terms that aren't needed, because overfitting your data is bad. If you have 100 data points, you could use a degree-99 polynomial like y = ax^99 + bx^98 + … + z and get 0 error, but that equation almost certainly doesn't describe what's actually happening; it's just memorizing the noise. A simple equation is best, and most usable.

Also, in linear regression, rather than trying a bunch of different y = mx + b formulas, you can use derivatives to calculate an exact solution: take the partial derivatives of the SSE with respect to m and b, set them to zero, and solve. Your goal is just to find the line closest to all the points.
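Setting those derivatives to zero gives standard closed-form formulas for the best m and b. A minimal sketch, using the same kind of toy data as above:

```python
# Toy data (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)                 # sum of x^2
sxy = sum(x * y for x, y in zip(xs, ys))     # sum of x*y

# Closed-form least-squares solution from d(SSE)/dm = 0 and d(SSE)/db = 0
m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - m * sx) / n
print(m, b)
```

No trial and error needed: these m and b minimize the SSE exactly, in one shot.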

You can do this same process with more variables, e.g. y = ax1^2 + bx2^2 + cx1 + dx2 + e, to run a quadratic regression on two predictors. It's the same process as before, just a little tougher math-wise, but still easy for a computer.
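A sketch of that two-predictor case, using NumPy's least-squares solver. The data here is synthetic, generated from known coefficients so we can check that the fit recovers them:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-2.0, 2.0, 50)
x2 = rng.uniform(-2.0, 2.0, 50)
# Synthetic data from y = a*x1^2 + b*x2^2 + c*x1 + d*x2 + e with known a..e
y = 1.5 * x1**2 - 0.5 * x2**2 + 2.0 * x1 + 1.0 * x2 + 3.0

# Each column is one term of the model; the solver finds the coefficients
# that minimize the SSE, exactly as in the one-variable case.
A = np.column_stack([x1**2, x2**2, x1, x2, np.ones_like(x1)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # should recover roughly [1.5, -0.5, 2.0, 1.0, 3.0]
```

The "tougher math" is all hidden inside `lstsq`, which is why this stays easy for a computer no matter how many predictors you add.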