What is the normal equation?


Linear regression usually involves building a simple intelligence (a single-neuron neural network) to estimate a value or values from data. I wondered about ways to estimate without having to build an intelligence ("what is 'intelligent' about linear regression? it is a system that learns from data; how do I remove the 'intelligence' from this? maybe by removing gradient descent?"), and I remembered the normal equation. What are your thoughts?


Linear regression does not require the idea of a single-neuron network. Instead one can treat the problem entirely with linear algebra. One method of solving it is the normal equation: https://mathworld.wolfram.com/NormalEquation.html. It rewrites the least-squares problem of fitting X b ≈ y as the square linear system (X^T X) b = X^T y, which standard linear-algebra solvers can handle directly.
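As a sketch of what this looks like in practice, here is the normal equation applied to a small synthetic dataset with NumPy (the data and the true coefficients 2 and 3 are made up for illustration):

```python
import numpy as np

# Synthetic data: noisy samples of y = 2 + 3x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) b = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # close to [2, 3]
```

No iteration, no learning rate, no "training": just one linear solve. (In production code `np.linalg.lstsq` is usually preferred, since forming X^T X explicitly can worsen conditioning.)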

It is possible to solve systems like this iteratively, for example with the conjugate gradient method, which is related to the gradient descent you mentioned (but, again, this problem can be understood entirely within linear algebra; the concept of neural networks and "intelligence" isn't necessary at all). There are other options too. The LU and Cholesky factorizations are both direct methods, meaning that no iteration is necessary. Another option, which is very old, is the Gauss–Seidel method https://en.m.wikipedia.org/wiki/Gauss–Seidel_method, developed by Gauss in 1823 and Seidel in 1874.
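To make the direct/iterative contrast concrete, here is a minimal sketch of Gauss–Seidel next to a direct solve, on a small made-up symmetric positive-definite system (the matrix and vector are arbitrary illustrations, and the loop count is a crude stand-in for a real convergence test):

```python
import numpy as np

def gauss_seidel(A, b, iters=100):
    """Iterative solve of A x = b: sweep the rows, updating x in place."""
    x = np.zeros_like(b, dtype=float)
    n = len(b)
    for _ in range(iters):
        for i in range(n):
            # Sum of A[i, j] * x[j] over j != i, using the freshest x values.
            s = A[i] @ x - A[i, i] * x[i]
            x[i] = (b[i] - s) / A[i, i]
    return x

# A small symmetric positive-definite system (the shape X^T X always has).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

print(gauss_seidel(A, b))    # iterative answer
print(np.linalg.solve(A, b)) # direct answer (LU under the hood), for comparison
```

Both print essentially the same vector; the difference is only in how they get there, which is the direct-versus-iterative distinction above.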

I would encourage you to consider linear regression as distinct from a system which “learns” from data. Linear regression is simply a mathematical tool for working with data and getting certain pieces of information out of it. To say that linear regression learns from data would be like saying addition or subtraction learn from data.