What is so special about the neural network architecture of neurons and activation functions? Isn’t it just a giant nonlinear function in the end?



I understand it started with mimicking the brain and neurons, etc. But why not try to fit a giant polynomial or some funky nonlinear function like `x1^7*tanh(x2)/log(x3)+…` to the data? Why does it have to be neurons & activation functions? Does “the NN architecture” allow easier computation of gradients for optimization?

In: Engineering

One of the main benefits is that while it is a giant function, that function is composed of many copies of the same simple sub-function (the neuron), each of which has only a few weights/coefficients to change.
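A minimal sketch of that idea (the weights, layer sizes, and choice of `tanh` here are illustrative, not any particular network):

```python
import math

def neuron(x, w, b):
    # One neuron: a weighted sum of inputs passed through a fixed
    # nonlinearity (tanh here). The only learnable parts are w and b.
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(x, weights, biases):
    # A layer is just the SAME simple sub-function repeated with
    # different coefficients; the network is layers chained together.
    return [neuron(x, w, b) for w, b in zip(weights, biases)]

x = [0.5, -1.0]
hidden = layer(x, weights=[[1.0, 0.5], [-0.5, 2.0]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[1.0, -1.0]], biases=[0.0])
```

The "giant function" is never written down anywhere; it exists only as this uniform structure plus the stored coefficients.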

That means the entire thing can be processed largely in parallel, which is really well suited to certain types of processors (notably GPUs and specialized neural-net chips), and that only the coefficients and the general network structure need to be stored, not a specific function for each individual neuron.
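To make the parallelism concrete, here is a sketch (with made-up numbers) of how a whole layer collapses into one matrix-vector product, exactly the kind of bulk uniform arithmetic GPUs execute in parallel:

```python
def layer_as_matmul(W, b, x):
    # Each row of W holds one neuron's coefficients. Every output is
    # the same multiply-add pattern, so all rows can run in parallel.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

W = [[1.0, 0.5], [-0.5, 2.0], [0.25, 0.25]]  # 3 neurons, 2 inputs each
b = [0.0, 0.1, -0.2]
y = layer_as_matmul(W, b, [0.5, -1.0])

# Storage cost is just the coefficients, nothing per-neuron beyond them:
n_params = sum(len(row) for row in W) + len(b)  # 6 weights + 3 biases = 9
```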

For image recognition in particular, the development of CNNs provided a huge jump in neural net capability by explicitly assuming the input is an image, and by using tricks like parameter sharing and pooling to reduce the memory and processing required compared to traditional neural nets.
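A rough back-of-the-envelope sketch of why parameter sharing matters so much (the 28×28 image and 3×3 kernel sizes are just illustrative choices):

```python
pixels = 28 * 28  # a small grayscale image

# Fully connected: every one of the 784 output neurons gets its own
# set of 784 input weights.
fc_params = pixels * pixels  # 614,656 parameters for ONE layer

# Convolutional: a single 3x3 kernel is slid across the whole image,
# so those 9 weights are SHARED by every output position.
conv_params = 3 * 3

def conv1d(signal, kernel):
    # 1-D version of the same trick: the same few weights are reused
    # at every position along the input.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edges = conv1d([1, 2, 3, 4], [1, 0, -1])
```

Same uniform structure as before, just with far fewer coefficients to store and fit.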