I understand it started with mimicking the brain and neurons, etc. But why not try to fit a giant polynomial or some funky nonlinear function like `x1^7*tanh(x2)/log(x3)+…` to the data? Why does it have to be neurons & activation functions? Does “the NN architecture” allow easier computation of gradients for optimization?