Back-propagation in Neural Networks


I had a conversation recently with one of my friends who is a Sci-fi enthusiast, especially when it comes to artificial intelligence, but has no background in AI/ML. I attempted to explain how basic neural networks work but struggled to make the back-propagation method intuitive. I wonder if anyone here can describe it without going into the details of probability distributions, activation functions, gradient descent and the like.


> gradient descent

Impossible, because the first major question is “why do backprop?” and the answer is “gradient descent”. If you want to explain how an ANN works, you’d better approach it the other way around: explain that the goal is gradient descent, without having to explain how it is carried out (i.e. backprop).

Draw a curve on a graph: y = x^2 + 5.
Or even a 3rd or 4th order curve. Doesn’t matter.

Pick a spot on the curve and think of its height as the error. An ML neural net is being “trained” when you move that dot to a lower and lower spot on the curve. Once it stops moving, the training is done. That’s gradient descent (GD) in 1 dimension.
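If it helps to see that as code, here is a minimal sketch of the 1-dimensional case on y = x^2 + 5. The starting point, the step size `lr` and the number of steps are my own illustrative choices, not anything special.

```python
def f(x):
    # The curve from the example: y = x^2 + 5
    return x**2 + 5

def df(x):
    # Slope of the curve at x (derivative of x^2 + 5)
    return 2 * x

def gradient_descent(x, lr=0.1, steps=100):
    # Repeatedly move the dot a small step in the downhill direction.
    # lr and steps are assumed values picked for illustration.
    for _ in range(steps):
        x = x - lr * df(x)
    return x

x_min = gradient_descent(x=4.0)
print(x_min, f(x_min))
```

The dot ends up very close to x = 0, the bottom of the curve, where the height is 5.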

2d would be like finding the lowest spot on a thick lumpy comforter.

3d is trickier – find the lowest temperature spot in a room.

All the same process. Pick a spot, move to where it’s lower. Repeat. (Obviously you would do better and avoid local minima if you picked 1000s of points and added some randomness to the descent … but that’s an optimization.)
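A rough sketch of the “pick lots of points” idea, using a 4th order curve that I made up because it has one shallow dip and one deeper dip. The only randomness here is in the starting points (a simplification of “randomness in the descent”); the curve, the seed, the range and the step size are all assumptions for illustration.

```python
import random

def f(x):
    # Hypothetical lumpy curve with a shallow dip and a deeper dip
    return x**4 - 3 * x**2 + x

def df(x):
    # Slope of the curve at x
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=500):
    # Plain gradient descent from one starting point
    for _ in range(steps):
        x = x - lr * df(x)
    return x

random.seed(0)  # fixed seed so the run is repeatable
starts = [random.uniform(-2.0, 2.0) for _ in range(20)]
finishes = [descend(x) for x in starts]

best_x = min(finishes, key=f)  # lowest spot found across all starts
single_x = descend(1.5)        # one unlucky start gets stuck in the shallow dip
print(best_x, f(best_x), single_x, f(single_x))
```

A single start from x = 1.5 settles in the shallow dip, while the best of the 20 random starts finds the deeper one.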

ML nets work in 1000s or even millions of dimensions, but they all do the same thing: given enough input, find the set of values (the weights) with the lowest error, which will give the most correct answers.

It’s not HAL 9000 / CyberDyne black magic.

Back-propagation is the process of passing (“propagating”) the error at the output backwards through the neural network to work out how much each weight should change. It is the core technique used in gradient descent learning.
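To make that concrete, here is a toy sketch with just two weights chained together and no activation function. That is a deliberate simplification on my part, and the names `w1`, `w2`, the single training example and the learning rate are all assumptions for illustration. The error at the output is passed backwards one step at a time to get each weight's share of the blame.

```python
def forward(w1, w2, x):
    h = w1 * x      # "hidden" value
    y_hat = w2 * h  # prediction
    return h, y_hat

def backward(w1, w2, x, y):
    # Squared error: loss = (y_hat - y)^2
    h, y_hat = forward(w1, w2, x)
    d_y_hat = 2 * (y_hat - y)  # how the loss changes with the prediction
    d_w2 = d_y_hat * h         # pass it back to the last weight
    d_h = d_y_hat * w2         # pass it back to the hidden value
    d_w1 = d_h * x             # and from there to the first weight
    return d_w1, d_w2

def train(w1, w2, x, y, lr=0.05, steps=200):
    # Gradient descent: nudge each weight against its gradient
    for _ in range(steps):
        d_w1, d_w2 = backward(w1, w2, x, y)
        w1 -= lr * d_w1
        w2 -= lr * d_w2
    return w1, w2

# One made-up training example: input 2.0 should give output 3.0
w1, w2 = train(0.5, 0.5, x=2.0, y=3.0)
_, prediction = forward(w1, w2, 2.0)
print(w1, w2, prediction)
```

`backward` is the back-propagation part and `train` is the gradient descent part. A real network does the same thing with many more weights and an activation function at each layer.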


Imagine you are trying to find the lowest point in a mountain valley, but there is a very heavy fog, so you can only see 1 metre in front of you. In addition to your eyes, you could also use your ears to listen for echoes, or your phone’s compass combined with an old map, to work out the best way to go. Gradient descent is the process of taking one step at a time in the “best” direction that you can currently identify. Back-propagation is using the result of each “best step” to calibrate your ears, eyes and compass. Over time you realise that your compass is consistently off by 45 degrees, or that an echo that lasts 5 seconds means you’re going in the wrong direction, and you build that into your next calculation of the “best step”.