Eli5: how does a program like ChatGPT actually “learn” something


Is there a fundamental difference between how a program like ChatGPT learns and how a human learns?


18 Answers

Anonymous 0 Comments

Others have given a good analogy for genetic algorithms, but ChatGPT learns using gradient descent, not a GA. For gradient descent, the usual analogy is:

Imagine you are a blind person stranded on the side of a mountain. You need to find water, and know that rivers usually run through valleys. How do you find your way down the mountain? Well, you can take tiny little steps around you, feeling for the direction of the slope. Once you find the direction that points downhill, you walk in that direction for a little while, and repeat the process in your new location to see if the direction of ‘downhill’ has changed. You can continue this process until you reach the river.

In this analogy, your current location represents the weights of the neural network. The downhill direction is the gradient. How far you walk before stopping to feel the slope again is the learning rate. The topography of the mountainside is your cost function, which measures how far off your model’s predictions are. When you reach the river, you’ve minimized the cost function, and the neural network is generally pretty good at producing the outputs you want it to.
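If you like code, here’s a tiny toy version of that walk (just the idea, nowhere near ChatGPT’s actual training loop; the data and numbers here are made up):

```python
# Toy gradient descent: fit a single weight w so that prediction = w * x
# matches some known answers. The loop is literally "feel the slope,
# take a step, repeat".

# toy data: inputs and the answers we want the model to produce
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # the true relationship is y = 2x

w = 0.0               # our starting spot on the "mountain"
learning_rate = 0.05  # how far we walk each step

for step in range(100):
    # cost: how far our predictions are from the known answers (mean squared error)
    cost = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    # gradient: the "uphill" direction of the cost with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

    # walk downhill: nudge w a little bit against the gradient
    w -= learning_rate * grad

print(w)  # ends up very close to 2.0, the weight that minimizes the cost
```

ChatGPT does the same thing, except instead of one weight there are billions of them, and the “downhill direction” is computed over all of them at once.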

How does the network know what you want it to output, though? The input data during this process has been manually annotated by humans. The cost function calculates how far away the model’s predictions are from those annotations. When we look for the ‘downhill’ direction, we calculate the direction in which we can change the weights to reduce this cost. By reducing the cost, the neural network’s predicted values move closer to the annotations.
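To make “how far away” concrete, here’s one common way to score a word-guessing model (a simplified cross-entropy sketch, not ChatGPT’s exact setup; the vocabulary and probabilities are invented):

```python
import math

# The model puts a probability on each possible next word;
# the annotation says which word is actually correct.
vocab = ["cat", "dog", "fish"]
predicted_probs = [0.2, 0.7, 0.1]   # what the model currently thinks
correct_word = "cat"                # the human-provided correct answer

# cross-entropy cost: low when the model puts high probability on the right word
cost = -math.log(predicted_probs[vocab.index(correct_word)])
print(cost)  # ~1.61 here; if the model had said 0.9 for "cat", it would be ~0.11

# Training nudges the weights so that, next time, the probability
# on "cat" goes up and this cost goes down.
```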

But what if, on your way down the mountain, you get stuck in a hole and can’t get out of it? Then you’re stuck! Gradient descent is only guaranteed to find a local minimum; it does not guarantee you find the global minimum of the entire ‘cost landscape’. There are some techniques to combat this, like periodically increasing the learning rate so you occasionally take bigger steps to try and hop out of any ‘holes’ you’ve found yourself in.
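One such trick looks roughly like this in code (a made-up cyclical schedule just to show the shape of the idea; the numbers aren’t from any real system):

```python
# A toy cyclical learning-rate schedule: small careful steps most of the time,
# with an occasional burst of big steps that can carry you over the lip of a
# small "hole".
base_lr = 0.01
boost_lr = 0.1
cycle_length = 100

def learning_rate(step):
    # for the first 10 steps of every cycle, take big steps
    if step % cycle_length < 10:
        return boost_lr
    return base_lr

print([learning_rate(s) for s in (0, 5, 50, 99, 100, 105)])
# -> [0.1, 0.1, 0.01, 0.01, 0.1, 0.1]
```

Real schedules (cyclical learning rates, warm restarts, and so on) are fancier, but the spirit is the same: don’t let the step size shrink so much that you can never climb out of a dip.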

What if, on your way down, you find yourself on an extremely flat plateau where there is no ‘downhill’ direction? Also stuck! This is called a ‘vanishing gradient’, a problem that really plagued early ML models. A lot of effort has been poured into designing network architectures that are robust against this.
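You can see the vanishing gradient with a toy calculation: stack a bunch of sigmoid ‘layers’ and watch the gradient shrink as the chain rule multiplies through them (this ignores weights and everything else a real layer does):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Pretend each "layer" just applies a sigmoid. The gradient reaching the
# earliest layer is the product of each layer's derivative, and the sigmoid's
# derivative is at most 0.25, so the product shrinks fast with depth.
x = 0.5
grad = 1.0
for layer in range(30):          # 30 stacked sigmoid "layers"
    y = sigmoid(x)
    grad *= y * (1 - y)          # chain rule: multiply by this layer's derivative
    x = y

print(grad)  # astronomically small -> early layers get almost no learning signal
```

Architectures like ReLU activations and residual connections were adopted largely because they keep this product from collapsing to zero.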
