Eli5: how does a program like ChatGPT actually “learn” something

Is there a fundamental difference between how a program like ChatGPT learns and how a human learns?

18 Answers

Anonymous 0 Comments

Others have given a good analogy for genetic algorithms, but ChatGPT learns using gradient descent, not a GA. For gradient descent, the usual analogy is:

Imagine you are a blind person stranded on the side of a mountain. You need to find water, and know that rivers usually run through valleys. How do you find your way down the mountain? Well, you can take tiny little steps around you, feeling for the direction of the slope. Once you find the direction that points downhill, you walk in that direction for a little while, and repeat the process in your new location to see if the direction of ‘downhill’ has changed. You can continue this process until you reach the river.

In this analogy, your current location represents the weights of the neural network. The downhill direction is the gradient. The distance you walk before rechecking the slope is the learning rate. The topography of the mountainside is your cost function, which measures how good your model’s predictions are. When you reach the river, you’ve minimized the cost function, and the neural network is generally pretty good at producing the outputs you want it to.
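
To make that concrete, here’s a minimal gradient descent loop in Python on a toy one-dimensional ‘mountainside’ (everything here is made up for illustration; a real network has billions of weights, not one):

```python
# Toy gradient descent: find the bottom of cost(w) = w**2.
def cost(w):
    return w ** 2

def gradient(w):      # the slope of the "mountainside" at our location
    return 2 * w

w = 5.0               # starting location on the mountain (the weights)
learning_rate = 0.1   # how far we walk before rechecking the slope

for step in range(100):
    w = w - learning_rate * gradient(w)   # step in the downhill direction

print(w)  # ~0.0: we've reached the "river" (the minimum of the cost)
```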

How does the network know what you want it to output, though? The input data during this process has been manually annotated by humans. The cost function calculates how far the model’s predictions are from those annotations. When we look for the ‘downhill’ direction, we calculate the direction in which we can change the weights to reduce this cost. By reducing the cost, the neural network’s predicted values move closer to the annotations.
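
Here’s a hedged sketch of that: a cost that measures distance from human annotations, plus a numerical way to ‘feel’ the downhill direction (the tiny linear model and every number are made up for illustration):

```python
import numpy as np

inputs = np.array([[0.5, 1.0],
                   [1.5, 0.2],
                   [0.3, 0.9]])
annotations = np.array([1.0, 0.0, 1.0])   # hypothetical human labels
weights = np.zeros(2)

def cost(w):
    predictions = inputs @ w              # the model's guesses
    return np.mean((predictions - annotations) ** 2)

# "Feel" the slope numerically: nudge each weight, see how the cost moves.
eps = 1e-6
grad = np.array([(cost(weights + eps * np.eye(2)[i]) - cost(weights)) / eps
                 for i in range(len(weights))])

weights -= 0.1 * grad   # stepping against the gradient pulls the
                        # predictions closer to the annotations
print(cost(weights))    # lower than cost(np.zeros(2))
```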

But what if, on your way down the mountain, you get stuck in a hole and can’t get out of it? Then you’re stuck! Gradient descent is only guaranteed to find a local minimum; it does not guarantee you find the global minimum of the entire ‘cost landscape’. There are some techniques to combat this, like periodically increasing the learning rate so you can occasionally take bigger steps to get out of any ‘holes’ you’ve found yourself in.
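
One sketch of that escape trick, a periodic learning-rate reset (a ‘warm restart’) on a bumpy made-up cost with several holes:

```python
import math

# Bumpy toy cost: cost(w) = w**2 + 2*sin(5*w) has several local minima.
def gradient(w):
    return 2 * w + 10 * math.cos(5 * w)   # derivative of the bumpy cost

w = 3.0
lr = 0.3
for step in range(300):
    if step % 50 == 0:
        lr = 0.3          # warm restart: big steps again, to hop out of a hole
    w -= lr * gradient(w)
    lr *= 0.9             # then decay back toward small, careful steps

print(w)   # settles near one of the cost's minima
```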

What if, on your way down, you find yourself on an extremely flat plateau, where there is no ‘downhill’ direction? Also stuck! This is the ‘vanishing gradient’ problem, which really plagued early ML models. A lot of resources have been poured into designing network architectures that are robust against it.
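
You can see a vanishing gradient in miniature with the sigmoid activation used in many early networks: its slope is healthy in the middle but essentially zero out on the flat ends.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_slope(x):        # derivative of the sigmoid
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_slope(0.0))    # 0.25 -> healthy slope, learning proceeds
print(sigmoid_slope(10.0))   # ~0.000045 -> flat plateau, updates vanish
```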

Anonymous 0 Comments

ChatGPT has for reference the entire internet as it existed up to 2021. You can search Google for the string “Is there a fundamental difference between…” Now, in several examples (all of them, actually), look at the text after that. Further, look at how that phrase ‘leads to’ the words ChatGPT and human. ChatGPT makes a summary of the text it finds related to the words you’re interested in, and returns that to you.
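
A toy sketch of that ‘look at what follows the phrase’ idea, just counting continuations in a tiny made-up corpus (ChatGPT learns these statistics into its weights during training rather than searching the text when you ask):

```python
from collections import Counter

corpus = ("is there a fundamental difference between cats and dogs ? "
          "is there a fundamental difference between chatgpt and humans ?").split()

prefix = ("difference", "between")
followers = Counter(corpus[i + 2]
                    for i in range(len(corpus) - 2)
                    if (corpus[i], corpus[i + 1]) == prefix)

print(followers.most_common())   # which words tend to come next
```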

ChatGPT is Kim Peek meets Chauncey Gardiner: it knows everything, and can relate and discuss it, but doesn’t know What it knows, or Why.

Anonymous 0 Comments

With “machine learning” you’re asking the machine to predict whether something will happen based on certain conditions.

So say you have a bunch of statistics from a bunch of basketball games. You want to see how well your model takes in information about games in the past (who played, how many points they scored, rebounds, turnovers, etc.) and whether it can accurately predict which team won those games. If it’s very good at predicting the outcomes of those games, you could apply it to predicting which team will win in the future.

So first you take a portion of those games, with the win-loss results included, and use that to “train” the model. It learns, say, that when all 5 starters score 10+ points, that team wins most of the time. Then you give the model the rest of the data without the win-loss results and ask it to guess whether or not each team won. Then you compare its predictions with reality. If it’s a good match, great. If not, you either need to improve the model or find more data that can help refine things. If you have the box scores of 100 games, you might give the model 80 of those games to train on and then test it on the remaining 20.
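
A minimal sketch of that train/test cycle, assuming scikit-learn is available and using fabricated box scores (every number and feature here is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
stats = rng.normal(size=(100, 4))           # 100 games, 4 box-score stats each
won = (stats.sum(axis=1) > 0).astype(int)   # fake win/loss labels

# Train on one portion of the games, hold the rest back for testing.
X_train, X_test, y_train, y_test = train_test_split(
    stats, won, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))   # fraction of held-out games predicted right
```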

This cycle of training and testing is how the model “learns” to be more accurate when making a prediction. You give it some information, ask it to make a prediction based on what it knows, and then see how well it does.

ChatGPT is more advanced, since it’s doing more than simple statistical modeling, but the idea is similar: it’s looking at a ton of information and trying to predict what you want, based not only on what you asked but on what everyone who has ever asked a similar question before has gotten for an answer.
