Why, when training ML models, do we not take the model with the highest accuracy?

It’s pretty common for an ML model to lose accuracy at various points during training. But presumably that means they are worse so why do we take the last version instead of the one that had the highest accuracy?

12 Answers

Anonymous 0 Comments

Hopefully someone more qualified to answer this comes by, but my understanding is that more training with lower accuracy is better simply because it likely has a better set of parameters to work from.

Imagine an ML model that tries to determine if a number is prime without doing a sieve. And your random number generator keeps accidentally churning out multiples of 2, 3 and 5. The model tries some approach but lands on “always false” as one solution and it keeps working…until it lands on a real prime and fails. Now, do you want to stop at 100% accuracy because “all numbers are probably not prime,” or do you want to keep training it on data to see if it comes up with a better solution?
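Here is a rough sketch of that prime example in Python, in case it helps to see the numbers. Everything in it is an assumption made for illustration: the `always_false` "model", the sample ranges, and the helper names are made up, and `is_prime` is only there to produce the correct labels for scoring.

```python
def is_prime(n):
    """Ground-truth label via trial division (only used to score the guesses)."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def always_false(n):
    """The lazy 'model' from the example: it claims no number is ever prime."""
    return False


def accuracy(model, numbers):
    """Fraction of numbers where the model's guess matches the true label."""
    correct = sum(1 for n in numbers if model(n) == is_prime(n))
    return correct / len(numbers)


# Biased sample: only multiples of 2, 3 or 5 above 5, so none of them is prime.
biased_sample = [n for n in range(6, 200) if n % 2 == 0 or n % 3 == 0 or n % 5 == 0]

# Broader sample: every integer from 2 to 100, which does include primes.
broad_sample = list(range(2, 101))

biased_acc = accuracy(always_false, biased_sample)
broad_acc = accuracy(always_false, broad_sample)

print("accuracy on the biased sample:", biased_acc)
print("accuracy on the broader sample:", broad_acc)
```

The lazy model looks perfect on the biased sample and only gets caught once real primes show up in the data.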

Anonymous 0 Comments

When the model’s accuracy on the data it was trained with is very high, it’s likely to be “overfit,” which would mean it is only accurate for that training data. In other words, it doesn’t generalize. It can’t deal with data it hasn’t seen before.

There are methods to combat this, such as testing the trained model with a different set of data than the data you trained it with. However, if you pick from a set of trained models based on their test results, you’re effectively using the test data as part of the training data, which defeats the purpose entirely.
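One common way around this, as far as I know, is to keep a third split (a validation set) just for choosing between checkpoints, and leave the test set untouched until the very end. A minimal sketch, assuming you logged accuracy per epoch; the numbers and the `pick_checkpoint` helper are made up for illustration, not taken from any real training run:

```python
# Hypothetical per-epoch accuracies (invented numbers, for illustration only).
history = [
    {"epoch": 1, "train_acc": 0.70, "val_acc": 0.68},
    {"epoch": 2, "train_acc": 0.85, "val_acc": 0.80},
    {"epoch": 3, "train_acc": 0.93, "val_acc": 0.84},
    {"epoch": 4, "train_acc": 0.98, "val_acc": 0.81},
    {"epoch": 5, "train_acc": 1.00, "val_acc": 0.78},
]


def pick_checkpoint(history, key):
    """Return the epoch record with the highest value stored under `key`."""
    return max(history, key=lambda record: record[key])


# Picking by training accuracy rewards memorizing the training data.
best_by_train = pick_checkpoint(history, "train_acc")

# Picking by validation accuracy rewards doing well on data not trained on.
best_by_val = pick_checkpoint(history, "val_acc")

print("best epoch by training accuracy:", best_by_train["epoch"])
print("best epoch by validation accuracy:", best_by_val["epoch"])
```

In this made-up run the two choices disagree: training accuracy keeps rising to the last epoch while validation accuracy peaks earlier. The test set is scored only once, on whichever checkpoint the validation numbers picked.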

Anonymous 0 Comments

Your goal is to find the highest mountain, but you’re kind of blind, and can only measure how tall the mountain is when you reach the peak of it. Once you reach that peak you might need to try a different mountain, which requires going back down.

There are also situations where the AI is recognizing specific images in the training data, rather than the characteristics of those images you want to distinguish. Or it might get hung up on a correlation in the training data. For example, if you have a model that’s supposed to distinguish dogs from cats, you might accidentally make a leash-detecting AI, and need to add a bunch of images of dogs without leashes to steer the AI back to what you want, even if it means you perform worse on the original data set.

Anonymous 0 Comments

ELI5: what’s an ML model?

Anonymous 0 Comments

Q: What’s an ML model? Initialisms make it hard to google correctly.

Anonymous 0 Comments

Another comment explains overfitting, which is one reason. Another reason is that you might be trying to predict something rare.

Let’s say you’re making pencils and you want the computer to tell if a pencil is defective. Let’s say 1 out of 1000 pencils is defective. Your highest accuracy model might be the one that always guesses the pencil is good, because that will be right 99.9% of the time. But that model isn’t really doing anything!

A similar thing happens when trying to identify cancer cells or other health problems. You really don’t want to miss the rare cases when there’s a problem.

There are other numbers to look at, like true positive rate, true negative rate, positive predictive value, and negative predictive value. You can think of these kinda like accuracy for just one case, yes accuracy and no accuracy. Which one you should use depends on what you’re doing.
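To put numbers on the pencil example, here is a small sketch that scores the “always guess good” model on 1000 pencils with 1 defective one. The `binary_metrics` helper and its key names are my own for illustration; a ratio is reported as `None` when its denominator is zero.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy plus the four rates from lists of booleans.

    True means "defective" (the positive class) in both lists.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # defective, flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # good, passed
    fp = sum(1 for t, p in pairs if not t and p)      # good, wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # defective, missed

    def ratio(num, den):
        # Undefined when there is nothing to divide by.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "true_positive_rate": ratio(tp, tp + fn),
        "true_negative_rate": ratio(tn, tn + fp),
        "positive_predictive_value": ratio(tp, tp + fp),
        "negative_predictive_value": ratio(tn, tn + fn),
    }


# 1 defective pencil out of 1000, as in the example above.
y_true = [True] + [False] * 999

# The "model" that always says the pencil is good (never flags a defect).
y_pred = [False] * 1000

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

Accuracy comes out at 99.9%, but the true positive rate is 0 because the one defective pencil is missed, and the positive predictive value is undefined because the model never flags anything.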

Anonymous 0 Comments

Imagine a mountain range. You’re wandering around trying to find the tallest peak but you can’t look around the horizon to try and find the highest one. The only thing you can do is keep walking around and noting when you climb higher and lower. If you only followed where you climbed higher, you could wind up just scaling a random mountain and never finding a different mountain that could be higher.

AI/ML is basically this. The training data basically creates a mountain range and the training itself is essentially wandering across the mountains until you find one that is most likely to be the highest. If you only ever went up, you would be stuck on one mountain rather than checking the hundreds of others.

It should be noted that this is why AI/ML is essentially applied statistics. The mountains are basically the likelihood of a given generated “answer” being the “right” one. The definition of answer/right is subject to interpretation by the humans looking at the results.
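A toy version of the “only ever go up” problem, as a sketch. This is not how real training works (that usually follows gradients of a loss over many parameters); it is a one-dimensional strip of made-up heights, and `climb_only_uphill` is a hypothetical helper that refuses to step downhill.

```python
# Made-up terrain: a small peak at index 2 and a taller one at index 7.
heights = [1, 3, 5, 4, 2, 6, 9, 12, 8]


def climb_only_uphill(heights, start):
    """Walk to the higher neighbor until no neighbor is strictly higher."""
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos  # stuck: every step from here goes down or stays level
        pos = best


stuck_at = climb_only_uphill(heights, start=0)
found_from_valley = climb_only_uphill(heights, start=4)

print("starting at 0, stops at index", stuck_at, "height", heights[stuck_at])
print("starting at 4, stops at index", found_from_valley, "height", heights[found_from_valley])
print("tallest peak overall:", max(heights))
```

Starting from the left edge, the walker stops on the small peak and never sees the taller one. Getting there means going downhill first, which is the same as accepting a temporary drop in accuracy.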

Anonymous 0 Comments

>But presumably that means they are worse so why do we take the last version instead of the one that had the highest accuracy?

You do want the model with the highest accuracy, but you need to be careful of how you calculate “accuracy.” For example, if you’re training some ML model on pictures, but you only train on cat pictures and you test the “accuracy” of your model on even more cat pictures, your model might be great at making predictions on cat pictures, but might have poor performance when all of a sudden you show it a picture of a banana.

Anonymous 0 Comments

Suppose you trained to run on a particular path. Suppose you trained to be 100% proficient at running on that particular path, maybe for a specific race on that path. However, now suppose as a consequence that your running skills weren’t transferable to running anywhere else. That’s what chasing 100% accuracy on the training data can entail when training a model. If you wanted to be able to run on any other path, you would have to completely retrain from scratch, crawling on all fours.

Anonymous 0 Comments

ELI5: what is an ML model?