Why, when training ML models, do we not take the model with the highest accuracy?


It's pretty common for an ML model to lose accuracy at various points during training. Presumably that means those versions are worse, so why do we take the last version instead of the one that had the highest accuracy?


Hopefully someone more qualified to answer this comes by, but my understanding is that a model with more training and slightly lower accuracy can still be the better one, simply because it likely has a better set of parameters to work from.

Imagine an ML model that tries to determine if a number is prime without doing a sieve. And your random number generator keeps accidentally churning out multiples of 2, 3 and 5. The model tries some approach but lands on “always false” as one solution, and it keeps working… until it lands on a real prime and fails. Now, do you want to stop at 100% accuracy because “all numbers are probably not prime,” or do you want to keep training it on data to see if it comes up with a better solution?
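To make that concrete, here's a rough sketch in plain Python. The "model" is just a function that always answers "not prime"; the number ranges and the helper names (`always_false`, `accuracy`) are made up for illustration, not taken from any real training setup.

```python
def is_prime(n):
    """Ground truth by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def always_false(n):
    """The lazy 'model': claims no number is ever prime."""
    return False


def accuracy(model, numbers):
    """Fraction of numbers where the model agrees with the ground truth."""
    correct = sum(model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


# Skewed sample (assumed for illustration): only multiples of 2, 3 or 5,
# starting at 6 so that 2, 3 and 5 themselves are excluded.
skewed = [n for n in range(6, 200) if n % 2 == 0 or n % 3 == 0 or n % 5 == 0]

# Fairer sample: every number from 2 to 199, primes included.
fair = list(range(2, 200))

skewed_acc = accuracy(always_false, skewed)  # perfect score on the skewed sample
fair_acc = accuracy(always_false, fair)      # drops once real primes show up

print(f"skewed sample: {skewed_acc:.2f}, fair sample: {fair_acc:.2f}")
```

The "always false" model gets a perfect score on the skewed sample and a worse one as soon as primes appear, which is why a high score on one batch of data says little on its own.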

When the model’s accuracy on the training data is very high, it may be “overfit,” which means it is only accurate for the data you trained it with. In other words, it doesn’t generalize: it can’t deal with data it hasn’t seen before.

There are methods to combat this, such as testing the trained model with a different set of data than the data you trained it with. However, if you pick from a set of trained models based on their test results, you’re effectively using the test data as part of the training data, which defeats the purpose entirely.
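One common way around that problem (not mentioned above, so treat this as my addition) is to hold out a third chunk of data, usually called a validation set. You pick the best checkpoint using validation accuracy and only touch the test set once at the very end. Below is a minimal sketch on a toy problem; the data generator, the learning rate, and the starting weights are all assumptions chosen just to make the example run.

```python
import math
import random

random.seed(0)


def make_data(n):
    """Hypothetical toy task: label is 1 when x > 0, with ~10% of labels flipped."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        y = 1 if x > 0 else 0
        if random.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


# Three separate sets: train to fit, validation to choose, test to report.
train, val, test = make_data(200), make_data(100), make_data(100)


def accuracy(w, b, data):
    """Fraction of points where the thresholded prediction matches the label."""
    return sum((1 if w * x + b > 0 else 0) == y for x, y in data) / len(data)


w, b, lr = -1.0, 0.5, 0.5  # deliberately bad starting point (assumed)
best = {"val_acc": -1.0, "w": w, "b": b, "epoch": -1}
history = []

for epoch in range(30):
    random.shuffle(train)
    for x, y in train:
        # One step of stochastic gradient descent on the logistic loss.
        p = 1 / (1 + math.exp(-(w * x + b)))
        w -= lr * (p - y) * x
        b -= lr * (p - y)

    val_acc = accuracy(w, b, val)
    history.append(val_acc)
    # Keep a copy of the parameters whenever validation accuracy improves.
    if val_acc > best["val_acc"]:
        best = {"val_acc": val_acc, "w": w, "b": b, "epoch": epoch}

# The test set is used exactly once, on the checkpoint chosen via validation.
test_acc = accuracy(best["w"], best["b"], test)
print(f"best epoch: {best['epoch']}, val: {best['val_acc']:.2f}, test: {test_acc:.2f}")
```

Because the test set never influences which checkpoint gets picked, its score stays an honest estimate of how the model does on unseen data.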

Your goal is to find the highest mountain, but you’re kind of blind and can only measure how tall a mountain is when you reach its peak. Once you reach that peak, you might need to try a different mountain, which requires going back down.
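Here's a toy version of the mountain picture, assuming a made-up one-dimensional landscape with a small peak near x = 1 and a taller one near x = 4. The `height` function and the greedy `climb` routine are purely illustrative; real training is not literally this, but the "stuck on the small peak" effect is the same idea.

```python
import math


def height(x):
    """Made-up landscape: a peak of height ~1 near x=1 and ~2 near x=4."""
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)


def climb(x, step=0.1, max_steps=1000):
    """Greedy climber: only ever moves to a neighbor that is strictly higher."""
    for _ in range(max_steps):
        best = max((x - step, x + step), key=height)
        if height(best) <= height(x):
            break  # no neighbor is higher, so we are on a peak
        x = best
    return x


low_start = climb(0.0)   # ends up on the small peak
high_start = climb(3.0)  # ends up on the tall peak

print(f"from 0.0: x={low_start:.1f}, height={height(low_start):.2f}")
print(f"from 3.0: x={high_start:.1f}, height={height(high_start):.2f}")
```

The climber that starts at 0 never reaches the taller peak, because every path to it goes through the lower valley in between. Getting there means accepting a drop in height first.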

There are also situations where the AI is recognizing specific images in the training data, rather than the characteristics of those images you want it to distinguish. Or it might get hung up on a correlation in the training data. For example, if you have a model that’s supposed to distinguish dogs from cats, you might accidentally make a leash-detecting AI, and then need to add a bunch of images of dogs without leashes to steer it back to what you want, even if that means it performs worse on the original data set.
