Why, when training ML models, do we not take the model with the highest accuracy?

It's pretty common for an ML model to lose accuracy at various points during training. But presumably that means those later versions are worse, so why do we take the last version instead of the one that had the highest accuracy?

12 Answers

Anonymous 0 Comments

We assume that earlier instances of higher accuracy are due to overfitting, which other people have already talked about. Basically, a model that is accurate on one dataset may not be accurate on another, because it may have learned the noise in that dataset rather than the underlying pattern. But another point is robustness. We tend to want models that stay accurate despite small disturbances. Imagine accuracy as a landscape of hills, with the model standing on one of them. If the model “fell off” a peak during training, that probably means the peak was very steep and not very robust. We want to end up in a “flat” area.
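To make the hill picture concrete, here is a rough toy sketch. The two accuracy curves (`sharp_peak`, `flat_peak`) and the noise level `sigma` are made up for illustration and do not come from any real model. Both peaks have the same height, but a small random nudge to the parameter costs far more accuracy on the steep one.

```python
import random

# Hypothetical 1-D "accuracy landscapes": same peak height (0.9 at w = 0),
# very different steepness. These are illustrative assumptions only.
def sharp_peak(w):
    return max(0.0, 0.9 - 50.0 * w * w)

def flat_peak(w):
    return max(0.0, 0.9 - 0.5 * w * w)

def mean_accuracy_when_nudged(accuracy, sigma=0.05, n=10_000, seed=0):
    """Average accuracy when the parameter at the peak is randomly perturbed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, sigma)  # small disturbance around the peak at w = 0
        total += accuracy(w)
    return total / n

sharp_avg = mean_accuracy_when_nudged(sharp_peak)
flat_avg = mean_accuracy_when_nudged(flat_peak)
print(f"sharp peak: {sharp_avg:.3f}, flat peak: {flat_avg:.3f}")
```

With the same disturbances applied to both, the flat peak keeps almost all of its accuracy while the sharp one loses a noticeable chunk, which is the sense in which the flat area is more robust.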

Anonymous 0 Comments

I think the measurement of accuracy is itself not accurate. What you measure is something like the true accuracy plus noise. The reported accuracy really just shows how well the model does on whatever validation set you use, which is only a subset of all the inputs you are going to give it. If all you ever wanted to do was run the model on the validation set, then sure, select the version that shows the highest accuracy on the validation set, I guess.
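A quick simulation of the "true accuracy + noise" idea, as a sketch under made-up assumptions: every checkpoint has exactly the same true accuracy (`true_acc`), and each one is scored on its own random draw of `n_val` validation examples. All names and numbers here are hypothetical.

```python
import random

def average_best_measured(true_acc=0.80, n_val=200, n_checkpoints=10,
                          trials=200, seed=0):
    """Average of the highest measured accuracy across repeated experiments."""
    rng = random.Random(seed)
    best_scores = []
    for _ in range(trials):
        measured = []
        for _ in range(n_checkpoints):
            # Each validation example is answered correctly with prob. true_acc.
            correct = sum(rng.random() < true_acc for _ in range(n_val))
            measured.append(correct / n_val)
        # Pick the checkpoint that *looks* best on the validation set.
        best_scores.append(max(measured))
    return sum(best_scores) / trials

avg_best = average_best_measured()
print(f"true accuracy: 0.80, average 'best' measured accuracy: {avg_best:.3f}")
```

Even though no checkpoint is actually better than any other in this setup, the one with the highest measured score looks better than 0.80 on average. That gap is pure noise, so part of any "best checkpoint" advantage on a finite validation set can be luck.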