We assume that prior instances of higher accuracy are due to overfitting, which others have already covered: a model can score very well on one dataset by learning that dataset's noise, and since the noise doesn't carry over to new data, the high accuracy doesn't generalize. But another point is robustness. Imagine the model standing on a hilly landscape where height is accuracy. We want models that stay accurate despite small disturbances, like slightly different data or small changes to the parameters. So if the model "fell off" a peak after a small nudge, that peak was probably very steep and not very robust. In the end we want to land in a "flat" high region, even if it sits slightly lower than the sharpest peak.
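A toy sketch of that idea (the functions and numbers here are made up for illustration, not from any real model): treat accuracy as a function of one parameter `w`, with a tall-but-narrow peak versus a slightly lower but flat plateau, and see what a small disturbance does to each.

```python
def sharp_acc(w):
    # Tall but narrow peak: 0.99 at w = 0, but accuracy collapses fast
    # as w moves away from the peak.
    return 0.99 - 100 * w ** 2

def flat_acc(w):
    # Slightly lower but flat plateau: 0.95 at w = 0, and accuracy
    # barely changes for small shifts in w.
    return 0.95 - 0.01 * w ** 2

# At the exact optimum, the sharp model looks better on paper.
print(sharp_acc(0.0), flat_acc(0.0))   # 0.99 vs 0.95

# After a small disturbance (e.g. new data shifting the best w a bit),
# the sharp model falls off its peak while the flat one stays accurate.
eps = 0.1
print(sharp_acc(eps))  # drops to about -0.01: fell off the peak
print(flat_acc(eps))   # still about 0.9499: barely moved
```

The point of the sketch is only the comparison: the model that looked better at the optimum is the one that breaks under a small nudge, which is why a flat region with slightly lower accuracy can be preferable.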