
There exist cases where one can "overfit" on the validation set. Although it is easier to overfit on the training set, the distributions of the validation and test set may not match, in which case tuning hyperparameters on the validation set could result in subpar performance on the test set.

It is common for neural networks to use early stopping on the validation set to determine a good point to stop training. However, when there is a mismatch between the validation and test distributions, early stopping may result in worse test performance. So, what are some alternatives to early stopping that could address this issue?

  • Notice that the same applies to whatever else you use your validation set for. – Commented Apr 15, 2019 at 16:49
  • @Tim In this case, I'm only using it for early stopping. – Commented Apr 15, 2019 at 17:09
  • Regularize the model, so that early stopping is not needed anymore? – Commented Apr 15, 2019 at 17:12
  • @Thomas How does that help in terms of determining a stopping condition? You could try stopping at various epochs and look at the test performance for each, but repeatedly querying the test set would not be good, right? – Commented Apr 15, 2019 at 17:14
  • See also: stats.stackexchange.com/a/339412/163572 – Commented Apr 16, 2019 at 11:53

1 Answer


"the distributions of the validation and test set may not match" it's normal that they do not match exactly. You should understand that training process is fitting your model weights to training data. However training data is only a part of whole problem. Training NN is based on belief that fitting to this samples would provide better quality of prediction on whole problem space. Usually you don't want to fit with much higher score on train that on validation set because it could leads your model to focus on very specific of part of space even if it leads to get worse score on whole space, which is called overfitting.

The question is when to stop training. Assume you have three sets: train, validation, and test. The test set cannot be used in the training process, so you can choose the stopping point based only on the train and validation sets, which naturally leads to early stopping. You could instead guess the number of epochs in advance, but that is no better than early stopping. In general there is no other way; only in very specific problems can you bring in expert-level domain knowledge to set the training budget.
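To make the early-stopping baseline concrete, below is a minimal sketch of patience-based early stopping in Python. The names `train_one_epoch` and `validation_loss` are hypothetical stand-ins for your own training and evaluation routines, and the `patience` and `max_epochs` values are assumptions to tune, not recommendations.

    # Minimal patience-based early stopping sketch.
    # Assumes hypothetical train_one_epoch(model) and validation_loss(model)
    # helpers; adapt these to your framework of choice.
    import copy

    def fit_with_early_stopping(model, max_epochs=200, patience=10):
        best_loss = float("inf")       # best validation loss seen so far
        best_model = None              # snapshot of the best-scoring model
        epochs_since_improvement = 0

        for epoch in range(max_epochs):
            train_one_epoch(model)               # fit weights on the training set
            val_loss = validation_loss(model)    # score on the held-out validation set

            if val_loss < best_loss:             # improvement: keep a snapshot
                best_loss = val_loss
                best_model = copy.deepcopy(model)
                epochs_since_improvement = 0
            else:                                # no improvement this epoch
                epochs_since_improvement += 1
                if epochs_since_improvement >= patience:
                    break                        # validation loss has plateaued

        return best_model if best_model is not None else model

If the validation distribution is suspect, the regularization route suggested in the comments (weight decay, dropout, and a fixed training budget) makes the exact stopping point matter less, at the cost of one more set of hyperparameters to choose.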

Unfortunately, if your data is not representative of the problem, the best solution is to obtain higher-quality datasets.

