A (simplified) typical workflow in machine learning might be:
- Train $m$ models on a training set.
- Validate the $m$ models on a validation set to yield the best model with parameters $\theta$.
- Retrain the best model on all available data (training and validation), which will generally yield a model with different parameters $\theta'$.
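The steps above can be sketched as follows; this is a minimal illustration assuming scikit-learn, with logistic regression standing in for the $m$ candidate models (here $m = 3$ regularization strengths):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for "all available data".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Split the available data into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: train m candidate models on the training set.
candidates = [
    LogisticRegression(C=c).fit(X_train, y_train) for c in (0.01, 0.1, 1.0)
]

# Step 2: validate to select the best model; its fitted
# coefficients play the role of theta.
best = max(candidates, key=lambda m: m.score(X_val, y_val))
theta = best.coef_.copy()

# Step 3: retrain the selected configuration on all available
# data, yielding different parameters theta'.
final = LogisticRegression(C=best.C).fit(X, y)
theta_prime = final.coef_
```

Note that only the hyperparameter choice (here `C`) carries over from validation to the final fit; the parameters themselves are re-estimated from scratch on the combined data.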
Isn't it possible that the parameters $\theta'$ perform worse on unseen real-world data? How do we know that the parameters $\theta'$ (from training on all available data) are better than the parameters $\theta$ (from training on the training set alone)?