The scikit-learn documentation page on Grid Search says:
Model selection by evaluating various parameter settings can be seen as a way to use the labeled data to “train” the parameters of the grid.
When evaluating the resulting model it is important to do it on held-out samples that were not seen during the grid search process: it is recommended to split the data into a development set (to be fed to the GridSearchCV instance) and an evaluation set to compute performance metrics.
Does this mean that GridSearchCV.best_score_ shouldn't be used to evaluate model performance? Why is that the case?
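If I understand the recommendation correctly, the intended workflow looks roughly like this (toy data and a hypothetical SVC grid, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Toy data standing in for the real problem
X, y = make_classification(n_samples=500, random_state=0)

# Development set fed to the grid search; evaluation set kept completely unseen
X_dev, X_eval, y_dev, y_eval = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_dev, y_dev)

# Final performance is measured on the held-out evaluation set,
# not taken from grid.best_score_
print(grid.score(X_eval, y_eval))
```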
I've been using my GridSearchCV scores as my performance estimates because I wanted a reliable score over several runs (with its standard deviation). Running a separate cross-validation after the grid search gives me overestimated scores, since some of the data in those CV validation sets was already seen by the grid search. Is this an incorrect approach?
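For reference, here is roughly what I have been doing (again with toy data and a hypothetical SVC grid): reporting the mean and standard deviation of the fold scores for the best candidate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for the real problem
X, y = make_classification(n_samples=500, random_state=0)

grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)

# Mean and standard deviation of the CV fold scores for the best parameter
# setting; the mean here is exactly grid.best_score_
best = grid.best_index_
mean_score = grid.cv_results_["mean_test_score"][best]
std_score = grid.cv_results_["std_test_score"][best]
print(f"{mean_score:.3f} +/- {std_score:.3f}")
```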