Timeline for "Why splitting the data into the training and testing set is not enough"
Current License: CC BY-SA 3.0
4 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Mar 27, 2017 at 0:55 | comment added | KevinKim | | @jlimahaverford Why does the internal CV not produce a good measure of the actual algorithm performance? I think mathematically it does. When you fit your model, say LASSO with a specific hyperparameter $\alpha$, and then apply it to another piece of the data (one you did not use to fit the model) in the internal CV process, that error rate should be an unbiased estimator of the true prediction error rate for your specific $\alpha$, shouldn't it? |
| Aug 26, 2015 at 11:33 | comment added | jlimahaverford | | Henry, I don't think you are understanding external cross-validation. You can "do this repeatedly with the test set": repeatedly hold out some portion of your full data for test purposes while executing your full training procedure on the rest (which may include internal cross-validation). External cross-validation is still typically done in folds, and allows all of the original data to be in the test set at some point. |
| Aug 26, 2015 at 10:11 | comment added | Henry | | Personally, I would not use the phrase "external cross-validation", as I see cross-validation as the repeated splitting off of different validation sets from the training set for model selection and tuning purposes. You cannot meaningfully do this repeatedly with the test set, as that is a one-off proxy for future as-yet-unknown data used to judge the performance of the final model. |
| Aug 26, 2015 at 3:33 | answered (history) | jlimahaverford | CC BY-SA 3.0 | |
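
The comments above contrast internal cross-validation (used to pick a hyperparameter such as the LASSO penalty $\alpha$) with external cross-validation (used to estimate how well the whole tuning procedure generalizes). Below is a minimal sketch of that nested setup, assuming scikit-learn and a synthetic dataset; the dataset, fold counts, and $\alpha$ grid are illustrative choices, not anything specified in the discussion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Illustrative synthetic data (not from the original discussion).
X, y = make_regression(n_samples=200, n_features=50, noise=1.0, random_state=0)

# Internal CV: selects alpha using only the training portion of each outer fold.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
tuned_lasso = GridSearchCV(
    Lasso(max_iter=10000),
    param_grid={"alpha": np.logspace(-3, 1, 20)},
    cv=inner_cv,
)

# External CV: each outer test fold is held out from the entire tuning
# procedure, so the score estimates the performance of the procedure itself,
# not of one particular alpha chosen on the same data.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(
    tuned_lasso, X, y, cv=outer_cv, scoring="neg_mean_squared_error"
)
print("estimated generalization MSE: %.3f (+/- %.3f)" % (-scores.mean(), scores.std()))
```

As jlimahaverford's comment notes, the outer loop is still done in folds, so every observation ends up in a test fold exactly once while never being seen by the internal tuning step that produced the model evaluated on it.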