Reconstructing the absolute counts from your information:
test set:
| class   | total cases | correct cases | correct | approx. 95 % c.i. |
|---------|-------------|---------------|---------|-------------------|
| class 0 | 81          | 51            | 63 %    | 52 % - 73 %       |
| class 1 | 210         | 199           | 95 %    | 91 % - 97 %       |
Comparing the confidence intervals for sensitivity and specificity on the test set with the observed sensitivity and specificity from the cross validation, both CV results actually lie within the test set's confidence intervals.
You can similarly compute confidence intervals for the cross validation results.
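For example, if you pool the predictions of all CV surrogate models and count the correct predictions per class, you can compute an approximate interval the same way. The counts below are placeholders (put in your own numbers), and note that this treats the pooled CV predictions as one binomial sample, so it ignores variance due to model instability:

```r
library(binom)

# Hypothetical pooled CV counts for class 0 -- replace with your own numbers:
# n_class0 = class 0 cases in the training data,
# x_class0 = how many of them the CV surrogate models classified correctly.
n_class0 <- 324
x_class0 <- 250

binom.confint(x_class0, n_class0, methods = "wilson")
```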
If all your models (LR, RF, ...) consistently show this difference, then I'd start to suspect that the test set does indeed differ in some important way from the training set.
You explain that you kept the last 20 % for independent testing, and that may be the cause of the trouble if there is some drift in the data.
Whether this comparison of hold-out and cross validation results is the best option depends hugely on whether you have enough samples to afford setting aside the hold-out set. You basically pay for the unbiasedness with much wider confidence intervals due to the smaller number of test cases.
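To illustrate the price in interval width, keeping the observed 63 % correct rate for class 0 fixed (the larger test set is purely hypothetical):

```r
library(binom)

# 20 % hold-out as you have it: 51 / 81 correct, 95 % c.i. roughly 52 % - 73 %
binom.confint(51, 81, methods = "wilson")

# The same observed proportion on a hypothetical test set five times as large:
# the interval narrows to roughly 58 % - 68 %
binom.confint(5 * 51, 5 * 81, methods = "wilson")
```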
You can, however, calculate beforehand whether the uncertainty of the hold-out test allows you to draw the conclusions you need to draw.
If you need literature, we have a paper where we discuss this in more detail: Beleites, C. et al.: Sample size planning for classification models., Anal Chim Acta, 760, 25-33 (2013)
(also available on arXiv: 1211.1323)
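As a rough sketch of one such calculation (the expected 95 % correct rate and the 90 % acceptance threshold below are assumptions for illustration, not taken from your data): scan candidate test set sizes and check whether the lower confidence limit you would get, if the observed proportion comes out as expected, still meets your requirement.

```r
library(binom)

# Assumed planning inputs: we expect about 95 % correct predictions for the
# class in question and need the lower 95 % confidence limit to stay above 90 %.
p_expected <- 0.95
n_grid     <- seq(20, 500, by = 10)

lcl <- binom.confint(round(p_expected * n_grid), n_grid,
                     methods = "wilson")$lower

# Smallest test set size (for that class) meeting the requirement, assuming
# the observed proportion turns out as expected:
min(n_grid[lcl > 0.90])
```

Note that this only accounts for the uncertainty due to the finite number of test cases, not for model instability.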
You can calculate binomial confidence intervals in R, e.g. with the binom package:

```r
library(binom)

# 95 % confidence interval for 51 correct out of 81 class 0 test cases
binom.confint(51, 81)
```
Side note: standard deviation over the cross validation folds is a somewhat awkward measure as it conflates model stability with test sample size.
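A quick simulation makes the point (all numbers assumed for illustration): even a perfectly stable model, i.e. one whose true accuracy is identical in every fold, shows a fold-to-fold standard deviation of roughly sqrt(p(1-p)/n_fold) simply because each fold contains only a few test cases.

```r
# Perfectly stable model with true accuracy p, tested with 10-fold CV:
set.seed(1)
p      <- 0.80   # assumed true accuracy, identical in every fold
n_fold <- 30     # assumed number of test cases per fold
k      <- 10     # number of folds

fold_acc <- rbinom(k, size = n_fold, prob = p) / n_fold

sd(fold_acc)                   # observed SD across folds
sqrt(p * (1 - p) / n_fold)     # SD expected from finite test sample size alone
```

So a large SD across folds may just reflect small folds rather than an unstable model.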