26 events
Dec 4, 2020 at 16:33 review Suggested edit (completed Dec 4, 2020 at 16:58)
Nov 3, 2019 at 18:11 history suggested by hossein hayati (CC BY-SA 4.0): grammar and structure
Nov 3, 2019 at 12:56 review Suggested edit (completed Nov 3, 2019 at 18:11)
Aug 15, 2019 at 2:17 review Suggested edit (completed Aug 15, 2019 at 9:53)
Mar 20, 2019 at 3:55 comment added WestCoastProjects This answer is imprecise (and possibly incorrect) about the use of the validation set; see the answer below.
Jul 20, 2018 at 19:48 history suggested by CommunityBot (CC BY-SA 4.0): inserted missing word ("because you need --this-- for supervised learning."); improved wording (In many cases this is the data "in which" you are)
Jul 20, 2018 at 19:37 review Suggested edit (completed Jul 20, 2018 at 19:48)
Mar 5, 2018 at 4:45 comment added alltom @KevinKim You train a model with examples from the training set, then evaluate the model with examples from the validation set—which it has never seen—to choose the model that generalizes the best. The model that does the best on the validation data with no additional training is most likely to do the best on other data sets (such as the test set), so long as they're all drawn from the same distribution.
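To make that selection procedure concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (the candidate models and the 60/20/20 split are illustrative assumptions, not from the answer): each candidate is fit on the training set, compared on the held-out validation set, and the untouched test set is consulted exactly once for the final error estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; any labeled dataset works the same way.
X, y = make_classification(n_samples=1000, random_state=0)

# 60% train / 20% validation / 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Fit each candidate model on the training set only.
candidates = [DecisionTreeClassifier(max_depth=d, random_state=0)
              for d in (2, 5, 10)]
for model in candidates:
    model.fit(X_train, y_train)

# Choose the model that does best on data it has never seen...
best = max(candidates, key=lambda m: m.score(X_val, y_val))

# ...and estimate its generalization error on the untouched test set, once.
print("test accuracy:", best.score(X_test, y_test))
```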
Feb 28, 2018 at 16:37 comment added KevinKim @alltom I see. But there is still some logic I'm missing: if the validation set is used for model selection, i.e., choosing the model that performs best on the validation set (rather than the one that performs best on the training set), isn't that just another kind of overfitting, this time to the validation set? Then how can we expect the model with the best validation performance to also have the best test performance among all the models being compared? If we can't, what's the point of the validation set?
Sep 26, 2017 at 5:49 comment added Aadnan Farooq A Is it possible to use the validation set for testing?
Jul 20, 2017 at 13:56 comment added alltom @KevinKim user695652 is saying that you will underestimate the true test error if you use the test set to tune hyperparameters (model size, feature selection, etc.) instead of using a validation set for that. If you don't tune any hyperparameters, then you don't need a validation set either.
Jul 20, 2017 at 13:49 comment added alltom @YonatanSimson Models don't usually generalize well enough that you could train in one location and have the model work well in the other, so the only reason to split that way is if you care less about the model working as well as possible than about testing how well it generalizes. When your test set comes from the same distribution as the training set, it still tells you how much you overfit, because the data isn't exactly the same; overfitting means working only on the exact data in your training set.
Jun 23, 2017 at 11:18 comment added Sudip Bhandari Is it that validation is testing against the known, and 'testing' is against the unknown?
Apr 13, 2017 at 12:44 history edited by CommunityBot: replaced http://stats.stackexchange.com/ with https://stats.stackexchange.com/
Mar 26, 2017 at 4:22 comment added KevinKim @user695652 I see you quote The Elements of Statistical Learning, but I don't understand intuitively why this is true. When I train my model on the training set, I don't use any data from the test set. Also, if I don't do any feature engineering, i.e., I just use the original set of features in my data set, then there shouldn't be any information leakage. So in this case, why do I still need the validation set? Why would using just the test set underestimate the true test error?
Feb 3, 2016 at 10:36 comment added Yonatan Simson What is the correct way to split the sets? Should the selection be random? What if you have pictures that are similar? Won't that damage your ability to generalize? If you have two sets taken in separate locations, wouldn't it be better to take one as the training set and the other as the test set?
Oct 13, 2015 at 10:52 comment added xiaohan2012 The validation set is often used to tune hyper-parameters. For example, in the deep learning community, choices such as the number of network layers, the number of hidden units, and the regularization term (whether L1 or L2) are tuned on the validation set.
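A self-contained sketch of that tuning loop, using scikit-learn's LogisticRegression as a stand-in for a deep network (the data, the parameter grid, and the 60/20/20 split are assumptions for illustration): the penalty type and strength are chosen by validation accuracy, and the test set is read only once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 60/20/20 into train/validation/test.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

best_score, best_params = -1.0, None
for penalty in ("l1", "l2"):            # which regularization term to use
    for C in (0.01, 0.1, 1.0, 10.0):    # inverse regularization strength
        clf = LogisticRegression(penalty=penalty, C=C, solver="liblinear")
        clf.fit(X_train, y_train)
        score = clf.score(X_val, y_val)  # validation accuracy guides the search
        if score > best_score:
            best_score, best_params = score, (penalty, C)

print("chosen hyper-parameters:", best_params)

# Only after the search is over does the test set report the final error:
final = LogisticRegression(penalty=best_params[0], C=best_params[1],
                           solver="liblinear").fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))
```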
Sep 29, 2015 at 10:48 history suggested by Colin Brady (CC BY-SA 3.0): minor grammar fix
Sep 29, 2015 at 9:56 review Suggested edit (completed Sep 29, 2015 at 10:48)
Jun 2, 2015 at 20:09 comment added user695652 @Sebastian [If you only use the test set:] "The test set error of the final chosen model will underestimate the true test error, sometimes significantly" [Hastie et al.]
Dec 15, 2014 at 3:08 history suggested by CommunityBot (CC BY-SA 3.0): better word
Dec 15, 2014 at 2:44 review Suggested edit (completed Dec 15, 2014 at 3:08)
Nov 9, 2014 at 14:42 comment added Sebastian Graf Is it because of overfitting? Or because we want an independent statistic based on the test result, just for error estimation?
Nov 9, 2014 at 14:31 comment added Sebastian Graf Why wouldn't I choose the best-performing model based on the test set, getting rid of the validation set altogether?
Mar 1, 2013 at 15:17 vote accepted by xiaohan2012
Nov 28, 2011 at 11:50 history answered by Alexander Galkin (CC BY-SA 3.0)