26 events
Dec 4, 2020 at 16:33 review Suggested edit (completed Dec 4, 2020 at 16:58)
Nov 3, 2019 at 18:11 history suggested by hossein hayati (CC BY-SA 4.0): grammar and structure
Nov 3, 2019 at 12:56 review Suggested edit (completed Nov 3, 2019 at 18:11)
Aug 15, 2019 at 2:17 review Suggested edit (completed Aug 15, 2019 at 9:53)
Mar 20, 2019 at 3:55 comment added WestCoastProjects This answer is imprecise (and possibly incorrect) about the use of the validation set; see the answer below.
Jul 20, 2018 at 19:48 history suggested by CommunityBot (CC BY-SA 4.0): inserted missing word ("because you need --this-- for supervised learning."); improved wording (In many cases this is the data "in which" you are)
Jul 20, 2018 at 19:37 review Suggested edit (completed Jul 20, 2018 at 19:48)
Mar 5, 2018 at 4:45 comment added alltom @KevinKim You train a model with examples from the training set, then evaluate the model with examples from the validation set—which it has never seen—to choose the model that generalizes the best. The model that does the best on the validation data with no additional training is most likely to do the best on other data sets (such as the test set), so long as they're all drawn from the same distribution.
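To make that selection procedure concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (the candidate models and the 60/20/20 split are illustrative assumptions, not from the answer): each candidate is fit on the training set, compared on the held-out validation set, and the untouched test set is consulted exactly once for the final error estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; any labeled dataset works the same way.
X, y = make_classification(n_samples=1000, random_state=0)

# 60% train / 20% validation / 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Fit each candidate model on the training set only.
candidates = [DecisionTreeClassifier(max_depth=d, random_state=0)
              for d in (2, 5, 10)]
for model in candidates:
    model.fit(X_train, y_train)

# Choose the model that does best on data it has never seen...
best = max(candidates, key=lambda m: m.score(X_val, y_val))

# ...and estimate its generalization error on the untouched test set, once.
print("test accuracy:", best.score(X_test, y_test))
```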
Feb 28, 2018 at 16:37 comment added KevinKim @alltom I see. But there is still some logic I'm missing: if the validation set is used for model selection, i.e., choosing the model that performs best on the validation set (rather than the one that performs best on the training set), isn't that just another kind of overfitting, this time to the validation set? Then how can we expect the model with the best validation performance to also have the best test performance among all the models being compared? If we can't, what's the point of the validation set?
Sep 26, 2017 at 5:49 comment added Aadnan Farooq A Is it possible to use the validation set for testing?
Jul 20, 2017 at 13:56 comment added alltom @KevinKim user695652 is saying that you will underestimate the true test error if you use the test set to tune hyperparameters (model size, feature selection, etc.) instead of using a validation set for that. If you don't tune any hyperparameters, then you don't need a validation set either.
Jul 20, 2017 at 13:49 comment added alltom @YonatanSimson Models don't usually generalize well enough that you could train in one location and have the model work well in the other, so the only reason to split that way is if you care less about the model working as well as possible than about testing how well it generalizes. When your test set comes from the same distribution as the training set, it still tells you how much you overfit, because the data isn't exactly the same; overfitting means working only on the exact data in your training set.
Jun 23, 2017 at 11:18 comment added Sudip Bhandari Is it that validation is testing against the known, and 'testing' is against the unknown?
Apr 13, 2017 at 12:44 history edited by CommunityBot: replaced http://stats.stackexchange.com/ with https://stats.stackexchange.com/
Mar 26, 2017 at 4:22 comment added KevinKim @user695652 I see you quote The Elements of Statistical Learning, but I don't understand intuitively why this is true. When I train my model on the training set, I don't use any data from the test set. Also, if I don't do any feature engineering, i.e., I just use the original set of features in my data set, then there shouldn't be any information leakage. So in this case, why do I still need the validation set? Why would using just the test set underestimate the true test error?
Feb 3, 2016 at 10:36 comment added Yonatan Simson What is the correct way to split the sets? Should the selection be random? What if you have pictures that are similar? Won't that damage your ability to generalize? If you have two sets taken in separate locations, wouldn't it be better to take one as the training set and the other as the test set?
Oct 13, 2015 at 10:52 comment added xiaohan2012 The validation set is often used to tune hyper-parameters. For example, in the deep learning community, choices such as the number of network layers, the number of hidden units, and the regularization term (whether L1 or L2) are tuned on the validation set.
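A self-contained sketch of that tuning loop, using scikit-learn's LogisticRegression as a stand-in for a deep network (the data, the parameter grid, and the 60/20/20 split are assumptions for illustration): the penalty type and strength are chosen by validation accuracy, and the test set is read only once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 60/20/20 into train/validation/test.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

best_score, best_params = -1.0, None
for penalty in ("l1", "l2"):            # which regularization term to use
    for C in (0.01, 0.1, 1.0, 10.0):    # inverse regularization strength
        clf = LogisticRegression(penalty=penalty, C=C, solver="liblinear")
        clf.fit(X_train, y_train)
        score = clf.score(X_val, y_val)  # validation accuracy guides the search
        if score > best_score:
            best_score, best_params = score, (penalty, C)

print("chosen hyper-parameters:", best_params)

# Only after the search is over does the test set report the final error:
final = LogisticRegression(penalty=best_params[0], C=best_params[1],
                           solver="liblinear").fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))
```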
Sep 29, 2015 at 10:48 history suggested by Colin Brady (CC BY-SA 3.0): minor grammar fix
Sep 29, 2015 at 9:56 review Suggested edit (completed Sep 29, 2015 at 10:48)
Jun 2, 2015 at 20:09 comment added user695652 @Sebastian [If you only use the test set:] "The test set error of the final chosen model will underestimate the true test error, sometimes significantly" [Hastie et al.]
Dec 15, 2014 at 3:08 history suggested by CommunityBot (CC BY-SA 3.0): better word
Dec 15, 2014 at 2:44 review Suggested edit (completed Dec 15, 2014 at 3:08)
Nov 9, 2014 at 14:42 comment added Sebastian Graf Is it because of overfitting? Or because we want an independent statistic based on the test result, just for error estimation?
Nov 9, 2014 at 14:31 comment added Sebastian Graf Why wouldn't I choose the best-performing model based on the test set, getting rid of the validation set altogether?
Mar 1, 2013 at 15:17 vote accepted by xiaohan2012
Nov 28, 2011 at 11:50 history answered by Alexander Galkin (CC BY-SA 3.0)