In his Coursera video lecture on basic good practices in Machine Learning, Prof. Andrew Ng discusses the learning curve: a plot of cross-validation error and training error versus the size of the training set (shown at around the 11-minute mark of https://www.youtube.com/watch?v=ISBGFY-gBug). I am using the k-fold cross-validation method for hyperparameter tuning and model selection.
In this scenario:

- Consider the variable `Xdata` to be the entire feature set, which is split into a training set, `DataTrain`, that is used in the k-fold setup and is further split into a training subset and a validation subset.
- So, from `DataTrain` we get `trainData` and `testData` for the k-fold setup. There is also an independent test set, denoted by the variable `DataTest`.

When using the k-fold cross-validation method, to plot the learning curve, would the training error be the misclassification error on `DataTrain`, and the cross-validation error be the misclassification error on the validation subset, `testData`?
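For concreteness, here is a minimal sketch of how this splitting scheme and the learning-curve computation could look. It assumes scikit-learn is available; the names `Xdata`, `DataTrain`, `DataTest`, `trainData`, and `testData` mirror the variables above, while `ydata` (the label vector) and the logistic-regression estimator are hypothetical stand-ins, not taken from the original setup.

```python
# Minimal sketch, assuming scikit-learn.
# `ydata` is a hypothetical label vector paired with `Xdata`.
import numpy as np
from sklearn.model_selection import train_test_split, KFold, learning_curve
from sklearn.linear_model import LogisticRegression

# Split the full feature set Xdata into DataTrain (used for k-fold CV)
# and the independent, held-out test set DataTest.
DataTrain, DataTest, yTrain, yTest = train_test_split(
    Xdata, ydata, test_size=0.2, random_state=0)

# Inside each of the k folds, learning_curve fits the model on growing
# portions of the fold's training subset (trainData) and scores both that
# subset and the held-out fold (testData, the validation subset).
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),  # placeholder estimator
    DataTrain, yTrain,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring="accuracy")

# Misclassification error = 1 - accuracy, averaged over the k folds.
train_error = 1 - train_scores.mean(axis=1)
cv_error = 1 - val_scores.mean(axis=1)
```

Plotting `train_error` and `cv_error` against `train_sizes` would then reproduce the kind of learning curve shown in the lecture; `DataTest` is touched only once, at the very end, to report the final generalization error.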