I am building a real-time machine learning module, which is not based on a huge** sample size, with a hyperparameter grid search and cross-validation process. I'm looking for any insight/advice, as I'm considering one of these options:
1. Use cross-validated grid search to find the best hyperparameter (HP) combination, and once I have found it, use it to retrain my classifier on the whole sample set.
2. Split my training set into two subsets in advance, run cross-validation by iteratively re-splitting it into train/test sets while searching for the best HP combination, and then use the classifier already trained with the best HP, skipping the final retraining stage described in 1.
3. Do the same as 1, but keep the same random seed when I retrain the classifier.
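For concreteness, option 1 can be sketched in scikit-learn (an assumption on my stack; the parameter grid values below are purely illustrative):

```python
# Sketch of option 1: cross-validated grid search, then retrain the
# best HP combination on the whole sample set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for my real samples (few hundred per class,
# ~100 features).
X, y = make_classification(n_samples=500, n_features=100, random_state=0)

# Illustrative grid, not a recommendation.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", "log2"],
}

# refit=True (the default) retrains the best estimator on all of X, y
# after the search -- exactly the final retraining step of option 1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    refit=True,
)
search.fit(X, y)

print(search.best_params_)       # best HP combination found by CV
clf = search.best_estimator_     # classifier refit on the full set
```

Option 2 would instead skip the `refit` step and reuse one of the fold-trained models directly.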
The trade-off, as I see it, is between gaining a bit more sample size for training and losing the assurance that performance will hold up and that I'm not overfitting.
Again, any thoughts/insights on my dilemma are welcome.
*Note that I'm using random-forest and extra-trees classifiers during grid search.
**My sample size is typically between a few hundred and a few thousand per class, and the number of features is typically between 70 and 1500.