
R's pls package has a useful function, selectNcomp, for automated selection of the number of retained latent variables in cross-validated PLS models, implementing the heuristic one-standard-error rule: 'The approach "onesigma" simply returns the first model where the optimal CV is within one standard error of the absolute optimum (Hastie, Tibshirani and Friedman, 2009). Note that here we simply use the standard deviation of the cross-validation residuals, in line with the procedure used to calculate the error measure itself'

How might one mimic this functionality in Python? From my scikit-learn grid search, I have the mean RMSE per candidate number of latent variables across all folds of the cross-validation, and the associated standard deviation across folds. However, I am unsure how (or whether) a standard error can be calculated from this information, or whether it is meaningfully different from simply using the standard deviation as the threshold. I am not necessarily asking for specific programming pointers, just an understanding of the underlying statistics and whether an equivalent metric could be used, which seemed more appropriate to ask here. Thank you.
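For what it's worth, here is a minimal sketch of how the one-standard-error rule is often applied to grid-search output. It assumes you have, per candidate component count, the mean RMSE across folds (e.g. derived from `cv_results_`) and the standard deviation across folds; the standard error of the mean is then estimated as the standard deviation divided by the square root of the number of folds. The function name and argument layout are hypothetical, not part of any library. Note this uses the SE of the fold means, which is not identical to pls's "onesigma" convention of using the standard deviation of the CV residuals directly.

```python
import numpy as np

def one_se_rule(n_components, mean_rmse, std_rmse, n_folds):
    """Pick the most parsimonious model whose mean CV RMSE is within
    one standard error of the best model's mean CV RMSE.

    n_components : candidate component counts, in increasing order
    mean_rmse    : mean RMSE across folds for each candidate
    std_rmse     : standard deviation of RMSE across folds
    n_folds      : number of CV folds (K)
    """
    mean_rmse = np.asarray(mean_rmse, dtype=float)
    # Standard error of the mean across folds: sd / sqrt(K)
    se = np.asarray(std_rmse, dtype=float) / np.sqrt(n_folds)
    best = np.argmin(mean_rmse)
    threshold = mean_rmse[best] + se[best]
    # Smallest component count whose mean RMSE falls under the threshold
    candidates = np.nonzero(mean_rmse <= threshold)[0]
    return n_components[candidates[0]]

# Illustrative (made-up) numbers: 5 candidates, 10-fold CV.
chosen = one_se_rule(
    n_components=[1, 2, 3, 4, 5],
    mean_rmse=[2.0, 1.2, 1.0, 0.98, 0.99],
    std_rmse=[0.3, 0.3, 0.3, 0.3, 0.3],
    n_folds=10,
)  # -> 3: the 4-component model is best, but 3 is within one SE of it
```

Replacing `se[best]` with `std_rmse[best]` would instead use the raw standard deviation as the limit, which gives a more conservative (smaller) model, since the threshold is wider by a factor of sqrt(K).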

Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer (2009; 10th printing with corrections, 2013), Section 7.10.
