  • I really appreciate your help, Mary! Could you please have a look at the update in my question? I am not sure about my approach when there is no test data and I need to do CV on the whole dataset. Thank you again. Commented Mar 14, 2017 at 15:36
  • Hi, @renakre ;) Your first suggestion does indeed evaluate on the training data and, as you say, a global CV is preferable. When you do this, make sure that all steps of your analysis are nested inside the CV loop (as explained in the answer); otherwise the results will be biased. Commented Mar 15, 2017 at 17:40
  • Thank you very much for updating your answer! I see that you used a pipeline. So, in this case, is there no need to call the fit() or transform() methods? Commented Mar 15, 2017 at 18:47
  • Exactly. cross_val_predict does so implicitly. Commented Mar 16, 2017 at 7:09
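
To illustrate the point made in the comments above, here is a minimal sketch of keeping every step inside the CV loop with scikit-learn. The synthetic dataset, the StandardScaler step, and the LogisticRegression classifier are assumptions chosen for illustration; they are not taken from the question or the answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bundle preprocessing and the model so that, in every fold, the scaler
# is fitted on that fold's training part only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cross_val_predict clones and fits the pipeline once per fold, so there
# are no manual fit() or transform() calls here.
y_pred = cross_val_predict(pipe, X, y, cv=5)

print("CV accuracy:", accuracy_score(y, y_pred))
```

Each prediction in y_pred comes from a pipeline that never saw that sample during fitting, which is what avoids the bias mentioned above. The original pipe object itself is left unfitted, because cross_val_predict works on clones.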