I'm trying to understand k-fold cross-validation, since I'm using it for the first time for my text classification task. However, I'm confused about how to implement it in Python.
I have a DataFrame where the `data` column holds the text to classify and the `label` column holds the target values (0 or 1). So far I have used a train/test split and trained a Multinomial Naive Bayes classifier on the vectorized text.
```python
from sklearn import model_selection
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# split the data into training and testing datasets
X_train, X_test, y_train, y_test = model_selection.train_test_split(df['data'], df['label'], random_state=1)

# fit the vectorizer on the training text only, then apply it to the test text
vect = CountVectorizer(ngram_range=(1, 2), max_features=1000, stop_words="english")
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)
y_pred_class = nb.predict(X_test_dtm)
```

I just want to know how I can implement 5-fold cross-validation in a similar way. I have looked at a lot of examples, but as a beginner I'm confused about the right way to do it.
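A minimal sketch of one way to do this, assuming scikit-learn and pandas, with a small made-up DataFrame standing in for the real `df` (the toy texts are purely illustrative). It shows two routes that should give the same fold scores: wrapping `CountVectorizer` and `MultinomialNB` in a pipeline and passing it to `cross_val_score`, and an explicit loop over the splits that mirrors the train/test code in the question. Using `StratifiedKFold` with `shuffle=True` is a choice made here to keep the 0/1 ratio similar in every fold; plain `KFold` would also work.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real df ('data' = text, 'label' = 0/1)
texts = [f"good great excellent review {w}" for w in "abcdefghij"] + \
        [f"bad awful terrible review {w}" for w in "abcdefghij"]
df = pd.DataFrame({"data": texts, "label": [1] * 10 + [0] * 10})

# 5 stratified folds; fixed random_state so both approaches see identical splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Approach 1: pipeline + cross_val_score (vectorizer is refit inside each fold)
pipe = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), max_features=1000, stop_words="english"),
    MultinomialNB(),
)
scores = cross_val_score(pipe, df["data"], df["label"], cv=cv, scoring="accuracy")
print("pipeline fold accuracies:", scores, "mean:", scores.mean())

# Approach 2: explicit loop, same steps as the original train/test code
fold_scores = []
for train_idx, test_idx in cv.split(df["data"], df["label"]):
    X_train, X_test = df["data"].iloc[train_idx], df["data"].iloc[test_idx]
    y_train, y_test = df["label"].iloc[train_idx], df["label"].iloc[test_idx]

    # fit the vocabulary on the training fold only, then transform the test fold
    vect = CountVectorizer(ngram_range=(1, 2), max_features=1000, stop_words="english")
    X_train_dtm = vect.fit_transform(X_train)
    X_test_dtm = vect.transform(X_test)

    nb = MultinomialNB()
    nb.fit(X_train_dtm, y_train)
    fold_scores.append(accuracy_score(y_test, nb.predict(X_test_dtm)))

print("loop fold accuracies:", fold_scores, "mean:", np.mean(fold_scores))
```

In both versions the vectorizer is refit on each training fold, so words seen only in the held-out fold do not leak into training. The usual summary is the mean (and perhaps the standard deviation) of the five fold scores; the pipeline version is shorter and harder to get wrong, while the loop makes each step visible.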