Please refer to the notebook at the following address.
This portion of code,

    scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
    print scores
    print scores.mean()

generates the following error on a Windows 7 64-bit machine:
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-37-4a10affe67c7> in <module>()
          1 # evaluate the model using 10-fold cross-validation
    ----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
          3 print scores
          4 print scores.mean()

    C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
       1140                          allow_nans=True, allow_nd=True)
       1141
    -> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
       1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
       1144     # We clone the estimator to make sure that all the folds are

    C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
       1366         if classifier:
       1367             if type_of_target(y) in ['binary', 'multiclass']:
    -> 1368                 cv = StratifiedKFold(y, cv, indices=needs_indices)
       1369             else:
       1370                 cv = KFold(_num_samples(y), cv, indices=needs_indices)

    C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
        428         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
        429             for label, (_, test_split) in zip(unique_labels, per_label_splits):
    --> 430                 label_test_folds = test_folds[y == label]
        431                 # the test split can be too big because we used
        432                 # KFold(max(c, self.n_folds), self.n_folds) instead of

    IndexError: too many indices for array

I am using scikit-learn 0.15.2. It is suggested here that this may be a problem specific to Windows 7 64-bit machines.
==============update==============
I found that the following code actually works:

    from sklearn.cross_validation import KFold

    cv = KFold(X.shape[0], 10, shuffle=True, random_state=33)
    scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
    print scores

==============update 2=============
It seems that, due to some package update, I can no longer reproduce this error on my machine. If you are facing the same issue on a Windows 7 64-bit machine, please let me know.
y? cv? X.shape[0] == 6366 also? cv=10 will try to do stratified 10-fold CV; KFold will not. Using cv=StratifiedKFold(y, 10) explicitly would have been my next diagnostic step, if all else was equal.
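
The post never shows the shape of y, so what follows is only a guess at one possible cause, not a confirmed diagnosis. The failing line in the traceback is `test_folds[y == label]`, and NumPy raises this kind of IndexError when a 1-D array is indexed with a 2-D boolean mask. That would happen if y were stored as a column of shape (n, 1) rather than a flat vector of shape (n,). It would also be consistent with KFold working, since KFold is built from X.shape[0] alone and never indexes with y. The sketch below uses made-up data and a hypothetical helper, `index_by_label`, that only mimics that one line; it does not call scikit-learn.

```python
import numpy as np

# Hypothetical data (not from the post): six binary labels stored as a
# column vector of shape (6, 1) instead of a flat vector of shape (6,).
y_column = np.array([[0], [1], [0], [1], [0], [1]])

# A 1-D array with one entry per sample, standing in for test_folds.
test_folds = np.zeros(len(y_column), dtype=int)


def index_by_label(folds, y, label):
    """Mimic the failing line from the traceback: test_folds[y == label]."""
    return folds[y == label]


# With a column-shaped y, the mask (y == label) is 2-D, so indexing the
# 1-D folds array with it raises IndexError.
try:
    index_by_label(test_folds, y_column, 1)
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError: %s" % exc)

# Flattening y makes the mask 1-D, and the same indexing succeeds.
y_flat = np.ravel(y_column)
selected = index_by_label(test_folds, y_flat, 1)
print("y_column.shape=%s y_flat.shape=%s" % (y_column.shape, y_flat.shape))
```

If printing y.shape on the affected machine shows two dimensions, passing a flattened y (for example np.ravel(y)) to cross_val_score would be a cheap thing to try before suspecting the platform.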