I'm attempting to run cross-validation on my KNN classifier. I have used the following code, which raises a TypeError.

For context, I have already imported the scikit-learn, NumPy, and pandas libraries.

    from sklearn.cross_validation import cross_val_score, ShuffleSplit

    n_samples = len(y)
    knn = KNeighborsClassifier(3)
    cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)

    test_scores = cross_val_score(knn, X, y, cv=cv)
    test_scores.mean()

Returns:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-139-d8cc3ee0c29b> in <module>()
          7 cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
          8
          9 test_scores = cross_val_score(knn, X, y, cv=cv)
         10 test_scores.mean()

    //anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
       1150         delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
       1151                                   verbose, fit_params)
       1152         for train, test in cv)
       1153     return np.array(scores)
       1154

    //anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
        515         try:
        516             for function, args, kwargs in iterable:
        517                 self.dispatch(function, args, kwargs)
        518
        519             self.retrieve()

    //anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
        310         """
        311         if self._pool is None:
        312             job = ImmediateApply(func, args, kwargs)
        313             index = len(self._jobs)
        314             if not _verbosity_filter(index, self.verbose):

    //anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
        134         # Don't delay the application, to avoid keeping the input
        135         # arguments in memory
        136         self.results = func(*args, **kwargs)
        137
        138     def get(self):

    //anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _cross_val_score(estimator, X, y, scorer, train, test, verbose, fit_params)
       1056         y_test = None
       1057     else:
       1058         y_train = y[train]
       1059         y_test = y[test]
       1060     estimator.fit(X_train, y_train, **fit_params)

    TypeError: only integer arrays with one element can be converted to an index
  • Please specify whether your y variable is, or derives from, a pandas.DataFrame. (Commented Apr 4, 2014 at 18:54)

1 Answer

This error is related to pandas. scikit-learn expects NumPy arrays, SciPy sparse matrices, or objects that behave like them.

The main issue with pandas DataFrames is that indexing with [...] selects columns, not rows. Row indexing in pandas is done through DataFrame.loc[...] (by label) or DataFrame.iloc[...] (by position). scikit-learn does not expect this behaviour. The error probably came from line 1058 of the traceback (y_train = y[train]), where the code fails to extract the training sample.
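
The indexing difference can be seen in a small sketch. The data and the names `y_df` and `train` are made up for illustration, and the exact exception raised by the failing lookup depends on the pandas version, so the example catches it broadly rather than naming one type.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single-column DataFrame of labels.
y_df = pd.DataFrame({"label": [0, 1, 0, 1, 1]})

# Positional indices, like those a cross-validation splitter yields.
train = np.array([0, 2, 4])

# Plain [...] on a DataFrame looks the indices up as column labels,
# so this lookup fails (the exception type varies across pandas versions).
try:
    y_df[train]
    bracket_failed = False
except Exception:
    bracket_failed = True

# .iloc[...] selects rows by position instead.
rows_by_position = y_df.iloc[train]

# A NumPy array is indexed by position directly, which is what
# scikit-learn relies on internally.
y_arr = y_df["label"].values
y_train = y_arr[train]
```

Here `y_arr[train]` and `y_df.iloc[train]` pick out the same three rows, while `y_df[train]` never reaches the row data at all.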

To remedy this, if your y is a single DataFrame column, try converting it to a NumPy array:

y = y.values 

Otherwise, the sklearn-pandas package is possibly an option.
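
As a rough end-to-end sketch of the `.values` conversion, the example below runs the same cross-validation on synthetic data. It assumes a recent scikit-learn, where `sklearn.cross_validation` has been replaced by `sklearn.model_selection` and `ShuffleSplit` takes `n_splits` rather than `n_samples` and `n_iter`. The data and the names `X_df` and `y_df` are hypothetical stand-ins for the question's inputs.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (hypothetical): two well-separated clusters.
rng = np.random.RandomState(0)
X_df = pd.DataFrame(
    np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
)
y_df = pd.DataFrame({"label": [0] * 50 + [1] * 50})

# Convert to plain NumPy arrays before handing the data to scikit-learn.
X = X_df.values
y = y_df["label"].values  # 1-D array of labels

knn = KNeighborsClassifier(3)

# Newer API (assumption about the installed version): n_splits replaces
# the old n_samples / n_iter arguments.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

test_scores = cross_val_score(knn, X, y, cv=cv)
mean_score = test_scores.mean()
```

On the old `sklearn.cross_validation` API from the question, only the `y = y.values` line should need to change; the rest of the original code can stay as it is.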
