I am running Python 3 in Spyder 2. When I run this code:

    from sklearn.preprocessing import LabelEncoder

    cv = train.dtypes.loc[train.dtypes == 'object'].index
    print(cv)

    le = LabelEncoder()
    for i in cv:
        train[i] = le.fit_transform(train[i])
        test[i] = le.fit_transform(test[i])

I get this error:

    le=LabelEncoder()
    for i in cv:
        train[i]=le.fit_transform(train[i])
        test[i]=le.fit_transform(test[i])
    Traceback (most recent call last):

      File "<ipython-input-5-8739984f61b2>", line 3, in <module>
        train[i]=le.fit_transform(train[i])

      File "C:\Users\myname\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 127, in fit_transform
        self.classes_, y = np.unique(y, return_inverse=True)

      File "C:\Users\myname\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 195, in unique
        perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')

    TypeError: unorderable types: str() > float()

Oddly enough, if I call the encoder on a single column of my data, it works. For instance:

le.fit_transform(test['Race']) 

Results in:

    le.fit_transform(test['Race'])
    Out[7]: array([2, 4, 4, ..., 4, 1, 4], dtype=int64)

I've tried:

    float(le.fit_transform(train[i]))
    str(le.fit_transform(train[i]))

Neither worked.

Could someone please help me out?

1 Answer

I had the same problem. It turned out that I had not checked for missing values. Check whether you have any left (in your case):

    print(train.apply(lambda x: sum(x.isnull())))
    print(test.apply(lambda x: sum(x.isnull())))
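
As for why a missing value produces this particular error: pandas represents a missing entry in an object column as NaN, which is a float, so the column ends up mixing str and float values, and Python 3 refuses to order those. Here is a minimal sketch using only the standard library. The values are made up for illustration and `try_sort` is a hypothetical helper, not part of pandas or scikit-learn:

```python
import math

# Hypothetical column values: two strings and one missing entry (NaN is a float).
values = ["White", math.nan, "Black"]


def try_sort(items):
    """Return the sorted list, or the TypeError message if the items cannot be ordered."""
    try:
        return sorted(items)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Sorting str together with float fails, just like the sort inside np.unique.
mixed_result = try_sort(values)

# Once the missing entry is gone, only strings remain and sorting works.
clean_result = try_sort([v for v in values if isinstance(v, str)])

print(mixed_result)
print(clean_result)
```

The exact wording of the TypeError depends on the Python version, so it may not match the traceback in the question word for word.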

If you have some, either replace them with a summary statistic (mean, median, mode, ...) or simply encode them as a string. For example, for an arbitrary variable VAR:

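    # Numeric column: fill with a statistic such as the mean.
    parameter = train[VAR].mean()
    # String column: use a placeholder string instead.
    # parameter = "NaN"
    train[VAR] = train[VAR].fillna(parameter)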
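
Putting the two steps together for the loop in the question, a rough sketch might look like the following. The two small frames and the `"Missing"` placeholder are made up for illustration. Fitting the encoder once on the values from both frames is my own suggestion rather than something from the question: it gives each label the same code in `train` and `test`, which refitting on `test` does not guarantee.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up stand-ins for the asker's train/test frames (assumption, not real data).
train = pd.DataFrame({
    "Race": pd.Series(["White", None, "Black"], dtype=object),
    "Age": [39, 50, 38],
})
test = pd.DataFrame({
    "Race": pd.Series(["Black", "White", None], dtype=object),
    "Age": [25, 38, 28],
})

# Columns stored as Python objects (typically strings), as in the question.
cv = train.dtypes.loc[train.dtypes == "object"].index

for col in cv:
    # Replace missing entries with a placeholder string so every value is a str.
    train[col] = train[col].fillna("Missing")
    test[col] = test[col].fillna("Missing")

    # Fit once on the values from both frames so the codes agree between them.
    le = LabelEncoder()
    le.fit(pd.concat([train[col], test[col]]))
    train[col] = le.transform(train[col])
    test[col] = le.transform(test[col])

print(train)
print(test)
```

`LabelEncoder` assigns codes in sorted order of the labels, so the placeholder simply becomes one more category. If a missing value should not be treated as a category of its own, fill with the most frequent value instead.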