Imputation of categorical variables in python/scikit

Question

I have a CSV file with 23 columns of categorical string variables, e.g. Gender, Location, skillset, etc.

Several of these columns have missing values. No column is missing more than 20% of its data, so I would like to impute the missing categorical values.

Is this possible?

I have tried:

    from sklearn_pandas import CategoricalImputer

    imputer = CategoricalImputer(strategy='most_frequent', axis=1)
    imputer.fit(df[["Permission", "Hope"]])
    imputer.transform(df)

but I am getting this error:

    NameError: name 'categoricalImputer' is not defined

Will I have to one-hot encode each of the 23 columns to integers before I can impute, or is it possible to impute missing categorical string variables directly?

Just open Python in the console, import sklearn, and then type sklearn.__version__
– pythonic833, Commented Mar 28, 2018 at 22:14
You should update to version 0.20. Just run pip install git+git://github.com/scikit-learn/scikit-learn.git or check github.com/scikit-learn/scikit-learn/issues/10579
– pythonic833, Commented Mar 28, 2018 at 22:42

pythonic833 · Accepted Answer · 2018-03-28 22:43:59Z

CategoricalImputer is only introduced in scikit-learn version 0.20, which was the development version at the time of this answer. So update with pip install git+git://github.com/scikit-learn/scikit-learn.git or check the GitHub issue https://github.com/scikit-learn/scikit-learn/issues/10579
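
For reference, here is a minimal sketch of most-frequent imputation on string columns. It assumes a released scikit-learn of 0.20 or later, where this functionality is available as sklearn.impute.SimpleImputer rather than under the CategoricalImputer name. The DataFrame, its column names, and its values are made up for illustration and stand in for the 23 columns in the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data standing in for two of the categorical columns.
df = pd.DataFrame({
    "Gender": ["F", "M", np.nan, "F", "F"],
    "Location": ["NY", np.nan, "LA", "LA", np.nan],
})

# strategy="most_frequent" fills each column with that column's own mode.
# It accepts string (object) columns, so no integer encoding is needed first.
imputer = SimpleImputer(strategy="most_frequent")

# fit_transform returns a NumPy array, so wrap it back into a DataFrame
# to keep the original column names and index.
imputed = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)
print(imputed)
```

Each column is imputed independently, so the same imputer can be fitted on all categorical columns at once. If a separate "missing" category is preferable to the mode, strategy="constant" with a fill_value is the other option that works on strings.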
