
I have a CSV file with 23 columns of categorical string variables, e.g. Gender, Location, Skillset, etc.

Several of these columns have missing values. No column is missing more than 20% of its data, so I would like to impute the missing categorical values.

Is this possible?

I have tried:

    from sklearn_pandas import CategoricalImputer

    imputer = CategoricalImputer(strategy='most_frequent', axis=1)
    imputer.fit(df[["Permission", "Hope"]])
    imputer.transform(df)

but I am getting this error: NameError: name 'categoricalImputer' is not defined

Will I have to one-hot encode each of the 23 columns to integers before I can impute, or is it possible to impute missing categorical string variables directly?

2
  • 1
    Just open Python in the console and type import sklearn; sklearn.__version__ to check which version you have. Commented Mar 28, 2018 at 22:14
  • 1
    You should update to version 0.20. Just run pip install git+git://github.com/scikit-learn/scikit-learn.git, or check github.com/scikit-learn/scikit-learn/issues/10579. Commented Mar 28, 2018 at 22:42

1 Answer


CategoricalImputer was only introduced in version 0.20, so update with pip install git+git://github.com/scikit-learn/scikit-learn.git, or check the GitHub issue https://github.com/scikit-learn/scikit-learn/issues/10579.
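On the question of whether the strings need to be encoded first: as far as I know, in released scikit-learn 0.20 and later, sklearn.impute.SimpleImputer with strategy="most_frequent" accepts string columns directly, so no integer encoding should be needed before imputing. Below is a minimal sketch under that assumption. The toy DataFrame and its column names are made up for illustration and stand in for the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumes scikit-learn >= 0.20

# Hypothetical toy data standing in for the CSV (column names are illustrative).
df = pd.DataFrame({
    "Gender": ["F", "M", np.nan, "F"],
    "Location": ["NY", np.nan, "NY", "LA"],
})

# "most_frequent" fills each column's missing entries with that column's mode.
# It works on string (object) columns, so no encoding step is required first.
imputer = SimpleImputer(strategy="most_frequent")

# fit_transform returns a NumPy array; wrap it to keep column names and index.
imputed = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)

print(imputed)
```

Each column is imputed independently, so the same call can be applied to all 23 columns at once. If a column has a tie for the most frequent value, check the SimpleImputer documentation for how the tie is resolved before relying on the result.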
