I have a CSV file with 23 columns of categorical string variables, e.g. Gender, Location, Skillset, etc.
Several of these columns have missing values. No column is missing more than 20% of its data, so I would like to impute the missing categorical values.
Is this possible?
I have tried:

    from sklearn_pandas import CategoricalImputer

    imputer = CategoricalImputer(strategy='most_frequent', axis=1)
    imputer.fit(df[["Permission", "Hope"]])
    imputer.transform(df)

but I am getting this error:

    NameError: name 'categoricalImputer' is not defined
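Note that the error message names categoricalImputer with a lowercase c, which suggests the code that actually ran did not match the imported name CategoricalImputer (Python names are case-sensitive). As an alternative to sklearn_pandas, a minimal sketch using scikit-learn's own SimpleImputer, assuming scikit-learn 0.20 or later (where, as far as I know, strategy="most_frequent" accepts string columns). The sample values in the two columns are made up for illustration; there is no axis argument, and all columns are fitted at once.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Permission": ["yes", "no", np.nan, "yes"],
    "Hope": [np.nan, "high", "low", "high"],
})

# most_frequent works directly on object (string) columns,
# so no integer or one-hot encoding is needed beforehand
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

# fit_transform returns a NumPy array; wrap it back into a DataFrame
imputed = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)
print(imputed)
```

The fitted imputer keeps one most frequent value per column (in imputer.statistics_), so the same imputer can later be applied to new rows with imputer.transform.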
Will I have to encode each of the 23 columns as integers (e.g. with one-hot encoding) before I can impute, or is it possible to impute missing categorical string values directly?
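Imputing string columns directly is possible without any encoding. A minimal sketch in plain pandas, assuming the CSV has already been loaded into a DataFrame (the three columns and their values here are made up): each column's missing values are replaced by that column's most frequent value (mode imputation).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the 23-column CSV
df = pd.DataFrame({
    "Gender": ["F", "M", np.nan, "F", "F"],
    "Location": ["Paris", np.nan, "Lyon", "Lyon", np.nan],
    "Skillset": ["SQL", "Python", "Python", np.nan, "Python"],
})

# df.mode() ignores NaN and lists each column's most frequent value(s);
# row 0 holds the first mode of every column (ties are sorted, first wins)
most_frequent = df.mode().iloc[0]

# fillna with a Series fills each column using the value under its label
df_imputed = df.fillna(most_frequent)
print(df_imputed)
```

Because the fill values come from a Series keyed by column name, the same two lines work unchanged for all 23 columns.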
Try installing scikit-learn from its repository:

    pip install git+git://github.com/scikit-learn/scikit-learn.git

or check this issue: github.com/scikit-learn/scikit-learn/issues/10579