
Is there a way for a Scikit-learn Imputer to look for and replace multiple values which are considered "missing values"?

For example, I would like to do something like

imp = Imputer(missing_values=(7,8,9)) 

But according to the docs, the missing_values parameter only accepts a single integer:

missing_values : integer or “NaN”, optional (default=”NaN”)

The placeholder for the missing values. All occurrences of missing_values will be imputed. For missing values encoded as np.nan, use the string value “NaN”.

  • First replace all the values you want to impute with NaN, as shown in the answer below. Commented Jun 12, 2018 at 6:20

2 Answers


Why not do this manually in your original dataset? Assuming you are using a pd.DataFrame, you can do the following:

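import numpy as np
import pandas as pd
# Note: Imputer was removed in scikit-learn 0.22; on newer versions
# use sklearn.impute.SimpleImputer instead.
from sklearn.preprocessing import Imputer

df = pd.DataFrame({'A': [1, 2, 3, 8], 'B': [1, 2, 5, 3]})
# Mark every value that should count as "missing" as NaN
df_new = df.replace([1, 2], np.nan)
# The default strategy fills each NaN with its column mean
df_imp = Imputer().fit_transform(df_new)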

This results in df_imp:

array([[ 5.5,  4. ],
       [ 5.5,  4. ],
       [ 3. ,  5. ],
       [ 8. ,  3. ]])

If you want to make this part of a pipeline, you would just need to implement a custom transformer with similar logic.
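One possible shape for such a transformer, as a minimal sketch. The class name `ValuesToNaN` and its `values` parameter are hypothetical (they are not part of scikit-learn), and `SimpleImputer` from `sklearn.impute` stands in for the older `Imputer`, assuming scikit-learn 0.20 or later:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


class ValuesToNaN(TransformerMixin, BaseEstimator):
    """Hypothetical helper: replace each listed placeholder value with NaN."""

    def __init__(self, values=()):
        self.values = values

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        # Copy into a float array so that NaN can be stored.
        X = np.array(X, dtype=float)
        X[np.isin(X, list(self.values))] = np.nan
        return X


df = pd.DataFrame({'A': [1, 2, 3, 8], 'B': [1, 2, 5, 3]})

# Same logic as the manual version: mark 1 and 2 as missing, then mean-impute.
pipeline = make_pipeline(ValuesToNaN(values=(1, 2)), SimpleImputer(strategy='mean'))
df_imp = pipeline.fit_transform(df)
```

Because the replacement happens inside the pipeline, the same placeholder values are handled at both fit and predict time without touching the original data.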


1 Comment

Looked at the documentation for replace(). It also accepts regex, so even if there were another variable 'C' in your example which I don't want to change, that can be done. Thank you
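If the goal is only to leave a column such as 'C' untouched, one option that does not need regex is to call replace() on a subset of columns. A minimal sketch, assuming a hypothetical extra column 'C' added to the example frame:

```python
import numpy as np
import pandas as pd

# 'C' is a made-up column whose 1s and 2s are real data, not placeholders.
df = pd.DataFrame({'A': [1, 2, 3, 8], 'B': [1, 2, 5, 3], 'C': [1, 2, 1, 2]})

cols = ['A', 'B']  # columns where 1 and 2 should count as missing
df_new = df.copy()
df_new[cols] = df_new[cols].replace([1, 2], np.nan)
```

After this, 'A' and 'B' hold NaN where 1 or 2 used to be, while 'C' keeps its original values.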

You could chain multiple imputers in a pipeline, but that might get unwieldy pretty quickly, and I'm not sure how efficient it is.

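from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# One imputer per placeholder value; each replaces its value with 10
pipeline = make_pipeline(
    SimpleImputer(missing_values=7, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=8, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=9, strategy='constant', fill_value=10)
)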
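To see what the chain does, here is a small sketch applied to a made-up array (the data is illustrative and not from the question). Each step rewrites one placeholder value and passes the result on to the next step:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Illustrative data: 7, 8 and 9 stand for "missing".
X = np.array([[1.0, 7.0],
              [8.0, 2.0],
              [9.0, 3.0]])

pipeline = make_pipeline(
    SimpleImputer(missing_values=7, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=8, strategy='constant', fill_value=10),
    SimpleImputer(missing_values=9, strategy='constant', fill_value=10),
)

X_filled = pipeline.fit_transform(X)
```

Note that this only works cleanly with the 'constant' strategy: with 'mean' or 'median', each step would compute its statistic while the other placeholder values are still in the data.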
