I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to overwrite the missing values in my dataframe. My missing values look like this:

[image]

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

df_post_copy = df_post.copy()
missing_mask = df_post_copy.isna()
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)
df_copy[missing_mask] = imputed_values[missing_mask]

Results in:

ValueError: other must be the same shape as self when an ndarray 

But the shape matches...

imputed_values.shape
(16494, 29)

The type is:

type(imputed_values)
numpy.ndarray

What I have tried since it is the right shape is to convert it to a pandas dataframe:

test_imputed_values = pd.DataFrame(imputed_values) 

When I try:

df_copy[missing_mask] = test_imputed_values[missing_mask] 

I get the same error as above.

How do I use a mask to insert the imputed values where needed?

2 Answers

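imputer.fit_transform(...) returns the complete array: the originally observed values, unchanged, together with the imputed values in the positions that were missing. You therefore don't need the mask at all. If you want an updated DataFrame, something like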

imputed_values = imputer.fit_transform(df_post_copy)
df_post_copy.loc[:, :] = imputed_values

should work.
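As a rough check of the approach above, here is a minimal sketch on a small made-up frame. The data, the column names `a` to `d`, and the `original` copy are assumptions for illustration only and not part of the question's dataset. It overwrites the whole frame and then confirms that the observed cells kept their values.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the next import
from sklearn.impute import IterativeImputer

# Small made-up frame standing in for df_post (illustrative data only).
rng = np.random.default_rng(0)
df_post_copy = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
df_post_copy.iloc[::7, 1] = np.nan   # knock out some values in column "b"
df_post_copy.iloc[::11, 3] = np.nan  # and some in column "d"

missing_mask = df_post_copy.isna()
original = df_post_copy.copy()

imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)

# Overwrite every cell; observed cells receive the value they already had.
df_post_copy.loc[:, :] = imputed_values

# No NaN remains, and the cells that were observed are unchanged.
print(df_post_copy.isna().sum().sum())
observed = ~missing_mask.to_numpy()
print(np.allclose(df_post_copy.to_numpy()[observed], original.to_numpy()[observed]))
```

Assigning through `.loc[:, :]` keeps the existing index and column labels, which is why no mask or label bookkeeping is needed here.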

imputed_values = imputer.fit_transform(df_post_copy)

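imputer.fit_transform returns a NumPy array in which the missing values have been filled in.

So imputed_values already contains every value, observed and imputed. You can convert it back to a DataFrame the usual way, passing the original labels along:

pd.DataFrame(imputer.fit_transform(df_post_copy), columns=df_post_copy.columns, index=df_post_copy.index)

This returns a DataFrame with the original values and labels and the missing values filled in, assuming the imputer did not drop any column. Without columns= and index=, the new DataFrame gets default integer labels, so it no longer lines up with missing_mask or with the original frame.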
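If you do want to use the mask, as the question asks, one possible sketch is below. It assumes the array has the same shape and column order as the frame. The tiny frame and the mean-filled `imputed_values` are stand-ins invented for illustration, playing the role of the imputer's output so the mask mechanics are easy to follow.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with a non-default index (illustrative data only).
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]},
    index=["x", "y", "z"],
)
missing_mask = df.isna()

# Stand-in for imputer.fit_transform(df): a plain ndarray without labels.
imputed_values = df.fillna(df.mean()).to_numpy()

# Reattach the original labels so pandas can align on them.
imputed_df = pd.DataFrame(imputed_values, columns=df.columns, index=df.index)

# Keep observed cells from df; take the imputed cell wherever the mask is True.
filled = df.mask(missing_mask, imputed_df)

# Equivalent purely positional version on the raw arrays.
filled_np = np.where(missing_mask.to_numpy(), imputed_values, df.to_numpy())

print(filled)
```

`DataFrame.mask` aligns on labels, so it only behaves as intended once the imputed values carry the same index and columns as the original frame. The `np.where` variant sidesteps labels entirely by working on positions.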
