I'm trying to learn how to implement MICE in imputing missing values for my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs:

Our implementation of IterativeImputer was inspired by the R MICE package (Multivariate Imputation by Chained Equations) [1], but differs from it by returning a single imputation instead of multiple imputations. However, IterativeImputer can also be used for multiple imputations by applying it repeatedly to the same dataset with different random seeds when sample_posterior=True

I've seen "seeds" being used in different pipelines, but I never understood them well enough to implement them in my own code. I was wondering if anyone could explain and provide an example on how to implement seeds for a MICE imputation using sklearn's IterativeImputer? Thanks!

1 Comment
If you are willing to forgo sklearn, you can try miceforest. Commented Dec 16, 2021 at 15:13

2 Answers


IterativeImputer's behavior can change depending on its random state. The random state, which you can set explicitly, is also called a "seed".

As the documentation states, we can get multiple imputations by setting sample_posterior to True and varying the random seed, i.e. the random_state parameter.

Here is an example of how to use it:

import numpy as np
# this import is required to enable IterativeImputer, which is still experimental
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer

X_train = [[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]]
X_test = [[np.nan, 2], [np.nan, np.nan], [np.nan, 6]]

# each seed (random_state) yields a different imputation
for i in range(3):
    imp = IterativeImputer(max_iter=10, random_state=i, sample_posterior=True)
    imp.fit(X_train)
    print(f"imputation {i}:")
    print(np.round(imp.transform(X_test)))

It outputs:

imputation 0:
[[ 1.  2.]
 [ 5. 10.]
 [ 3.  6.]]
imputation 1:
[[1. 2.]
 [0. 1.]
 [3. 6.]]
imputation 2:
[[1. 2.]
 [1. 2.]
 [3. 6.]]

We can observe the three different imputations.


3 Comments

Would it be correct to pool the three imputations into a single set? If so, how would you accomplish this? I'm probably misunderstanding your explanation, but it looks like I would be creating 3 different datasets, each representing a different imputation seed.
It is indeed creating 3 different datasets. How to use them depends on your final task (classification, regression, etc., or just inferring the missing values of your features). I would suggest asking another question, and it probably belongs on Cross Validated rather than Stack Overflow.
@GlennG. were you able to figure out how to pool the datasets into a single dataset? I am also currently in the same position, and would like to fill the missing values in my features.

A way to go about stacking the data might be to change @Stanislas' code around a bit like so:

mvi = {}  # just my preference for a dict; you can use a list too
# mvi collects each imputed dataset into the dict, keyed 0 through 2
for i in range(3):
    imp = IterativeImputer(max_iter=10, random_state=i, sample_posterior=True)
    mvi[i] = np.round(imp.fit_transform(X_train))

Combine the imputations into a single dataset using either of the following:

import pandas as pd

# a. pandas concat (wrap each numpy array in a DataFrame first)
stacked_df = pd.concat([pd.DataFrame(a) for a in mvi.values()], axis=0)

# b. numpy stack
dfs = np.stack(list(mvi.values()), axis=0)

pd.concat produces a 2D result; np.stack, on the other hand, creates a 3D array that you can reshape into 2D. The numpy 3D array breaks down as follows (a quick shape check is sketched after the list):

  • axis 0: number of imputed datasets
  • axis 1: number of rows in the original data
  • axis 2: number of columns in the original data
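
Here is a minimal, self-contained sketch of that shape check; the 5x2 arrays are hypothetical stand-ins for the three imputed datasets collected in mvi above:

import numpy as np

# hypothetical stand-ins for the three 5x2 imputed arrays collected in mvi
mvi = {i: np.full((5, 2), float(i)) for i in range(3)}

dfs = np.stack(list(mvi.values()), axis=0)
print(dfs.shape)  # (3, 5, 2): (imputed datasets, rows, columns)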

Create a 2D array from the 3D array

You can use numpy reshape like so:

# collapse the first two axes: (3, 5, 2) -> (3*5, 2)
np.reshape(dfs, newshape=(dfs.shape[0]*dfs.shape[1], -1))

which means you essentially multiply axis 0 by axis 1 to stack the dataframes into one big dataframe. The -1 at the end just means: use whatever axis is left over, in this case the columns.
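
As a quick sanity check, continuing the hypothetical stack from the sketch above, the reshape keeps the imputations in order, so the first block of rows in the 2D result is imputation 0:

import numpy as np

# hypothetical 3 x 5 x 2 stack, as in the earlier sketch
dfs = np.stack([np.full((5, 2), float(i)) for i in range(3)], axis=0)

stacked = np.reshape(dfs, (dfs.shape[0] * dfs.shape[1], -1))
print(stacked.shape)                        # (15, 2): 3 imputations * 5 rows
print(np.array_equal(stacked[:5], dfs[0]))  # True: rows 0-4 are imputation 0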

