I am trying to do multiple imputation in Python. My motivation comes from the mice package in R, and I am looking for an equivalent in Python. I found scikit-learn's IterativeImputer, and by following the documentation and some posts on SO I am able to produce multiple imputed data sets. However, with sample_posterior=True the imputed values are drawn from a predictive distribution, which is not what I am looking for. I would like the imputed values to be real observed values rather than draws from a distribution, i.e. as in R, drawn from the observations that fall in the same leaf of a decision tree (see page 94 of https://cran.r-project.org/web/packages/mice/mice.pdf). Is there a way to change the "prediction" of a decision tree within IterativeImputer so that it returns a random observation from the same leaf?

Documentation: https://scikit-learn.org/stable/modules/impute.html

Posts on SO: "IterativeImputer - sample_posterior" and "Imputing missing values using sklearn IterativeImputer class for MICE"

1 Answer

miceforest does what you are looking for. It uses predictive mean matching by default, which fills each missing entry with a real observed value taken from the data rather than with a model prediction.
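To make the idea concrete, here is a minimal pure-Python sketch of mean matching with a single predictor. It is not miceforest's implementation: the helper names (`fit_line`, `pmm_impute`), the simple linear model, and the choice of `k=3` donor candidates are all assumptions made for illustration. The point is only that the imputed value is always one of the observed values.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries in ys with observed values chosen by mean matching.

    Hypothetical helper for illustration only; k is the number of donor
    candidates considered for each missing entry.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed],
                                [y for _, y in observed])

    # Each donor is (predicted value, real observed value).
    donors = [(slope * x + intercept, y) for x, y in observed]

    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = slope * x + intercept
        # Keep the k donors whose predictions are closest to the target ...
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # ... and copy the observed value of one of them, picked at random.
        completed.append(rng.choice(nearest)[1])
    return completed


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, None, 2.9, 4.2, None, 6.1]
completed = pmm_impute(xs, ys)
```

Running it with different seeds gives different completed data sets, which is what you want for multiple imputation.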

However, miceforest uses lightgbm as a backend. This may or may not be what you want.
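If you would rather stay with a single scikit-learn decision tree, the leaf-based draw described in the question can be sketched by hand. This is only a rough sketch under assumptions: `draw_from_leaf` is a hypothetical helper, `min_samples_leaf=5` is an arbitrary choice, and the function handles one target column in isolation rather than plugging into IterativeImputer. It relies on `DecisionTreeRegressor.apply`, which returns the leaf index for each row.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def draw_from_leaf(X_obs, y_obs, X_mis, min_samples_leaf=5, seed=0):
    """For each row of X_mis, return a random observed y from the same leaf.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    rng = np.random.default_rng(seed)
    tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf,
                                 random_state=seed)
    tree.fit(X_obs, y_obs)

    leaf_obs = tree.apply(X_obs)  # leaf index of every observed row
    leaf_mis = tree.apply(X_mis)  # leaf index of every row to impute

    # Every leaf holds at least min_samples_leaf observed rows,
    # so the donor pool for a leaf is never empty.
    return np.array([rng.choice(y_obs[leaf_obs == leaf]) for leaf in leaf_mis])


# Synthetic data, purely for demonstration.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=200)
missing = rng.random(200) < 0.2

imputed = draw_from_leaf(X[~missing], y[~missing], X[missing])
```

Every entry of `imputed` is a value that actually occurs among the observed `y`. To get several imputed data sets, call the function with different seeds; to use it iteratively over all columns you would still need to write the surrounding chained-equations loop yourself.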
