$\begingroup$

I am using IterativeImputer from sklearn and I notice that it changes the shape of the data. Initially I have an (X,5) array in which every column except the last contains only the missing value (-999, which I have passed to the imputer as missing_values). The output array is only (X,1): it keeps just the last column, the only one with real data. I would expect an (X,5) array to be returned. What could be wrong? I am probably missing something, but I do not see it...

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer
    from sklearn.impute import IterativeImputer

    missing_value = -999
    imputer = IterativeImputer(missing_values=missing_value)

    aa = [[-999, -999, -999, -999, 1.2300000000000004],
          [-999, -999, -999, -999, 0.4599999999999973],
          [-999, -999, -999, -999, 0.18999999999999773],
          [-999, -999, -999, -999, -999],
          [-999, -999, -999, -999, 0.4400000000000013],
          [-999, -999, -999, -999, 0.41000000000000014],
          [-999, -999, -999, -999, -0.21999999999999886],
          [-999, -999, -999, -999, 0.7900000000000027],
          [-999, -999, -999, -999, -999]]

    imputer.fit(aa)
    result = imputer.transform(aa)
    print(result)

with a result of

    [[ 1.23      ]
     [ 0.46      ]
     [ 0.19      ]
     [ 0.47142857]
     [ 0.44      ]
     [ 0.41      ]
     [-0.22      ]
     [ 0.79      ]
     [ 0.47142857]]
$\endgroup$

1 Answer

$\begingroup$

Features that are entirely missing cannot be meaningfully imputed, and by default are dropped by transform. See the documentation for the parameter keep_empty_features:

If True, features that consist exclusively of missing values when fit is called are returned in results when transform is called. The imputed value is always 0 [...]

$\endgroup$
  • $\begingroup$ I completely understand that imputing these columns is meaningless, and I will have a postprocessing approach to remove all of them (this is part of a larger project, where in most cases the imputer does not fail). I am puzzled because when I first used the function, perhaps a couple of years ago, it did not fail. Unless this is a new feature... $\endgroup$ Commented Apr 1, 2024 at 21:44
  • $\begingroup$ I just noticed that keep_empty_features is indeed "New in version 1.2.". So when I was using it at the beginning, it probably did not behave like that. $\endgroup$ Commented Apr 1, 2024 at 21:46
  • $\begingroup$ I actually checked, and I am not using the latest version (my scikit-learn version is 0.23.2, I know it's old!), so keep_empty_features is not available to me. Anyhow, I have found a workaround: I reshape the output of the imputer by adding columns with some values. I have compared against previous results and they are the same. So I do not know what went wrong, but at least I am getting the same results. In any case, since these columns are missing too many values they will not be used in the end (but I want to be consistent in my code). $\endgroup$ Commented Apr 2, 2024 at 20:29
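For older scikit-learn versions without keep_empty_features, the reshaping workaround mentioned in the last comment could be sketched roughly as follows. The helper name restore_empty_columns and the choice of 0.0 as the filler are assumptions made for illustration (0 mirrors the documented keep_empty_features behaviour); the commenter's actual code is not shown in the thread. The sketch uses only NumPy and assumes the imputer dropped exactly the columns that were entirely missing.

```python
import numpy as np


def restore_empty_columns(original, imputed, missing_value=-999, fill_value=0.0):
    """Put all-missing columns back into an imputed array (hypothetical helper).

    `original` is the array passed to the imputer, `imputed` is its output
    with the all-missing columns dropped.
    """
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)

    # Locate missing entries; NaN needs isnan because NaN != NaN.
    if np.isnan(missing_value):
        is_missing = np.isnan(original)
    else:
        is_missing = original == missing_value

    # A column is "empty" when every row is missing.
    empty = np.all(is_missing, axis=0)

    if imputed.shape[1] != np.count_nonzero(~empty):
        raise ValueError("imputed array does not match the non-empty columns")

    # Start from the filler everywhere, then copy the imputed columns back
    # into their original positions.
    full = np.full(original.shape, fill_value, dtype=float)
    full[:, ~empty] = imputed
    return full


original = np.array([
    [-999, -999, 1.23],
    [-999, -999, -999],
    [-999, -999, 0.46],
], dtype=float)
imputed = np.array([[1.23], [0.845], [0.46]])  # stand-in for the imputer output

restored = restore_empty_columns(original, imputed)
print(restored.shape)  # (3, 3)
```

Filling with a constant keeps the array shape consistent for downstream code; the restored columns still carry no information and can be dropped later, as the comment suggests.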
