
I need to fill the missing temperature values with the mean value for that month using Imputer() in scikit-learn.

First I split the dataframe into groups based on the month. Then I called the imputer function to calculate the mean for that group and fill in the missing values.

Here is the code I wrote but it didn't work:

    def impute_missing(data_1_group):
        imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
        imp.fit(data_1_group)
        data_1_group = imp.transform(data_1_group['datetime'])
        return(data_1_group)

    for data_1_group in data_1.groupby(pd.TimeGrouper("M")):
        impute_missing(data_1_group)

Any suggestions?

  • "Here is the code I wrote but it didn't work." How did it not work? What is the exact error? Commented Dec 12, 2016 at 22:02
  • This is what I got: TypeError: float() argument must be a string or a number. Commented Dec 13, 2016 at 14:01
  • Now I changed it to:
        grouped = data_1.groupby(pd.TimeGrouper("M"))
        f = lambda x: x.fillna(x.mean())
        transformed = grouped['temperature'].transform(f)
    and I got TypeError: cannot concatenate 'str' and 'int' objects. Commented Dec 13, 2016 at 14:01
  • So this question really isn't about the Imputer, it's about the groupby method? Commented Mar 17, 2017 at 23:05
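For the groupby approach described in the comments, here is a minimal pandas-only sketch. It assumes that `data_1` has a DatetimeIndex and that the 'str' and 'int' error comes from the `temperature` column holding strings rather than floats, which is only a guess from the error message. The sample values are hypothetical, and `to_period("M")` stands in for `pd.TimeGrouper("M")`, which newer pandas releases no longer provide.

```python
import pandas as pd

# Hypothetical sample data; only the column name `temperature` comes from the question.
data_1 = pd.DataFrame(
    {"temperature": ["10", "NaN", "14", "20", None, "22"]},
    index=pd.to_datetime(
        ["2016-01-01", "2016-01-15", "2016-01-31",
         "2016-02-01", "2016-02-10", "2016-02-20"]
    ),
)

# Assumption: the values were read in as strings. Coerce them to floats;
# anything that cannot be parsed becomes NaN.
data_1["temperature"] = pd.to_numeric(data_1["temperature"], errors="coerce")

# One group per calendar month (year and month together).
month = data_1.index.to_period("M")

# Replace each NaN with the mean of its own month.
data_1["temperature"] = (
    data_1.groupby(month)["temperature"].transform(lambda s: s.fillna(s.mean()))
)

print(data_1)
```

If the column is already numeric, the `pd.to_numeric` step is a no-op and the cause of the error lies elsewhere.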

1 Answer


Try this small change:

    imp = imp.fit(data_1_group['datetime'])
    data_1_group = imp.transform(data_1_group['datetime'])

Though I'm new to scikit-learn myself, I am recommending the solution that worked for me. The reasoning is:

1) the imp object needs to be reassigned to the result of fit, as in the first line

2) fit and transform need to be called on the same dataset, which in this case seems to be data_1_group['datetime']
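As a rough illustration of fitting and transforming the same data once per month, here is a sketch under a few assumptions: the column to impute is a numeric `temperature` column (not the datetime values), `data_1` has a DatetimeIndex, and `SimpleImputer` from `sklearn.impute` is used because recent scikit-learn releases no longer ship `Imputer`. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # stands in for the older Imputer class


def impute_missing(group):
    """Fill NaNs in one month's `temperature` column with that month's mean."""
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    values = group[["temperature"]]      # 2-D selection of the numeric column only
    filled = imp.fit_transform(values)   # fit and transform the same data
    return pd.Series(filled.ravel(), index=group.index, name="temperature")


# Hypothetical sample data with a DatetimeIndex.
data_1 = pd.DataFrame(
    {"temperature": [10.0, np.nan, 14.0, 20.0, np.nan, 22.0]},
    index=pd.to_datetime(
        ["2016-01-01", "2016-01-15", "2016-01-31",
         "2016-02-01", "2016-02-10", "2016-02-20"]
    ),
)

# Iterating over a groupby yields (key, group) pairs, so unpack both.
parts = [
    impute_missing(group)
    for _, group in data_1.groupby(data_1.index.to_period("M"))
]
data_1["temperature"] = pd.concat(parts)

print(data_1)
```

Note that a month with no observed values at all has no mean to fill in, so this sketch assumes every month has at least one reading.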

I hope this helps.
