fill NaN values with mean based on another column specific value

Question

I want to fill the NaN values in column c of my dataframe with the mean, but only for rows whose Category is B, leaving the other rows untouched.

    print (df)
      Category  b    c
    0        A  1  5.0
    1        C  1  NaN
    2        A  1  4.0
    3        B  2  NaN
    4        A  2  1.0
    5        B  2  NaN
    6        C  1  3.0
    7        C  1  2.0
    8        B  1  NaN

What I'm doing at the moment is:

df.c = df.c.fillna(df.c.mean())

But this fills all the NaN values, whereas I only want to fill rows 3, 5 and 8, whose Category value is B.

Trenton McKinney · Accepted Answer · 2019-11-22 00:37:23Z

Combine fillna with slicing assignment

    df.loc[df.Category.eq('B'), 'c'] = (df.loc[df.Category.eq('B'), 'c'].
                                              fillna(df.c.mean()))

    Out[736]:
      Category  b    c
    0        A  1  5.0
    1        C  1  NaN
    2        A  1  4.0
    3        B  2  3.0
    4        A  2  1.0
    5        B  2  3.0
    6        C  1  3.0
    7        C  1  2.0
    8        B  1  3.0

Or a direct assignment with 2 masks

eq is the element-wise equality method, equivalent to ==. Since df.Category is a Series, the call here is pandas.Series.eq.

    df.loc[df.Category.eq('B') & df.c.isna(), 'c'] = df.c.mean()

    Out[745]:
      Category  b    c
    0        A  1  5.0
    1        C  1  NaN
    2        A  1  4.0
    3        B  2  3.0
    4        A  2  1.0
    5        B  2  3.0
    6        C  1  3.0
    7        C  1  2.0
    8        B  1  3.0
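For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is my assumption, rebuilt from the printed output in the question. It runs both variants on copies and checks that they fill the same rows.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output in the question (assumed construction).
df = pd.DataFrame({
    "Category": list("ACABABCCB"),
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1],
    "c": [5.0, np.nan, 4.0, np.nan, 1.0, np.nan, 3.0, 2.0, np.nan],
})

# mean() skips NaN by default: (5 + 4 + 1 + 3 + 2) / 5 = 3.0
c_mean = df["c"].mean()
is_b = df["Category"].eq("B")

# Variant 1: fillna on the B slice, assigned back through the same mask.
out1 = df.copy()
out1.loc[is_b, "c"] = out1.loc[is_b, "c"].fillna(c_mean)

# Variant 2: a single assignment through the combined mask.
out2 = df.copy()
out2.loc[is_b & out2["c"].isna(), "c"] = c_mean

print(out1.equals(out2))   # True
print(out1["c"].tolist())  # [5.0, nan, 4.0, 3.0, 1.0, 3.0, 3.0, 2.0, 3.0]
```

Note that the NaN in row 1 (Category C) is left alone in both variants, which is the behaviour the question asks for.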

Thank you @Andy, the answer was perfect. Besides that, I'd like to know which one scales better to more data.
@saul: You are welcome. I think the 2nd is more scalable and cleaner :)
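On the scalability question, both variants are vectorized, so I would expect them to behave similarly. The only way to know for a given dataset is to measure. Below is a rough timing sketch on synthetic data. The row count, the share of missing values and the helper names slice_fillna and two_masks are all hypothetical choices of mine, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, used only for timing (sizes and proportions are assumptions).
rng = np.random.default_rng(0)
n = 200_000
big = pd.DataFrame({
    "Category": rng.choice(list("ABC"), size=n),
    "c": rng.random(n),
})
big.loc[rng.random(n) < 0.3, "c"] = np.nan  # knock out roughly 30% of the values
c_mean = big["c"].mean()


def slice_fillna(df):
    """Variant 1: fillna on the B slice, assigned back through the mask."""
    out = df.copy()
    is_b = out["Category"].eq("B")
    out.loc[is_b, "c"] = out.loc[is_b, "c"].fillna(c_mean)
    return out


def two_masks(df):
    """Variant 2: direct assignment through the combined mask."""
    out = df.copy()
    out.loc[out["Category"].eq("B") & out["c"].isna(), "c"] = c_mean
    return out


# Both variants should produce identical frames before timings are compared.
same = slice_fillna(big).equals(two_masks(big))
print("identical results:", same)

for fn in (slice_fillna, two_masks):
    secs = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {secs / 5:.4f} s per call")
```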

Amine Mchayaa · Accepted Answer · 2019-11-22 00:27:28Z

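This would be the answer to your question. Inside apply with axis=1, row['c'] is a single value rather than a Series, so it has no fillna method and is tested with pd.isna instead:

    import pandas as pd

    df.c = df.apply(
        lambda row: df.c.mean() if row['Category'] == 'B' and pd.isna(row['c']) else row['c'],
        axis=1)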
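The same per-row condition can also be written without apply. The sketch below uses Series.mask, which is my substitution and not part of the answer above, and compares it with a row-wise version on the sample data. The DataFrame construction is assumed from the question's printed output.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction, column b omitted).
df = pd.DataFrame({
    "Category": list("ACABABCCB"),
    "c": [5.0, np.nan, 4.0, np.nan, 1.0, np.nan, 3.0, 2.0, np.nan],
})
c_mean = df["c"].mean()  # computed once, NaN values are skipped

# Row-wise: inside apply(axis=1), row["c"] is a scalar, so it is tested with pd.isna.
row_wise = df.apply(
    lambda row: c_mean if row["Category"] == "B" and pd.isna(row["c"]) else row["c"],
    axis=1,
).astype(float)

# Column-wise: Series.mask replaces values wherever the condition is True.
cond = df["Category"].eq("B") & df["c"].isna()
vectorized = df["c"].mask(cond, c_mean)

print(vectorized.tolist())  # [5.0, nan, 4.0, 3.0, 1.0, 3.0, 3.0, 2.0, 3.0]
print(np.array_equal(row_wise.to_numpy(), vectorized.to_numpy(), equal_nan=True))  # True
```

The row-wise form calls a Python function once per row, so on large frames the column-wise form is usually the safer default.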
