Pandas - Replacing NaN by aggregate of non-null values

Question

Suppose I have a DataFrame with some NaN values:

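    import pandas as pd

    l = [{'C1':-6,'C3':2}, {'C2':-6,'C3':3}, {'C1':-6.3,'C2':8,'C3':9}, {'C2':-7}]
    # columns= pins the column order shown below; newer pandas versions
    # otherwise keep the order in which the dict keys are first seen
    df1 = pd.DataFrame(l, index=['R1','R2','R3','R4'], columns=['C1','C2','C3'])
    print(df1)

         C1   C2   C3
    R1 -6.0  NaN  2.0
    R2  NaN -6.0  3.0
    R3 -6.3  8.0  9.0
    R4  NaN -7.0  NaN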

Problem: every NaN in a row has to be replaced by an aggregate (here, the mean) of the non-null values in that same row. For instance, in the first row, the value at (R1, C2) should be (-6 + 2) / 2 = -2.

Expected output:

         C1   C2   C3
    R1 -6.0 -2.0  2.0
    R2 -1.5 -6.0  3.0
    R3 -6.3  8.0  9.0
    R4 -7.0 -7.0 -7.0

jezrael · Accepted Answer · 2017-07-28 05:46:54Z

Use apply with axis=1 to process the DataFrame row by row:

    df1 = df1.apply(lambda x: x.fillna(x.mean()), axis=1)
    print(df1)

         C1   C2   C3
    R1 -6.0 -2.0  2.0
    R2 -1.5 -6.0  3.0
    R3 -6.3  8.0  9.0
    R4 -7.0 -7.0 -7.0
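The lambda works because Series.mean skips NaN by default (skipna=True), so each row's mean is taken over its non-null values only. A minimal sketch on a single row, rebuilt by hand from R1 of the question; the all-NaN series is a hypothetical edge case added for illustration:

```python
import numpy as np
import pandas as pd

# Row R1 from the question, rebuilt by hand for illustration
row = pd.Series({'C1': -6.0, 'C2': np.nan, 'C3': 2.0}, name='R1')

# mean() ignores NaN by default, so this is (-6 + 2) / 2
row_mean = row.mean()

# fillna with a scalar replaces every NaN in the row with that value
filled = row.fillna(row_mean)
print(filled)

# Hypothetical edge case: a row with no valid values has a NaN mean,
# so fillna leaves it unchanged
all_nan = pd.Series([np.nan, np.nan], dtype=float)
all_nan_mean = all_nan.mean()
```

Note that a row made up entirely of NaN stays NaN, since there is nothing to average.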

Also works:

    df1 = df1.T.fillna(df1.mean(1)).T
    print(df1)

         C1   C2   C3
    R1 -6.0 -2.0  2.0
    R2 -1.5 -6.0  3.0
    R3 -6.3  8.0  9.0
    R4 -7.0 -7.0 -7.0

The transpose is needed because fillna cannot fill from a Series along axis=1:

    df1 = df1.fillna(df1.mean(1), axis=1)
    print(df1)

    NotImplementedError: Currently only can fill with dict/Series column by column
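One way to avoid both the transpose and the row-wise apply is DataFrame.where, which accepts a Series as the replacement and aligns it on the row index when axis=0. This is an alternative sketch, not part of the original answer; the names row_means and filled are just illustrative:

```python
import pandas as pd

l = [{'C1': -6, 'C3': 2}, {'C2': -6, 'C3': 3},
     {'C1': -6.3, 'C2': 8, 'C3': 9}, {'C2': -7}]
# columns= fixes the column order regardless of pandas version
df1 = pd.DataFrame(l, index=['R1', 'R2', 'R3', 'R4'],
                   columns=['C1', 'C2', 'C3'])

# Mean of each row, ignoring NaN; the result is indexed by R1..R4
row_means = df1.mean(axis=1)

# Keep values where the condition holds, otherwise take the row mean.
# axis=0 aligns row_means with the DataFrame's index.
filled = df1.where(df1.notna(), other=row_means, axis=0)
print(filled)
```

Unlike the transpose round trip, this never reshapes the frame, which may matter when the columns do not all share one dtype.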

anon · Accepted Answer · 2017-07-28 05:58:11Z

You could transpose the DataFrame, call fillna() with the row means, and then transpose it back.

    >>> df1 = df1.T.fillna(df1.mean(axis=1)).T
    >>> print(df1)
         C1   C2   C3
    R1 -6.0 -2.0  2.0
    R2 -1.5 -6.0  3.0
    R3 -6.3  8.0  9.0
    R4 -7.0 -7.0 -7.0
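To see why the transpose trick works, the one-liner can be unrolled into its steps. This is a sketch with illustrative intermediate names (row_means, transposed, filled_t, result) that are not in the original answer:

```python
import pandas as pd

l = [{'C1': -6, 'C3': 2}, {'C2': -6, 'C3': 3},
     {'C1': -6.3, 'C2': 8, 'C3': 9}, {'C2': -7}]
# columns= fixes the column order regardless of pandas version
df1 = pd.DataFrame(l, index=['R1', 'R2', 'R3', 'R4'],
                   columns=['C1', 'C2', 'C3'])

# Step 1: one mean per original row, indexed by R1..R4
row_means = df1.mean(axis=1)

# Step 2: after transposing, the original rows become columns R1..R4
transposed = df1.T

# Step 3: fillna with a Series fills column by column, matching the
# Series index (R1..R4) against the column labels
filled_t = transposed.fillna(row_means)

# Step 4: transpose back to the original orientation
result = filled_t.T
print(result)
```

The key point is step 3: fillna with a Series works per column, so the transpose turns the row-wise fill into the column-wise fill that pandas supports.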
