Replacing values with nan based on values of another column

Question

This is my dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
            'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
        }
    )

And this is the output that I want:

             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN

Basically, whenever a value in column b is not NaN, I want to keep only the first non-NaN value in column a that comes after it, and set every other value in column a to NaN until the next non-NaN value in column b.

For example, the first non-NaN value in column b is 20. After it, I want to keep 3333 because it is the first non-NaN value in column a below 20, and I want to replace 10 with NaN because I have already kept one value (3333) for that stretch. The same applies after 100 in column b: 200 is kept and the 100 in column a becomes NaN.

I've searched many posts on Stack Overflow and tried a couple of lines myself, but none of them worked. I guess it might be possible with fillna.

ansev · Accepted Answer · 2023-01-20 22:58:31Z

One approach is to start a new group at every non-NaN value in b and keep only the first non-NaN value of a within each group:

    a_notna = df['a'].notna()
    m = (a_notna.groupby(df['b'].notna().cumsum())
                .cumsum()
                .eq(1) & a_notna)
    df['a'] = df['a'].where(m)
    print(df)

             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN
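To make the mask easier to follow, here is a minimal sketch that splits the same idea into named steps. The function name `keep_first_a_per_block` and the intermediate variable names are hypothetical, chosen only for illustration; the logic is intended to match the one-liner in this answer, not replace it.

```python
import numpy as np
import pandas as pd


def keep_first_a_per_block(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where only the first non-NaN 'a' per block survives.

    A new block starts at every row where 'b' is not NaN.
    (Hypothetical helper, named for illustration only.)
    """
    # Block id: goes up by one at each non-NaN value in 'b'.
    block_id = df['b'].notna().cumsum()
    a_notna = df['a'].notna()
    # Running count of non-NaN 'a' values seen so far inside each block.
    seen_in_block = a_notna.astype(int).groupby(block_id).cumsum()
    # A row is kept only if it holds the first non-NaN 'a' of its block.
    keep = seen_in_block.eq(1) & a_notna
    out = df.copy()
    out['a'] = out['a'].where(keep)
    return out


df = pd.DataFrame({
    'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10,
          np.nan, np.nan, np.nan, np.nan, 200, 100],
    'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan,
          np.nan, 100, np.nan, np.nan, np.nan, np.nan],
})

result = keep_first_a_per_block(df)
print(result)
```

The `& a_notna` part matters: without it, rows that follow the first kept value but are themselves NaN would also have a running count of 1 and would be flagged in the mask, even though there is nothing to keep there.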

Soudipta Dutta · Accepted Answer · 2024-06-24 12:44:10Z

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    })

    def process_group(group):
        # Work on a copy so the frame handed over by groupby is not mutated
        group = group.copy()
        non_null_a = group['a'].notnull()
        if non_null_a.any():
            # Keep the first non-null value of 'a' in the group and set every
            # later row to NaN (relies on the default integer index)
            group.loc[non_null_a.idxmax() + 1:, 'a'] = np.nan
        return group

    # Start a new group at each non-null value in 'b', apply the function, and reset the index
    df = df.groupby(df['b'].notnull().cumsum()).apply(process_group).reset_index(drop=True)
    print(df)
    '''
             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN
    '''
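Calling a Python function once per group with `groupby(...).apply` can become slow on large frames. As a possible alternative (a sketch, not part of either answer), the same result can be obtained without `apply` by looking only at the rows where a is present and keeping those whose group id has not been seen before. The variable names `block_id`, `first_in_block`, `keep` and `result_dup` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10,
          np.nan, np.nan, np.nan, np.nan, 200, 100],
    'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan,
          np.nan, 100, np.nan, np.nan, np.nan, np.nan],
})

# Group id: goes up by one at each non-NaN value in 'b'.
block_id = df['b'].notna().cumsum()
a_notna = df['a'].notna()

# Restrict to rows where 'a' is present; the first such row of each group
# is the one whose group id is not a duplicate of an earlier one.
first_in_block = ~block_id[a_notna].duplicated()

# Expand back to the full index; rows where 'a' was NaN are simply not kept.
keep = first_in_block.reindex(df.index, fill_value=False)

result_dup = df.assign(a=df['a'].where(keep))
print(result_dup)
```

Unlike the `reset_index(drop=True)` version, this keeps whatever index the frame already has, which may matter if the index is not the default 0..n-1 range.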
