This is my dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
            'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
        }
    )

And this is the output that I want:

             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN

Basically, whenever a value in column 'b' is not NaN, I want to keep only the first non-NaN value in column 'a' that comes after it, and set every other value in 'a' to NaN until the next non-NaN value in 'b'.

For example, the first non-NaN value in column 'b' is 20. After it, I want to keep 3333, because it is the first non-NaN value in 'a' below that row, and replace 10 with NaN, because I have already kept one value (3333) for that block. The same applies to 100 in column 'b': 200 is kept, and the 100 in column 'a' is replaced with NaN.
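To make the rule concrete, here is a minimal sketch of it as a plain Python loop. The function name `keep_first_a_per_block` is made up for illustration, and it assumes that a value in 'a' appearing before the first non-NaN 'b' would also be treated as the start of a block (the example data has no such row, so the question does not settle this case).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10,
          np.nan, np.nan, np.nan, np.nan, 200, 100],
    'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan,
          np.nan, 100, np.nan, np.nan, np.nan, np.nan],
})


def keep_first_a_per_block(frame):
    """Hypothetical helper: keep one non-NaN 'a' per block started by a non-NaN 'b'."""
    out = frame.copy()
    kept = False  # has a value of 'a' already been kept in the current block?
    for idx in out.index:
        if pd.notna(out.at[idx, 'b']):
            kept = False  # a non-NaN 'b' starts a new block
        if pd.notna(out.at[idx, 'a']):
            if kept:
                out.at[idx, 'a'] = np.nan  # later values in the block are dropped
            else:
                kept = True  # first value in the block is kept
    return out


result = keep_first_a_per_block(df)
print(result)
```

A loop like this is slow on large frames, but it is handy as a reference to check a vectorized solution against.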

I've searched many posts on Stack Overflow and tried a few approaches, but none of them worked. I guess it might be doable with fillna.

2 Answers

One approach

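    # True where 'a' holds a value
    a_notna = df['a'].notna()

    # A new group starts at every non-NaN 'b'; within each group, count the
    # non-NaN values of 'a' seen so far and keep only the row where that
    # count first reaches 1
    m = (a_notna.groupby(df['b'].notna().cumsum())
                .cumsum()
                .eq(1) & a_notna)

    # Set 'a' to NaN everywhere the mask is False
    df['a'] = df['a'].where(m)
    print(df)

Output:

             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN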
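If the chained mask is hard to follow, it may help to look at the intermediate Series side by side. The sketch below rebuilds the same mask step by step; the names `block`, `seen`, and `steps` are only illustrative, and the explicit `astype(int)` is added here just to make the counting obvious.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10,
          np.nan, np.nan, np.nan, np.nan, 200, 100],
    'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan,
          np.nan, 100, np.nan, np.nan, np.nan, np.nan],
})

a_notna = df['a'].notna()

# Block id: increases by 1 at every row where 'b' is not NaN
block = df['b'].notna().cumsum()

# Running count of non-NaN 'a' values inside each block
seen = a_notna.astype(int).groupby(block).cumsum()

# The count stays at 1 on NaN rows after the first hit, so `& a_notna`
# is needed to select only the row that actually holds the value
keep = seen.eq(1) & a_notna

steps = pd.DataFrame({
    'a': df['a'], 'b': df['b'],
    'block': block, 'seen': seen, 'keep': keep,
})
print(steps)
```

Only rows 3 and 11 end up with `keep` set to True, which is why 10 and 100 in column 'a' are replaced by NaN.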
Another approach, using groupby with apply:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    })

    def process_group(group):
        # Work on a copy so the original data is not modified in place
        group = group.copy()
        non_null_a = group['a'].notnull()
        if non_null_a.any():
            # Keep the first non-null value of 'a' in the group and set every
            # later row to NaN (the "+ 1" relies on the default integer index)
            group.loc[non_null_a.idxmax() + 1:, 'a'] = np.nan
        return group

    # Start a new group at each non-null value in 'b', apply the function, and reset the index
    df = df.groupby(df['b'].notnull().cumsum()).apply(process_group).reset_index(drop=True)
    print(df)

Output:

             a      b
    0      NaN    NaN
    1      NaN   20.0
    2      NaN    NaN
    3   3333.0    NaN
    4      NaN    NaN
    5      NaN    NaN
    6      NaN    NaN
    7      NaN    NaN
    8      NaN  100.0
    9      NaN    NaN
    10     NaN    NaN
    11   200.0    NaN
    12     NaN    NaN
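The `idxmax() + 1` step above depends on integer index labels. As a hedged alternative (not part of the answer itself), the same per-group idea can be sketched with `groupby(...).transform` and `first_valid_index`, which does not do arithmetic on labels. The helper name `keep_first_valid` and the string index are assumptions for illustration, and the sketch assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with a non-integer index to show
# that no label arithmetic is involved
df = pd.DataFrame({
    'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10,
          np.nan, np.nan, np.nan, np.nan, 200, 100],
    'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan,
          np.nan, 100, np.nan, np.nan, np.nan, np.nan],
}, index=list('abcdefghijklm'))


def keep_first_valid(s):
    """Hypothetical helper: keep only the first non-NaN entry of a Series."""
    first = s.first_valid_index()
    if first is None:
        return s  # nothing to keep in an all-NaN group
    # Keep the row labelled `first`; every other row becomes NaN
    return s.where(s.index == first)


# A new block starts at every non-NaN value in 'b'
block = df['b'].notna().cumsum()
result = df.assign(a=df['a'].groupby(block).transform(keep_first_valid))
print(result)
```

Because `transform` returns a result aligned to the original index, no `reset_index` is needed afterwards.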
