In a pandas DataFrame, I'm trying to compute the change (difference) between two columns. For example:

df = pd.DataFrame({'A': [100, 100, 100], 'B': [105, 110, 93], 'C': ['NaN', 102, 'NaN']}) 

I am attempting to compute df['A'] - df['C'], but on the rows where 'C' is 'NaN', I want to use the value from column 'B' instead.

Expected result: [-5, -2, 7]. Since df['C'].loc[0] is NaN, the first value is 100 - 105 (using 'B'), while the second value is 100 - 102.

1 Answer

I think the simplest approach is to replace the missing values with values from another column using Series.fillna:

# if the strings 'NaN' need to be converted to missing values (np.nan)
df['C'] = pd.to_numeric(df.C, errors='coerce')

s = df['A'] - df['C'].fillna(df.B)
print(s)
0   -5.0
1   -2.0
2    7.0
dtype: float64
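For reference, the fillna approach can be put together as a script that runs on its own. This is only a minimal sketch built on the sample frame from the question; the variable name `result` is chosen here for illustration.

```python
import pandas as pd

# Sample frame from the question; 'NaN' in column C is a string,
# not a real missing value
df = pd.DataFrame({'A': [100, 100, 100],
                   'B': [105, 110, 93],
                   'C': ['NaN', 102, 'NaN']})

# Convert C to numeric so the 'NaN' strings become real missing values
df['C'] = pd.to_numeric(df['C'], errors='coerce')

# Where C is missing, fall back to B, then subtract from A
result = df['A'] - df['C'].fillna(df['B'])
print(result.tolist())  # [-5.0, -2.0, 7.0]
```

The result is float because a column that holds NaN is stored as float64.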

Another idea is to use numpy.where and test for missing values with Series.isna:

a = np.where(df.C.isna(), df['A'] - df['B'], df['A'] - df['C'])
print(a)
[-5. -2.  7.]

s = df['A'] - np.where(df.C.isna(), df['B'], df['C'])
print(s)
0   -5.0
1   -2.0
2    7.0
Name: A, dtype: float64
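Note that the numpy.where variants rely on column C already being numeric: on the original frame, isna() returns False for the string 'NaN'. The sketch below shows the full sequence under that assumption; the names `c`, `fallback`, and `diff` are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [100, 100, 100],
                   'B': [105, 110, 93],
                   'C': ['NaN', 102, 'NaN']})

# The string 'NaN' is not detected as missing until it is converted
print(df['C'].isna().tolist())  # [False, False, False]

# Convert to numeric so the strings become real missing values
c = pd.to_numeric(df['C'], errors='coerce')

# Pick B where C is missing, otherwise keep C (returns a NumPy array)
fallback = np.where(c.isna(), df['B'], c)

# Wrap the array result in a Series to keep the original index
diff = pd.Series(df['A'].to_numpy() - fallback, index=df.index, name='diff')
print(diff.tolist())  # [-5.0, -2.0, 7.0]
```

numpy.where returns a plain array, so wrapping it in a Series is a choice made here to keep the index aligned with the frame.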

2 Comments

Really appreciate it! How do you get good at pandas? Are there any exercises we can do, or is it just experience?
@rohan - Here are some tutorials