pandas dataframe create a new column whose values are based on groupby sum on another column

Question

I am trying to create a new column amount_0_flag for a df. Its values are based on grouping by another column, key: if the sum of amount within a group is 0, every row of that group is assigned True in amount_0_flag, otherwise False. The df looks like this:

key  amount  amount_0_flag  negative_amount
1    1.0     True           False
1    1.0     True           True
2    2.0     False          True
2    3.0     False          False
2    4.0     False          False

So with df.groupby('key'), the cluster with key=1 is assigned True in amount_0_flag for each of its rows, since within that cluster one row has an amount of negative 1 and the other has an amount of positive 1 (negative_amount marks the rows whose amount counts as negative).

df.groupby('key')['amount'].sum()

only gives the sum of amount for each cluster, without considering the values in negative_amount. I am wondering how to also find the clusters, and their rows, whose amounts sum to 0 once the negative_amount values are taken into account, using pandas/numpy.

Would you mind posting the original dataframe and the output you are looking for?
– rpanai, Commented Dec 7, 2017 at 18:33

Scott Boston · Accepted Answer · 2017-12-07 18:39:56Z

Let's try this where I created a 'new_column' showing the comparison to your 'amount_0_flag':

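import numpy as np

# Sign each amount (-1 where negative_amount is True), then flag every row
# of a group whose signed sum is <= 0.
df['new_column'] = (
    df.assign(amount_n=df.amount * np.where(df.negative_amount, -1, 1))
      .groupby('key')['amount_n']
      .transform(lambda x: sum(x) <= 0)
)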

Output:

   key  amount  amount_0_flag  negative_amount  new_column
0    1     1.0           True            False        True
1    1     1.0           True             True        True
2    2     2.0          False             True       False
3    2     3.0          False            False       False
4    2     4.0          False            False       False
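
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the names signed and group_sum are only illustrative. It differs from the snippet above in one way: it tests for a sum of zero with np.isclose, on the assumption that "sum is 0" should not also match negative totals (sum(x) <= 0 would flag those as well) and that the amounts are floats where an exact == 0 comparison may be fragile.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "key": [1, 1, 2, 2, 2],
    "amount": [1.0, 1.0, 2.0, 3.0, 4.0],
    "negative_amount": [False, True, True, False, False],
})

# Signed amount: keep the value where negative_amount is False, negate it otherwise.
signed = df["amount"].where(~df["negative_amount"], -df["amount"])

# Broadcast each group's signed sum back onto that group's rows.
group_sum = signed.groupby(df["key"]).transform("sum")

# Assumption: "sum is 0" means zero within floating-point tolerance.
df["amount_0_flag"] = np.isclose(group_sum, 0.0)

print(df)
```

Using transform("sum") instead of a lambda keeps the aggregation on the built-in path, and the comparison is then applied once to the whole column.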
