comparing the last values in pandas groupby

Question

This is my dataframe:

import pandas as pd

df = pd.DataFrame({'a': list('xxxxxzzz'), 'b':[0,0,1,0,1,0,1,1], 'c': [100, 101, 105, 110, 120, 125, 100, 150], 'd':[0,0,0,1,1,0,0,0]})

I group them:

groups = df.groupby(['a', 'd'])
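To see which rows end up in each group (and therefore which rows "row 1" and "row 2" of the first group refer to), the key-to-rows mapping can be inspected through GroupBy.groups. A small sketch that rebuilds the sample frame so it runs on its own; the name membership is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': list('xxxxxzzz'), 'b': [0, 0, 1, 0, 1, 0, 1, 1],
                   'c': [100, 101, 105, 110, 120, 125, 100, 150], 'd': [0, 0, 0, 1, 1, 0, 0, 0]})
groups = df.groupby(['a', 'd'])

# GroupBy.groups maps each (a, d) key to the index labels of its rows
membership = {key: rows.tolist() for key, rows in groups.groups.items()}
for key, rows in membership.items():
    print(key, rows)
```

With this data there are three keys, ('x', 0), ('x', 1) and ('z', 0), holding rows 0 to 2, rows 3 and 4, and rows 5 to 7 respectively.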

I want to add another column to df that, for each group, shows the percentage difference between the last value of c where b is 0 and the last value of c where b is 1.

For example, in the first group I want to compare c in row 2 (the last row with b == 1) with c in row 1 (the last row with b == 0).

My desired groups look like this:

('x', 0)
   a  b    c  d  result
0  x  0  100  0    3.96
1  x  0  101  0    3.96
2  x  1  105  0    3.96

('x', 1)
   a  b    c  d  result
3  x  0  110  1    9.09
4  x  1  120  1    9.09

('z', 0)
   a  b    c  d  result
5  z  0  125  0    20.0
6  z  1  100  0    20.0
7  z  1  150  0    20.0
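As a quick check of the numbers above: the result is (last c where b is 1 minus last c where b is 0) divided by the last c where b is 0, times 100. The sketch below plugs in the values read off each group by hand; pct_diff and last_values are illustrative names, not part of the question:

```python
def pct_diff(last_b0, last_b1):
    """Percentage change from the last c with b == 0 to the last c with b == 1."""
    return (last_b1 - last_b0) / last_b0 * 100

# values read off the sample frame: (last c with b == 0, last c with b == 1)
last_values = {('x', 0): (101, 105), ('x', 1): (110, 120), ('z', 0): (125, 150)}

for key, (b0, b1) in last_values.items():
    print(key, round(pct_diff(b0, b1), 2))
```

This prints 3.96, 9.09 and 20.0 for the three groups, matching the desired result column.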

rafaelc · Accepted Answer · 2019-03-09 16:04:15Z

Define a custom function and apply it to each group with GroupBy.apply:

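def func(s):
    # last c among the rows with b == 0, and among the rows with b == 1
    l0 = s[s.b == 0].tail(1).c.item()
    l1 = s[s.b == 1].tail(1).c.item()
    # return a copy with the new column instead of mutating the group in place
    return s.assign(result=(l1 - l0) / l0 * 100)

# group_keys=False keeps the original row index in the result (no extra a/d index levels)
df.groupby(['a', 'd'], group_keys=False).apply(func)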

Output:

   a  b    c  d     result
0  x  0  100  0   3.960396
1  x  0  101  0   3.960396
2  x  1  105  0   3.960396
3  x  0  110  1   9.090909
4  x  1  120  1   9.090909
5  z  0  125  0  20.000000
6  z  1  100  0  20.000000
7  z  1  150  0  20.000000

If you need each group separately, use a list comprehension:

[func(g) for _, g in df.groupby(['a', 'd'])]
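A possible alternative that avoids calling a Python function once per group (a sketch, not part of the answer above): take the last c for every (a, d, b) combination with GroupBy.last, pivot b into columns with unstack, compute the percentage per (a, d) group, and join it back onto the rows. It assumes each group has at least one row with b == 0 and one with b == 1; if one side is missing, this version yields NaN for that group, whereas .item() on an empty selection raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'a': list('xxxxxzzz'), 'b': [0, 0, 1, 0, 1, 0, 1, 1],
                   'c': [100, 101, 105, 110, 120, 125, 100, 150], 'd': [0, 0, 0, 1, 1, 0, 0, 0]})

# last c of each (a, d, b) combination, with b pivoted into columns 0 and 1
last_c = df.groupby(['a', 'd', 'b'])['c'].last().unstack('b')

# percentage change from the last b == 0 value to the last b == 1 value, per (a, d)
pct = ((last_c[1] - last_c[0]) / last_c[0] * 100).rename('result')

# broadcast the per-group value back onto every row of that group
out = df.join(pct, on=['a', 'd'])
print(out)
```

GroupBy.last returns the bottom-most non-null value in each combination, which matches the "last value" in the question as long as the rows are already in the intended order.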

Erfan · Accepted Answer · 2019-03-09 16:14:20Z

We can do the following here:

1. Apply the .pct_change method to calculate the percent change of c for each row.
2. Set result to NaN on every row except those where b is 1 and the previous row has b equal to 0.
3. Fill the remaining NaN values with bfill and then ffill.

import numpy as np

# first we apply .pct_change within each (a, d) group; b must not be a grouping key,
# otherwise the b == 0 and b == 1 rows land in different groups and are never compared
df['result'] = abs(round(df.groupby(['a', 'd']).c.pct_change() * 100, 2))
# keep the value only where b == 1 and the previous row of the group has b == 0; otherwise NaN
df['result'] = np.where((df.b == 1) & (df.groupby(['a', 'd']).b.shift(1) == 0), df.result, np.nan)

So we get:

   a  b    c  d  result
0  x  0  100  0     NaN
1  x  0  101  0     NaN
2  x  1  105  0    3.96
3  x  0  110  1     NaN
4  x  1  120  1    9.09
5  z  0  125  0     NaN
6  z  1  100  0   20.00
7  z  1  150  0     NaN

# then back-fill and forward-fill the NaN values (over the whole frame, not per group)
df['result'] = df['result'].bfill().ffill()
print(df)

   a  b    c  d  result
0  x  0  100  0    3.96
1  x  0  101  0    3.96
2  x  1  105  0    3.96
3  x  0  110  1    9.09
4  x  1  120  1    9.09
5  z  0  125  0   20.00
6  z  1  100  0   20.00
7  z  1  150  0   20.00
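Which columns go into the groupby matters for the pct_change step. In current pandas, GroupBy.pct_change compares each row only with the previous row of the same group, so grouping by ['a', 'd', 'b'] separates the b == 0 and b == 1 rows and the transition between them is never computed, while grouping by ['a', 'd'] keeps them together. A small sketch contrasting the two on the sample frame; by_adb and by_ad are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': list('xxxxxzzz'), 'b': [0, 0, 1, 0, 1, 0, 1, 1],
                   'c': [100, 101, 105, 110, 120, 125, 100, 150], 'd': [0, 0, 0, 1, 1, 0, 0, 0]})

# percent change of c relative to the previous row of the same group
by_adb = df.groupby(['a', 'd', 'b'])['c'].pct_change() * 100
by_ad = df.groupby(['a', 'd'])['c'].pct_change() * 100

print(pd.DataFrame({'b': df['b'], 'c': df['c'],
                    'by_a_d_b': by_adb.round(2), 'by_a_d': by_ad.round(2)}))
```

Rows 2, 4 and 6 (where b switches from 0 to 1) are NaN in by_a_d_b but 3.96, 9.09 and -20.0 in by_a_d. The -20.0 for ('z', 0) compares row 6 (c = 100) with row 5 (c = 125), while the question's definition uses the last b == 1 row (c = 150) and gives +20; the abs() in this answer makes the two agree for this data only.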
