Add number to groups by count of values in Pandas Dataframe

Question

I have a pandas DataFrame with a protocol column. I would like to group its rows into packs of 3 per protocol and increment an index for each pack.

    id  protocol  protocol_grp
     1  ISD       ISD1
     2  ISD       ISD1
     3  ISD       ISD1
     4  IRQ       IRQ1
     5  IRQ       IRQ1
     6  IRQ       IRQ1
     7  IRQ       IRQ2
     8  IRQ       IRQ2
     9  IRQ       IRQ2
    10  IRQ       IRQ3
    11  ISD       ISD2
    12  ISD       ISD2
    13  ISD       ISD2
    14  ISD       ISD3
    15  IRQ       IRQ3
    16  IRQ       IRQ3
    17  IRQ       IRQ4

The desired output is the protocol_grp column. Each time I have seen 3 rows with the same protocol, I want to increment the index for that protocol by 1.

I hope this makes sense.

Can you check the row with id 16, please? It should be IRQ3 and not IRQ4, right?
– Corralien, Commented Apr 18, 2022 at 15:18

Corralien · Accepted Answer · 2022-04-18 15:17:03Z

You can use:

    df['protocol_grp'] = df['protocol'] + df.groupby('protocol').cumcount() \
                                            .floordiv(3).add(1).astype(str)
    print(df)

    # Output
        id protocol protocol_grp
    0    1      ISD         ISD1
    1    2      ISD         ISD1
    2    3      ISD         ISD1
    3    4      IRQ         IRQ1
    4    5      IRQ         IRQ1
    5    6      IRQ         IRQ1
    6    7      IRQ         IRQ2
    7    8      IRQ         IRQ2
    8    9      IRQ         IRQ2
    9   10      IRQ         IRQ3
    10  11      ISD         ISD2
    11  12      ISD         ISD2
    12  13      ISD         ISD2
    13  14      ISD         ISD3
    14  15      IRQ         IRQ3
    15  16      IRQ         IRQ3  # <- check this row
    16  17      IRQ         IRQ4
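To see why the expression above produces these labels, it can help to split it into its intermediate steps. The following is a minimal sketch that rebuilds the sample data from the question; the variable names `pos` and `grp` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question: 17 rows, two protocols interleaved.
df = pd.DataFrame({
    "id": range(1, 18),
    "protocol": ["ISD"] * 3 + ["IRQ"] * 7 + ["ISD"] * 4 + ["IRQ"] * 3,
})

# Step 1: 0-based position of each row within its protocol,
# counted in row order regardless of gaps between occurrences.
pos = df.groupby("protocol").cumcount()

# Step 2: every 3 consecutive positions share one pack number (1-based).
grp = pos.floordiv(3).add(1)

# Step 3: append the pack number to the protocol name.
df["protocol_grp"] = df["protocol"] + grp.astype(str)

print(df.assign(pos=pos, grp=grp))
```

Because the counter runs per protocol over the whole frame, the row with id 10 (the 7th IRQ row) and the rows with ids 15 and 16 (the 8th and 9th IRQ rows) end up in the same pack, IRQ3, even though they are not adjacent.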

I tried your solution on about 50k rows. But after a while (around IRQ7 and ISD22) the index starts back at 1. I don't understand why.

BENY · Accepted Answer · 2022-04-18 15:13:53Z


Use cumcount to number the rows within each protocol, then floor-divide by 3 to get the pack number:

df['protocol_grp'] = df['protocol'].add((df.groupby('protocol').cumcount()//3+1).astype(str))
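As a quick sanity check, the operator form used here (`// 3 + 1`) and the method form (`.floordiv(3).add(1)`) should give identical results. The sketch below compares them on a small made-up frame; the data and the variable names `counter`, `with_methods` and `with_operators` are assumptions for illustration.

```python
import pandas as pd

# Small illustrative frame (not the asker's data).
df = pd.DataFrame({"protocol": ["ISD", "IRQ", "ISD", "ISD", "IRQ", "ISD"]})

# 0-based running count of each row within its protocol.
counter = df.groupby("protocol").cumcount()

# Method-chaining form.
with_methods = df["protocol"] + counter.floordiv(3).add(1).astype(str)

# Operator form, combined with Series.add for the string concatenation.
with_operators = df["protocol"].add((counter // 3 + 1).astype(str))

print(with_methods.equals(with_operators))
print(with_operators.tolist())
```

Which form to use is a matter of style; the two spell the same arithmetic.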


MrSo Over a year ago

Thanks for your solution. But as with @Corralien's answer, the index goes back to 1 after a while.
