1

I have a pandas DataFrame with a `protocol` column. I would like to group its rows into packs of 3 per protocol and increment an index for each pack.

    id  protocol  protocol_grp
     1  ISD       ISD1
     2  ISD       ISD1
     3  ISD       ISD1
     4  IRQ       IRQ1
     5  IRQ       IRQ1
     6  IRQ       IRQ1
     7  IRQ       IRQ2
     8  IRQ       IRQ2
     9  IRQ       IRQ2
    10  IRQ       IRQ3
    11  ISD       ISD2
    12  ISD       ISD2
    13  ISD       ISD2
    14  ISD       ISD3
    15  IRQ       IRQ3
    16  IRQ       IRQ3
    17  IRQ       IRQ4

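The desired output is the `protocol_grp` column. Each time the same protocol has occurred 3 times, the index for that protocol should increase by 1. The occurrences do not have to be consecutive: rows 10, 15 and 16 together form the pack IRQ3.

Hope this makes sense.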

2
  • Can you check the row id 16 please? It should be IRQ3 and not IRQ4, right? Commented Apr 18, 2022 at 15:18
  • @Corralien Right! Corrected, thanks. Commented Apr 18, 2022 at 15:37

2 Answers

2

You can use:

    df['protocol_grp'] = df['protocol'] + df.groupby('protocol').cumcount() \
                             .floordiv(3).add(1).astype(str)
    print(df)

    # Output
        id protocol protocol_grp
    0    1      ISD         ISD1
    1    2      ISD         ISD1
    2    3      ISD         ISD1
    3    4      IRQ         IRQ1
    4    5      IRQ         IRQ1
    5    6      IRQ         IRQ1
    6    7      IRQ         IRQ2
    7    8      IRQ         IRQ2
    8    9      IRQ         IRQ2
    9   10      IRQ         IRQ3
    10  11      ISD         ISD2
    11  12      ISD         ISD2
    12  13      ISD         ISD2
    13  14      ISD         ISD3
    14  15      IRQ         IRQ3
    15  16      IRQ         IRQ3  # <- check this row
    16  17      IRQ         IRQ4
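To see what each part of that expression contributes, it can be split into intermediate steps. This is only a sketch: it rebuilds the sample frame from the question, the names `protocols`, `position` and `pack` are illustrative, and it assumes packs are counted per protocol across the whole frame, not per consecutive run.

```python
import pandas as pd

# Sample data from the question: ids 1-17 with their protocol
protocols = ["ISD"] * 3 + ["IRQ"] * 7 + ["ISD"] * 4 + ["IRQ"] * 3
df = pd.DataFrame({"id": range(1, 18), "protocol": protocols})

# 0-based occurrence number of each row within its protocol
position = df.groupby("protocol").cumcount()

# Every 3 occurrences form one pack: 0,1,2 -> 1; 3,4,5 -> 2; ...
pack = position.floordiv(3).add(1)

# Concatenate the protocol name and the pack number as text
df["protocol_grp"] = df["protocol"] + pack.astype(str)
print(df)
```

Because `cumcount` numbers rows per protocol regardless of where they sit in the frame, the IRQ rows with ids 15 and 16 complete the pack started at id 10.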

2 Comments

I tried your solution with about 50k rows, but after a while (around IRQ7 and ISD22) the index starts back at 1. I don't understand why.
I have double-checked: there is no leading or trailing whitespace.
1

Use `cumcount` to number the rows within each protocol, then take the integer quotient by 3:

df['protocol_grp'] = df['protocol'].add((df.groupby('protocol').cumcount()//3+1).astype(str)) 

1 Comment

Thanks for your solution. But as with @Corralien's answer, the index goes back to 1 after a while.
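The thread does not establish why the counter restarts on the larger data set. As a starting point for debugging, a sketch like the following could locate the rows where the pack number drops within a protocol, which should not happen when the labels are computed in one pass over the whole frame. The helper `find_resets` and the `demo` frame are hypothetical, and the sketch assumes every `protocol_grp` value ends in digits.

```python
import pandas as pd

def find_resets(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose pack number is lower than the previous row of the same protocol."""
    # Trailing digits of the label are taken as the pack number
    pack = df["protocol_grp"].str.extract(r"(\d+)$", expand=False).astype(int)
    # Row-to-row change of the pack number within each protocol
    step = pack.groupby(df["protocol"]).diff()
    return df[step < 0]

# Made-up labels with a deliberate reset at index 3
demo = pd.DataFrame({
    "protocol": ["IRQ"] * 5,
    "protocol_grp": ["IRQ1", "IRQ1", "IRQ2", "IRQ1", "IRQ1"],
})
print(find_resets(demo))
```

On the `demo` frame this flags the row at index 3. Looking at the rows just before the first flagged row in the real data may show what differs there.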
