Add new column indicating count in a pandas dataframe

Question

I have a dataframe with some replicated rows:

    item  h2  h3   h4
    ----------------
    foo   v1  ...  ...
    foo   v2  ...  ...
    foo   v1  ...  ...
    foo   v2  ...  ...
    foo   v1  ...  ...
    foo   v2  ...  ...
    foo   v1  ...  ...
    foo   v2  ...  ...
    bar   v5  ...  ...
    bar   v6  ...  ...
    bar   v7  ...  ...
    bar   v5  ...  ...
    bar   v6  ...  ...
    bar   v7  ...  ...

My goal is to add a column (new_id) to this dataframe that holds an incrementing count of duplicate blocks, prefixed with the value in the item column. A block is a set of rows that have the same item name. If it helps, the replicated blocks will be consecutive.

    item  h2  h3   h4   new_id
    -----------------------
    foo   v1  ...  ...  foo1
    foo   v2  ...  ...  foo1
    foo   v1  ...  ...  foo2
    foo   v2  ...  ...  foo2
    foo   v1  ...  ...  foo3
    foo   v2  ...  ...  foo3
    foo   v1  ...  ...  foo4
    foo   v2  ...  ...  foo4
    bar   v5  ...  ...  bar1
    bar   v6  ...  ...  bar1
    bar   v7  ...  ...  bar1
    bar   v5  ...  ...  bar2
    bar   v6  ...  ...  bar2
    bar   v7  ...  ...  bar2

Any suggestions on how to accomplish this?

wwnde · Accepted Answer · 2020-10-05 07:20:46Z

Use str.cat() to concatenate the item column with the cumulative count of each group in h2. The cumulative count starts at zero, so offset it by 1:

    df['new_id'] = df['item'].str.cat((df.groupby('h2').cumcount() + 1).astype(str), sep='')

        item  h2  h3   h4   new_id
    0   foo   v1  ...  ...  foo1
    1   foo   v2  ...  ...  foo1
    2   foo   v1  ...  ...  foo2
    3   foo   v2  ...  ...  foo2
    4   foo   v1  ...  ...  foo3
    5   foo   v2  ...  ...  foo3
    6   foo   v1  ...  ...  foo4
    7   foo   v2  ...  ...  foo4
    8   bar   v5  ...  ...  bar1
    9   bar   v6  ...  ...  bar1
    10  bar   v7  ...  ...  bar1
    11  bar   v5  ...  ...  bar2
    12  bar   v6  ...  ...  bar2
    13  bar   v7  ...  ...  bar2
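For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame and applies cumcount followed by str.cat. The h3 and h4 columns are left out because the question elides their values, and the variable name occurrence is only illustrative. Grouping by h2 alone assumes that no h2 value is shared between different items, which holds for this sample.

```python
import pandas as pd

# Sample data from the question; h3 and h4 are omitted because their values are elided.
df = pd.DataFrame({
    "item": ["foo"] * 8 + ["bar"] * 6,
    "h2": ["v1", "v2"] * 4 + ["v5", "v6", "v7"] * 2,
})

# cumcount() numbers the rows within each h2 group starting at 0, so add 1.
occurrence = df.groupby("h2").cumcount() + 1

# Join the item name and the occurrence number with no separator.
df["new_id"] = df["item"].str.cat(occurrence.astype(str), sep="")

print(df)
```

Each h2 value appears once per block here, so the n-th occurrence of a value marks the n-th block.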

jezrael · Accepted Answer · 2020-10-05 07:23:53Z

Use GroupBy.cumcount by both columns item and h2:

    df['new_id'] = df['item'] + '_' + df.groupby(['item','h2']).cumcount().add(1).astype(str)
    print(df)

        item  h2  h3   h4   new_id
    0   foo   v1  ...  ...  foo_1
    1   foo   v2  ...  ...  foo_1
    2   foo   v1  ...  ...  foo_2
    3   foo   v2  ...  ...  foo_2
    4   foo   v1  ...  ...  foo_3
    5   foo   v2  ...  ...  foo_3
    6   foo   v1  ...  ...  foo_4
    7   foo   v2  ...  ...  foo_4
    8   bar   v5  ...  ...  bar_1
    9   bar   v6  ...  ...  bar_1
    10  bar   v7  ...  ...  bar_1
    11  bar   v5  ...  ...  bar_2
    12  bar   v6  ...  ...  bar_2
    13  bar   v7  ...  ...  bar_2
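To illustrate why grouping by both item and h2 can matter, here is a small sketch on hypothetical data (not from the question) in which two items reuse the same h2 values. The names by_h2 and by_item_h2 are made up for the comparison.

```python
import pandas as pd

# Hypothetical data: both items reuse the h2 values v1 and v2.
df = pd.DataFrame({
    "item": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "h2": ["v1", "v2", "v1", "v2", "v1", "v2", "v1", "v2"],
})

# Grouping by h2 alone keeps counting across items.
by_h2 = df["item"] + "_" + df.groupby("h2").cumcount().add(1).astype(str)

# Grouping by item and h2 restarts the count for each item.
by_item_h2 = df["item"] + "_" + df.groupby(["item", "h2"]).cumcount().add(1).astype(str)

print(pd.DataFrame({"by_h2": by_h2, "by_item_h2": by_item_h2}))
```

With h2 alone, the bar rows continue at bar_3 and bar_4. With both keys they restart at bar_1. Either way, the count is only a block number if each h2 value occurs exactly once per block.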

Thanks for the quick answer! I just updated my question to clarify the example. Sorry for the confusion. Can you look at it again and update your answer? Thanks a lot!
