pandas dataframe group by a column and based on count update another column rows individually

Question

Input dataframe

    import pandas as pd

    data = {
        'org_id': [79, 80, 21, 36, 40, 7, 10, 9, 12, 24],
        'r_id': [79, 80, 20, 20, 20, 7, 7, 9, 12, 12],
        'Type_id': ['P', 'P', 'C', 'C', 'C', 'P', 'C', 'P', 'P', 'C'],
        'grp_id': ['g54', 'g55', 'g13', 'g13', 'g13', 'g6', 'g6', 'g7', 'g8', 'g8']
    }
    df2 = pd.DataFrame.from_dict(data)
    df2

    Out[271]:
       org_id  r_id Type_id grp_id
    0      79    79       P    g54
    1      80    80       P    g55
    2      21    20       C    g13
    3      36    20       C    g13
    4      40    20       C    g13
    5       7     7       P     g6
    6      10     7       C     g6
    7       9     9       P     g7
    8      12    12       P     g8
    9      24    12       C     g8

Output dataframe

    data = {
        'org_id': [79, 80, 21, 36, 40, 7, 10, 9, 12, 24],
        'r_id': [79, 80, 20, 20, 20, 7, 7, 9, 12, 12],
        'Type_id': ['C', 'C', 'C', 'C', 'C', 'P', 'C', 'C', 'P', 'C'],
        'grp_id': ['g54', 'g55', 'g13', 'g13', 'g13', 'g6', 'g6', 'g7', 'g8', 'g8']
    }
    df3 = pd.DataFrame.from_dict(data)
    df3

Expected output:

    Out[270]:
       org_id  r_id Type_id grp_id
    0      79    79       C    g54
    1      80    80       C    g55
    2      21    20       C    g13
    3      36    20       C    g13
    4      40    20       C    g13
    5       7     7       P     g6
    6      10     7       C     g6
    7       9     9       C     g7
    8      12    12       P     g8
    9      24    12       C     g8

I want to count the rows per group in column grp_id. If a grp_id value occurs exactly once, the Type_id of that row should be changed to 'C'.

E.g. g54 and g55 each occur only once, so their Type_id becomes 'C'. g13 and g6 appear more than once, so I don't change the type for those rows. Thank you.

jezrael · Accepted Answer · 2019-12-03 13:25:40Z

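Use Series.where with a mask built by Series.duplicated with keep=False, which flags every row whose grp_id occurs more than once. Series.where keeps the values where the mask is True and replaces the rest with 'C':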

    # get all duplicated rows by grp_id
    mask = df2['grp_id'].duplicated(keep=False)
    # alternative: compare counts by not equal 1
    # mask = df2.groupby('grp_id')['grp_id'].transform('size').ne(1)

    df2['Type_id'] = df2['Type_id'].where(mask, 'C')
    print(df2)
       org_id  r_id Type_id grp_id
    0      79    79       C    g54
    1      80    80       C    g55
    2      21    20       C    g13
    3      36    20       C    g13
    4      40    20       C    g13
    5       7     7       P     g6
    6      10     7       C     g6
    7       9     9       C     g7
    8      12    12       P     g8
    9      24    12       C     g8
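For anyone who wants to check the two masks against each other, here is a minimal self-contained sketch using the sample data from the question. The names dup_mask, size_mask and result are only illustrative, and the final .loc assignment is an alternative spelling of the same update, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df2 = pd.DataFrame({
    'org_id': [79, 80, 21, 36, 40, 7, 10, 9, 12, 24],
    'r_id': [79, 80, 20, 20, 20, 7, 7, 9, 12, 12],
    'Type_id': ['P', 'P', 'C', 'C', 'C', 'P', 'C', 'P', 'P', 'C'],
    'grp_id': ['g54', 'g55', 'g13', 'g13', 'g13', 'g6', 'g6', 'g7', 'g8', 'g8'],
})

# Mask 1: True for every row whose grp_id occurs more than once.
dup_mask = df2['grp_id'].duplicated(keep=False)

# Mask 2: True where the per-group row count is not 1.
size_mask = df2.groupby('grp_id')['grp_id'].transform('size').ne(1)

# For this data both masks flag exactly the same rows.
assert (dup_mask == size_mask).all()

# Series.where keeps values where the mask is True and uses 'C' elsewhere.
result = df2['Type_id'].where(dup_mask, 'C')

# Alternative (illustrative): assign 'C' directly to the rows
# where the mask is False, modifying df2 in place.
df2.loc[~dup_mask, 'Type_id'] = 'C'

print(df2)
```

The duplicated version avoids a groupby. The transform('size') version may be easier to adapt if the threshold ever needs to be something other than one.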
