pandas add column to groupby dataframe

Question

I have this simple DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})
```

My goal is to count the values of `type` for each `c`, and then add a column with the size of each `c` group. I start with:

```
In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')

In [28]: g
Out[28]:
   c type  t
0  1    m  1
1  1    n  1
2  1    o  1
3  2    m  2
4  2    n  2
```

That solves the first problem. I can also get the size of each group:

```
In [29]: a = df.groupby('c').size().reset_index(name='size')

In [30]: a
Out[30]:
   c  size
0  1     3
1  2     4
```

How can I add the `size` column directly to the first DataFrame? So far I have used `map`:

```
In [31]: a.index = a['c']

In [32]: g['size'] = g['c'].map(a['size'])

In [33]: g
Out[33]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
```

which works, but is there a more straightforward way to do this?
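For comparison with the `map` approach in the question, here is a minimal sketch that joins `a` onto `g` with `DataFrame.merge` on the shared `c` column, so `a`'s index never needs to be reassigned. It assumes the same `df` as above; `how='left'` keeps every row of `g` in its original order.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Per-(c, type) counts, as in the question
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# Per-c group sizes, as in the question
a = df.groupby('c').size().reset_index(name='size')

# Join on the 'c' column instead of setting a's index and mapping;
# how='left' keeps all rows of g, in g's order
g = g.merge(a, on='c', how='left')
print(g)
```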

EdChum · Accepted Answer · 2016-05-12 14:33:33Z

46

Use `transform` to add a column back to the original df from a groupby aggregation; `transform` returns a Series whose index is aligned to the original df:

```
In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = df.groupby('c')['type'].transform('size')
g

Out[123]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
```
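A caveat on the assignment above, shown as a small sketch: `transform` returns one value per row of `df` (index 0..6), while `g` has its own index 0..4, so `g['size'] = ...` matches rows by index label rather than by `c`. It gives the right sizes here because the first five rows of `df` happen to have the same `c` values as the five rows of `g`. Reversing the rows of `df` (the hypothetical `df_rev` below) breaks that coincidence; looking the size up by the `c` key does not depend on row order. The column names `size_by_index` and `size_by_key` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})
# Same data, rows in reverse order (hypothetical, to expose the alignment)
df_rev = df.iloc[::-1].reset_index(drop=True)

g = df_rev.groupby('c')['type'].value_counts().reset_index(name='t')

# transform's result is indexed like df_rev (0..6); assigning it to g (0..4)
# keeps labels 0..4 and drops 5..6, regardless of the 'c' values
g['size_by_index'] = df_rev.groupby('c')['type'].transform('size')

# Mapping by the 'c' key uses the group sizes, independent of row order
g['size_by_key'] = g['c'].map(df_rev.groupby('c').size())
print(g)
```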

edited May 12, 2016 at 14:33

answered May 12, 2016 at 14:27

EdChum



3 Comments

Fabio Lamanna Over a year ago

Actually in this case we should change it to g['size'] = g.groupby('c')['t'].transform('size') since I want to keep the value_counts().

vagabond Over a year ago

This is a much better answer than: stackoverflow.com/questions/30244952/…

Nicolas Forstner Over a year ago

The link to the `transform` documentation has changed slightly; this is the new one: pandas.pydata.org/pandas-docs/stable/user_guide/…
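On the first comment: when the grouping is done on `g` itself, `transform('size')` counts the rows of `g` per `c`, which is the number of distinct `type` values (3 and 2), not the size of each `c` group in `df`. A minimal sketch, assuming the question's data, showing that summing the per-type counts `t` does recover the group sizes; `n_types` is an illustrative column name.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# 'size' on g counts g's rows per c, i.e. the number of distinct types
g['n_types'] = g.groupby('c')['t'].transform('size')

# Summing the per-type counts gives the size of each c group in df
g['size'] = g.groupby('c')['t'].transform('sum')
print(g)
```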

jezrael · Accepted Answer · 2020-06-07 11:33:30Z

Another solution, using `transform` with `len`:

```python
df['size'] = df.groupby('c')['type'].transform(len)
print(df)
```

```
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
```
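A related detail, sketched on hypothetical data with one missing `type` value: `transform(len)` and `transform('size')` both count every row of the group, including missing values, whereas `transform('count')` counts only non-missing values. With the question's data (no missing values) all three agree.

```python
import pandas as pd

# Hypothetical data: group c=1 contains one missing 'type'
df = pd.DataFrame({'c': [1, 1, 2], 'type': ['m', None, 'n']})
grouped = df.groupby('c')['type']

df['size'] = grouped.transform('size')    # every row, missing values included
df['len'] = grouped.transform(len)        # len() of each group Series, same as size
df['count'] = grouped.transform('count')  # only non-missing values
print(df)
```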

Another solution, using `Series.map` with `Series.value_counts`:

```python
df['size'] = df['c'].map(df['c'].value_counts())
print(df)
```

```
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
```

Could you briefly explain why you removed the first part of your original answer? I found it useful for my goal, which is different from the OP's but well described by the question's title (which is how I got here).

Mykola Zotko · Accepted Answer · 2022-01-23 10:25:44Z

You can create the groupby object once and reuse it:

```python
g = df.groupby('c')['type']
df = g.value_counts().reset_index(name='counts')
df['size'] = g.transform('size')
```

or

```python
g.value_counts().reset_index(name='counts').assign(size=g.transform('size'))
```

Output:

```
   c type  counts  size
0  1    m       1     3
1  1    n       1     3
2  1    o       1     3
3  2    m       2     4
4  2    n       2     4
```
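In the chained version, `g.transform('size')` is indexed like the original 7-row `df`, so `assign` aligns it to the 5-row result by index label; the output above is correct because the first five rows of `df` happen to line up. A sketch of an alternative (a swapped-in technique, not the answer's own) that passes a callable to `assign`: the lambda receives the new frame, so the sizes are computed from and aligned to its own rows by summing `counts` per `c`.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

out = (
    df.groupby('c')['type']
      .value_counts()
      .reset_index(name='counts')
      # The lambda is called with the 5-row frame built so far, so the
      # per-c sum of 'counts' is aligned to that frame's own index
      .assign(size=lambda d: d.groupby('c')['counts'].transform('sum'))
)
print(out)
```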
