49

I have this simple dataframe df:

    df = pd.DataFrame({'c':[1,1,1,2,2,2,2],'type':['m','n','o','m','m','n','n']})

My goal is to count the values of type for each c, and then add a column with the size of each c group. So starting with:

    In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')

    In [28]: g
    Out[28]:
       c type  t
    0  1    m  1
    1  1    n  1
    2  1    o  1
    3  2    m  2
    4  2    n  2

The first problem is solved. Then I can also compute the group sizes:

    In [29]: a = df.groupby('c').size().reset_index(name='size')

    In [30]: a
    Out[30]:
       c  size
    0  1     3
    1  2     4

How can I add the size column directly to the first dataframe? So far I used map as:

    In [31]: a.index = a['c']

    In [32]: g['size'] = g['c'].map(a['size'])

    In [33]: g
    Out[33]:
       c type  t  size
    0  1    m  1     3
    1  1    n  1     3
    2  1    o  1     3
    3  2    m  2     4
    4  2    n  2     4

which works, but is there a more straightforward way to do this?

3 Answers

46

Use transform to add a column from a groupby aggregation back to the original df. transform returns a Series whose index is aligned to the original df:

    In [123]:
    g = df.groupby('c')['type'].value_counts().reset_index(name='t')
    g['size'] = df.groupby('c')['type'].transform('size')
    g

    Out[123]:
       c type  t  size
    0  1    m  1     3
    1  1    n  1     3
    2  1    o  1     3
    3  2    m  2     4
    4  2    n  2     4
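One thing worth noting about the snippet above: the transform is computed on df (7 rows) while g has only 5 rows, so the assignment relies on the row labels 0 to 4 of df happening to carry the right group sizes. A minimal sketch of a variant that does not depend on that coincidence, assuming the goal is the row count of each c group, is to sum the per-type counts within g itself. The name per_row is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# transform on df yields one value per row of df (7 rows), not per row of g (5 rows)
per_row = df.groupby('c')['type'].transform('size')

# Summing the per-type counts within each c reproduces the group size,
# and the result is indexed like g, so no cross-frame alignment is involved.
g['size'] = g.groupby('c')['t'].transform('sum')
print(g)
```

Because the new column is derived from g alone, it should stay correct even if g is later sorted or filtered before the assignment.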

3 Comments

Actually in this case we should change it to g['size'] = g.groupby('c')['t'].transform('size') since I want to keep the value_counts().
this is a much better answer than: stackoverflow.com/questions/30244952/…
The link to transform has changed slightly, this is the new one: pandas.pydata.org/pandas-docs/stable/user_guide/…
20

Another solution with transform len:

    df['size'] = df.groupby('c')['type'].transform(len)
    print(df)

       c type  size
    0  1    m     3
    1  1    n     3
    2  1    o     3
    3  2    m     4
    4  2    m     4
    5  2    n     4
    6  2    n     4

Another solution with Series.map and Series.value_counts:

    df['size'] = df['c'].map(df['c'].value_counts())
    print(df)

       c type  size
    0  1    m     3
    1  1    n     3
    2  1    o     3
    3  2    m     4
    4  2    m     4
    5  2    n     4
    6  2    n     4
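The same Series.map idea can probably be applied to the value_counts frame from the question as well, which avoids building the intermediate frame a and resetting its index. This is only a sketch under the assumption that the desired size is the number of rows per c in the original df:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Per-(c, type) counts, as in the question
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# df['c'].value_counts() is a Series indexed by the c values,
# so map looks each c in g up by key rather than by row position.
g['size'] = g['c'].map(df['c'].value_counts())
print(g)
```

Since the lookup is by the value of c, the result does not depend on how the rows of g are ordered or labelled.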

1 Comment

Could you please briefly explain why you removed the first part of your original answer? I found it useful for my goal, which is different from the OP's but which the question's title describes well (that is how I got here).
1

You can calculate the groupby object and use it multiple times:

    g = df.groupby('c')['type']
    df = g.value_counts().reset_index(name='counts')
    df['size'] = g.transform('size')

or

    g.value_counts().reset_index(name='counts').assign(size=g.transform('size'))

Output:

       c type  counts  size
    0  1    m       1     3
    1  1    n       1     3
    2  1    o       1     3
    3  2    m       2     4
    4  2    n       2     4
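As a hedged alternative that keeps the original df intact and joins on the key c instead of on row labels, the two aggregations can be merged. The names counts, sizes and result are illustrative and not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# One row per (c, type) pair with its count
counts = df.groupby('c')['type'].value_counts().reset_index(name='counts')

# One row per c with the number of rows in that group
sizes = df.groupby('c').size().reset_index(name='size')

# A left merge on c attaches the group size to every (c, type) row
result = counts.merge(sizes, on='c', how='left')
print(result)
```

This is a little more verbose than the one-liner with assign, but the join key is explicit, which may be easier to reason about on larger or reordered frames.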
