How to GroupBy a Dataframe in Pandas and keep Columns [duplicate]

Question

Given a dataframe that logs uses of some books like this:

Name Type ID Book1 ebook 1 Book2 paper 2 Book3 paper 3 Book1 ebook 1 Book2 paper 2

I need to get the count of all the books, keeping the other columns and get this:

Name Type ID Count Book1 ebook 1 2 Book2 paper 2 2 Book3 paper 3 1

How can this be done?

EdChum · Accepted Answer · 2015-07-22 18:14:50Z

You want the following:

In [20]: df.groupby(['Name','Type','ID']).count().reset_index() Out[20]: Name Type ID Count 0 Book1 ebook 1 2 1 Book2 paper 2 2 2 Book3 paper 3 1

In your case the 'Name', 'Type' and 'ID' cols match in values so we can groupby on these, call count and then reset_index.

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]: df['Count'] = df.groupby(['Name'])['ID'].transform('count') df.drop_duplicates() Out[25]: Name Type ID Count 0 Book1 ebook 1 2 1 Book2 paper 2 2 2 Book3 paper 3 1

This seems to work, but If we had many more columns (as I have in other dataframes), wouldn't this hurt performance? Also, it is not very intuitive.
The problem here is that grouping will reduce the amount of information so it won't necessarily yield your desired df in one go, I've updated my answer to show how it could be done in 2 steps which is better to understand

jtlz2 · Accepted Answer · 2022-07-31 19:32:08Z

126

I think as_index=False should do the trick.

df.groupby(['Name','Type','ID'], as_index=False).count()

edited Jul 31, 2022 at 19:32

jtlz2

8,53511 gold badges74 silver badges128 bronze badges

answered Jun 2, 2016 at 22:06

jpobst

3,7413 gold badges27 silver badges25 bronze badges

1 Comment

Michael McFarlane Over a year ago

This is the simplest answer and works for other summary stats.

NeStack · Accepted Answer · 2023-10-02 15:12:45Z

If you have many columns in a df it makes sense to use df.groupby(['ID']).agg(Count=('ID', 'count'),...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them (or more precisely to keep the first entries in them), use .agg(Count=('ID', 'count'), col1=('col1', 'first'), col2=('col2', 'first'),...). Instead of 'first', you can also apply 'sum', 'mean' and others.

I use this because it gives custom names to new calculated columns.
@SteveScott I actually didn't know about the option to give custom names to new columns. Can you provide an example? I will be certainly using it, I frequently come back to this answer to look up the exact syntax
@NeStack .agg(col1_sum=('col1', 'sum'), col2_avg=('col2', 'mean'))

NeStack · Accepted Answer · 2023-05-24 13:28:57Z

SIMPLEST WAY

df.groupby(['col1', 'col1'], as_index=False).count(). Use as_index=False to retain column names. The default is True.

Also you can use df.groupby(['col_1', 'col_2']).count().reset_index()

rhug123 · Accepted Answer · 2023-02-11 20:03:40Z

You can use value_counts() as well:

df.value_counts().reset_index(name= 'Count')

Output:

 Name Type ID Count 0 Book1 ebook 1 2 1 Book2 paper 2 2 2 Book3 paper 3 1

Collectives™ on Stack Overflow

How to GroupBy a Dataframe in Pandas and keep Columns [duplicate]

5 Answers 5

2 Comments

1 Comment

3 Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

1 Comment

3 Comments

Comments

Comments

Linked

Related