Average number of records by ID

Question

I have a dataframe with IDs and booking refs, looking like the simplified example below.

ID	BookingRef
001	2019/32323
002	2011/23232
002	2017/7u4922

In the example above, 001 has one booking and 002 has two bookings, so the average number of bookings per customer is 1.5.
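The average here is the total number of bookings divided by the number of distinct customers (3 / 2 = 1.5). Below is a minimal sketch of that arithmetic. It assumes the data is already in a DataFrame named `df` with an `ID` column stored as strings, so that the leading zeros in "001" survive.

```python
import pandas as pd

# Sample data from the question; IDs kept as strings so leading zeros survive
df = pd.DataFrame({
    "ID": ["001", "002", "002"],
    "BookingRef": ["2019/32323", "2011/23232", "2017/7u4922"],
})

total_bookings = len(df)                 # one row per booking -> 3
distinct_customers = df["ID"].nunique()  # distinct IDs -> 2
average = total_bookings / distinct_customers
print(average)  # 1.5
```

One caveat: if some rows have a missing ID, `len(df)` still counts them but `nunique()` does not. In that case the groupby/value_counts approaches in the answers are safer, because they ignore missing IDs in both the counts and the number of groups.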

How could I calculate this for millions of records using python and pandas?

I'm afraid this doesn't even belong on SO, as it would be a duplicate (or low effort)... The correct answer would be to suggest a data analysis course (I'd suggest the Kaggle course) so that the OP learns about pandas groupby. – Lucas Morin, Commented Sep 18, 2022 at 8:23

zachdj · Accepted Answer · 2022-09-12 16:31:45Z

You can use the groupby method to group the dataframe by ID, then size() to count the number of rows for each ID. Then call mean() on the result to get the average:

df.groupby('ID').size().mean()
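Below is a self-contained sketch of the same one-liner on the question's sample data. It is split into two steps so the intermediate per-ID counts are visible. Building the DataFrame inline is only for illustration; in practice `df` would come from your own data source.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["001", "002", "002"],
    "BookingRef": ["2019/32323", "2011/23232", "2017/7u4922"],
})

# Number of rows (bookings) per ID: 001 -> 1, 002 -> 2
bookings_per_id = df.groupby("ID").size()
print(bookings_per_id)

# Average over customers: (1 + 2) / 2
average = bookings_per_id.mean()
print(average)  # 1.5
```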

cottontail · Accepted Answer · 2022-09-30 05:16:49Z

Since you want to count the number of rows per ID, another way is to use value_counts().

df['ID'].value_counts().mean() # 1.5
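Both answers compute the same quantity: the mean of the per-ID row counts. When no ID is missing, that equals the number of rows divided by the number of distinct IDs. The rough check below runs all three on synthetic data sized like the question's "millions of records". The random integer IDs are invented purely for illustration, and no timings are claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: 1,000,000 bookings over up to 200,000 customer IDs
rng = np.random.default_rng(0)
n_rows = 1_000_000
df = pd.DataFrame({"ID": rng.integers(0, 200_000, size=n_rows)})

via_groupby = df.groupby("ID").size().mean()
via_value_counts = df["ID"].value_counts().mean()
via_ratio = len(df) / df["ID"].nunique()

print(via_groupby, via_value_counts, via_ratio)
```

All three are vectorized pandas operations, so any of them should be practical at this scale. The groupby version is convenient if you later need other per-customer aggregates. The value_counts version is the most direct when you only need counts of a single column.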
