
I have a dataframe with IDs and booking refs, looking like the simplified example below.

ID BookingRef
001 2019/32323
002 2011/23232
002 2017/7u4922

In the example above, 001 has one booking and 002 has two, so the average number of bookings per customer is 1.5.
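The 1.5 here is total bookings divided by distinct customers (3 / 2). Below is a minimal sketch that reproduces this arithmetic in pandas. It assumes each row is one booking and that the IDs are stored as strings so the leading zeros survive. The variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question; IDs kept as strings (an assumption)
df = pd.DataFrame({
    "ID": ["001", "002", "002"],
    "BookingRef": ["2019/32323", "2011/23232", "2017/7u4922"],
})

# Average bookings per customer = total bookings / number of distinct customers
total_bookings = len(df)            # one row per booking -> 3
n_customers = df["ID"].nunique()    # distinct IDs -> 2
print(total_bookings / n_customers) # 1.5
```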

How can I calculate this for millions of records using Python and pandas?

  • This question belongs on Stack Overflow. Commented Sep 12, 2022 at 16:29
  • I'm afraid it doesn't even belong on SO, as it would be a duplicate (or low effort). The correct answer would be to suggest a data analysis course (I'd suggest the Kaggle course) so that the OP learns about pandas groupby. Commented Sep 18, 2022 at 8:23

2 Answers


You can use the groupby method to group the DataFrame by ID, then size() to count the rows for each ID, and finally mean() to average those counts:

df.groupby('ID').size().mean() 
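A self-contained sketch of the same approach, rebuilding the sample data from the question. Keeping the IDs as strings is an assumption about how they are stored. Note that size() counts every row in a group, including rows with missing values, whereas count() would count only non-null values per column.

```python
import pandas as pd

# Sample data from the question (IDs as strings, an assumption)
df = pd.DataFrame({
    "ID": ["001", "002", "002"],
    "BookingRef": ["2019/32323", "2011/23232", "2017/7u4922"],
})

# Number of rows (bookings) per ID: 001 -> 1, 002 -> 2
bookings_per_id = df.groupby("ID").size()

# Mean of the per-ID counts
average = bookings_per_id.mean()
print(average)  # 1.5
```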

Since you want to count the number of rows per ID, another way is to use value_counts().

df['ID'].value_counts().mean() # 1.5 
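With millions of records the data will usually be loaded from a file. The sketch below assumes a CSV with the question's two columns. The inline `csv_text` is a hypothetical stand-in for that file. It passes `dtype={"ID": str}` to `pd.read_csv` so that IDs such as "001" are not parsed as the integer 1, and it checks that value_counts() agrees with the groupby/size answer.

```python
import io
import pandas as pd

# Hypothetical CSV standing in for the real file; in practice pass a path
csv_text = """ID,BookingRef
001,2019/32323
002,2011/23232
002,2017/7u4922
"""

# Read IDs as strings so leading zeros are preserved
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

counts = df["ID"].value_counts()  # bookings per ID, most frequent first
average = counts.mean()

# Same result as df.groupby("ID").size().mean()
assert average == df.groupby("ID").size().mean()
print(average)  # 1.5
```

If the IDs were read as integers instead, the average would be unchanged for this sample, but distinct strings such as "01" and "001" would then be merged into a single customer.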