How do I convert ungrouped data to grouped data in Pandas

Question

I have a dataset with the number of customers in a store and that store's sales on each day.

It looks something like this:

    Store ID  Sales  Customers
    1         250    500
    2         276    786
    3         124    256
    5         164    925

How do I convert it to grouped data, something like this:

    Sales    Customers
    0-100    0
    100-200  1181
    200-300  1286

I have searched for a while and found this on the pandas site: http://pandas.pydata.org/pandas-docs/version/0.15.2/groupby.html

df2.groupby(['X'], sort=True).sum()

But I am unable to understand how to apply the same to my example.

jezrael · Accepted Answer · 2017-09-12 06:51:08Z

Use cut for bins and then groupby and aggregate sum:

    df = df.groupby(pd.cut(df['Sales'], [0,100,200,300]))['Customers'].sum().fillna(0)
    print (df)
    Sales
    (0, 100]         0.0
    (100, 200]    1181.0
    (200, 300]    1286.0
    Name: Customers, dtype: float64
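For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's sample rows first. The variable names `sales_bins` and `grouped` are only illustrative, and `observed=False` is an addition that is not in the answer: on recent pandas versions it asks `groupby` to keep bins that contain no rows, which should make the empty 0-100 bin show up with a sum of 0.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Store ID': [1, 2, 3, 5],
    'Sales': [250, 276, 124, 164],
    'Customers': [500, 786, 256, 925],
})

# Assign every Sales value to one of the intervals (0, 100], (100, 200], (200, 300].
sales_bins = pd.cut(df['Sales'], bins=[0, 100, 200, 300])

# observed=False (an addition, not part of the answer) keeps empty bins in the result.
grouped = df.groupby(sales_bins, observed=False)['Customers'].sum()
print(grouped)
```

The result is indexed by interval, with one row per bin, including the bin that has no stores in it.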

It is also possible to define labels:

    l = ['0-100','100-200','200-300']
    b = [0,100,200,300]
    # Parentheses let the method chain span several lines
    df = (df.groupby(pd.cut(df['Sales'], bins=b, labels=l))['Customers']
            .sum()
            .fillna(0)
            .reset_index())
    print (df)
         Sales  Customers
    0    0-100        0.0
    1  100-200     1181.0
    2  200-300     1286.0

Zero · Accepted Answer · 2017-09-12 06:56:17Z

Use pd.cut with custom bins

    In [2058]: df.groupby(pd.cut(df.Sales, [0, 100, 200, 300])).Customers.sum().fillna(0)
    Out[2058]:
    Sales
    (0, 100]         0.0
    (100, 200]    1181.0
    (200, 300]    1286.0
    Name: Customers, dtype: float64

And reset_index() to reshape

    In [2061]: (df.groupby(pd.cut(df.Sales, [0, 100, 200, 300])).Customers.sum()
                  .fillna(0).reset_index())
    Out[2061]:
            Sales  Customers
    0    (0, 100]        0.0
    1  (100, 200]     1181.0
    2  (200, 300]     1286.0

And, use labels for custom names

    In [2062]: (df.groupby(pd.cut(df.Sales,
                                  bins=[0, 100, 200, 300],
                                  labels=['0-100', '100-200', '200-300']))
                  .Customers.sum().fillna(0).reset_index())
    Out[2062]:
         Sales  Customers
    0    0-100        0.0
    1  100-200     1181.0
    2  200-300     1286.0
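Labels like '0-100' and '100-200' hide which bin a boundary value lands in, so it may help to check the edges directly. The sketch below is an illustration rather than part of the answer: `edge_sales` is made-up data containing only the bin edges, and the `right` and `include_lowest` arguments of `pd.cut` are shown as options you might want, not something the answer uses. A code of -1 means the value fell outside every bin.

```python
import pandas as pd

bins = [0, 100, 200, 300]
# Hypothetical Sales values sitting exactly on the bin edges.
edge_sales = pd.Series([0, 100, 200, 300])

# Default (right=True): bins are (0, 100], (100, 200], (200, 300], so 0 is outside.
default_codes = pd.cut(edge_sales, bins=bins).cat.codes.tolist()

# include_lowest=True additionally admits the lowest edge (0) into the first bin.
lowest_codes = pd.cut(edge_sales, bins=bins, include_lowest=True).cat.codes.tolist()

# right=False: bins are [0, 100), [100, 200), [200, 300), so 300 is outside.
left_codes = pd.cut(edge_sales, bins=bins, right=False).cat.codes.tolist()

print(default_codes, lowest_codes, left_codes)
```

If a store could report exactly 0 sales, the default bins would silently drop it from the grouped sum, so picking one of the two variants above is probably worth a moment's thought.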

piRSquared · Accepted Answer · 2017-09-12 07:03:56Z


with np.searchsorted

    labels = {0: '0-100', 1: '100-200', 2: '200-300'}
    s = np.searchsorted([100, 200], df.Sales)
    df.groupby(s).Customers.sum().reindex(
        pd.Index([0, 1, 2], name='Sales'), fill_value=0
    ).rename(labels).reset_index()

         Sales  Customers
    0    0-100          0
    1  100-200       1181
    2  200-300       1286
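To see what `np.searchsorted` contributes here, it can help to look at the intermediate bin indices. This is a self-contained sketch using the question's sample rows; `bin_index`, `edge_index` and `result` are illustrative names. With the default `side='left'`, a value equal to an edge goes to the lower bin, which matches the right-closed intervals that `pd.cut` produces by default.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Store ID': [1, 2, 3, 5],
    'Sales': [250, 276, 124, 164],
    'Customers': [500, 786, 256, 925],
})
labels = {0: '0-100', 1: '100-200', 2: '200-300'}

# Only the inner edges are needed: index 0 is <= 100, 1 is (100, 200], 2 is > 200.
bin_index = np.searchsorted([100, 200], df['Sales'].to_numpy())

# Values exactly on an edge fall into the lower bin with the default side='left'.
edge_index = np.searchsorted([100, 200], [100, 200, 201])

result = (
    df.groupby(bin_index)['Customers'].sum()
      # Bring back bins that had no rows, with a sum of 0.
      .reindex(pd.Index([0, 1, 2], name='Sales'), fill_value=0)
      .rename(labels)
      .reset_index()
)
print(bin_index, edge_index)
print(result)
```

Note that, unlike `pd.cut`, this version has no upper limit: anything above 200 lands in the last bin, even a value above 300.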


2 Comments

jezrael Over a year ago

Can I ask something? I used your solution for timings and the output seems wrong, and I have no idea what is wrong. Can you check it?

piRSquared Over a year ago

Your functions don't accept a parameter, so data is always the same. Use this for the stmt parameter: '{}(df)'.format(j). And place data in the signature of each function you define: dropna(data), notnull(data) and query(data).
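A rough sketch of what that suggestion might look like with the standard `timeit` module. Only the function names, the `data` parameter and the `'{}(df)'.format(j)` statement come from the comment. The function bodies, the sample frame and the `namespace` dictionary are assumptions made up for illustration, since the original timing code is not shown in this thread.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data; the frame used in the original timings is not shown.
df = pd.DataFrame({'A': [1.0, np.nan, 3.0, np.nan]})

# Hypothetical bodies: each function works on its `data` argument, not on a global.
def dropna(data):
    return data.dropna()

def notnull(data):
    return data[data['A'].notnull()]

def query(data):
    return data.query('A == A')  # NaN is not equal to itself, so NaN rows are dropped

# Names that the timed statement is allowed to see.
namespace = {'df': df, 'dropna': dropna, 'notnull': notnull, 'query': query}

timings = {}
for j in ['dropna', 'notnull', 'query']:
    stmt = '{}(df)'.format(j)  # e.g. 'dropna(df)'
    timings[j] = timeit.timeit(stmt, globals=namespace, number=10)

print(timings)
```

Because each function now receives the frame as an argument, swapping `df` for a larger frame in `namespace` changes what every function is timed on.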
