Pandas Groupby result into a separate dataframe

Question

Say there is a dataframe with 100 records and 4 (or n) columns. An example of the dataframe is below:

    id  target  col3  col4
    00  0       ..    ..
    00  0       ..    ..
    00  0       ..    ..
    01  1       ..    ..
    01  1       ..    ..
    01  0       ..    ..
    01  1       ..    ..
    02  1       ..    ..
    02  0       ..    ..
    02  1       ..    ..
    02  0       ..    ..
    ..  ..

Based on this dataframe, I want to create a new dataframe that is the result of a groupby on this dataframe and value_counts of a specific column (target).

I have figured out how to get those values (my current code):

    # groupby yields (key, sub-dataframe) pairs, one per distinct id
    for id, group in df.groupby('id'):
        print(id)
        print(group.target.value_counts())

Which gives me the following output:

    00
    0    3
    Name: target, dtype: int64
    01
    0    1
    1    3
    Name: target, dtype: int64
    02
    0    2
    1    2
    Name: target, dtype: int64
    ..
    ..

I am able to get these values, but I can't seem to pass them into an empty dataframe. I would like to create a new dataframe that represents this information in this format:

    id  0  1
    00  3  NaN
    01  1  3
    02  2  2
    ..  ..

Am I missing something here, or is what you want df.groupby('id').count().reset_index()?
– gold_cy, Commented Jan 3, 2020 at 17:45
That gives a count of records for each key ('id'). What I'm looking for is the count of each unique value of a column ('target') for each key ('id').
– user9996043, Commented Jan 3, 2020 at 17:48
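
The distinction drawn in this exchange can be sketched as follows. This is a minimal illustration, assuming sample data that mirrors the table in the question; the variable names rows_per_id and counts_per_value are made up for the example.

```python
import pandas as pd

# Sample data mirroring the question's table (col3/col4 omitted)
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# count(): one number per id, i.e. how many rows that id has
rows_per_id = df.groupby('id')['target'].count()

# value_counts(): one number per (id, target value) pair,
# i.e. how often each target value occurs within each id
counts_per_value = df.groupby('id')['target'].value_counts()

print(rows_per_id)
print(counts_per_value)
```

The second result is indexed by (id, target) pairs, and a pair that never occurs (such as id '00' with target 1) is simply absent, which is why the desired table shows NaN there.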

YOLO · Accepted Answer · 2020-01-03 17:49:01Z

Here's one way to do it:

    df = (df
          .groupby('id')
          .apply(lambda f: f['target'].value_counts().to_frame())
          .unstack()
          .reset_index())
    df.columns = ['id', 0, 1]
    print(df)

       id    0    1
    0   0  3.0  NaN
    1   1  1.0  3.0
    2   2  2.0  2.0
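
A possible variant of the same idea, not part of the answer above, calls value_counts directly on the grouped column and skips apply. The sketch below assumes sample data mirroring the question's table; result is a name chosen for the example.

```python
import pandas as pd

# Sample data mirroring the question's table (col3/col4 omitted)
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

result = (
    df.groupby('id')['target']
      .value_counts()   # Series indexed by (id, target)
      .unstack()        # target values become columns; absent pairs become NaN
      .reset_index()    # turn the id index back into a regular column
)
result.columns.name = None  # drop the leftover 'target' label on the column axis

print(result)
```

Because the column labels come from the target values themselves, this version does not need the columns to be renamed by hand.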

Andrej Kesely · Accepted Answer · 2020-01-03 17:53:07Z

You can do a simple .pivot_table() with 'size' as aggfunc:

    import pandas as pd

    d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
         'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
    df = pd.DataFrame(d)

    # 'size' counts the rows that fall into each (id, target) cell
    print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

    target    0    1
    id
    00      3.0  NaN
    01      1.0  3.0
    02      2.0  2.0

BarathVutukuri · Accepted Answer · 2020-01-05 17:32:49Z

You can use the pandas crosstab function (pd.crosstab) to achieve this. It computes a frequency table of the values of two factors.

    import pandas as pd
    import numpy as np

    d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
         'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
    df = pd.DataFrame(d)

    print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

Without the final .replace(0, np.nan), this prints:

    target  0  1
    id
    00      3  0
    01      1  3
    02      2  2
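
The table above shows integer zeros, which is what crosstab returns before any replacement. As a minimal sketch reusing the answer's sample data, the two forms can be compared side by side; the names counts and counts_with_nan are chosen for the example.

```python
import numpy as np
import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# Plain crosstab: integer counts, with 0 for combinations that never occur
counts = pd.crosstab(index=df['id'], columns=df['target'])

# Replacing 0 with NaN matches the format requested in the question
counts_with_nan = counts.replace(0, np.nan)

print(counts)
print(counts_with_nan)
```

Keeping the zeros may be preferable when the counts feed into further arithmetic, while the NaN form matches the layout asked for in the question.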
