Say there is a dataframe with 100 records and 4 (or n) columns, for example:

    id  target  col3  col4
    00  0       ..    ..
    00  0       ..    ..
    00  0       ..    ..
    01  1       ..    ..
    01  1       ..    ..
    01  0       ..    ..
    01  1       ..    ..
    02  1       ..    ..
    02  0       ..    ..
    02  1       ..    ..
    02  0       ..    ..
    ..  ..

Based on this dataframe, I want to create a new dataframe that is the result of a groupby on 'id' combined with value_counts of a specific column ('target').

I have figured out how to get those values (my current code):

    for id, group in df.groupby('id'):
        print(id)
        print(group.target.value_counts())

Which gives me the following output:

    00
    0    3
    Name: target, dtype: int64
    01
    0    1
    1    3
    Name: target, dtype: int64
    02
    0    2
    1    2
    Name: target, dtype: int64
    ..
    ..

I am able to get these values, but I can't seem to pass them into an empty dataframe. I would like to create a new dataframe that represents this information in the following format:

    id  0  1
    00  3  NaN
    01  1  3
    02  2  2
    ..  ..
  • Am I missing something here, or is what you want df.groupby('id').count().reset_index()? Commented Jan 3, 2020 at 17:45
  • That gives a count of records for each key ('id'). What I'm looking for is the count of each unique value of a column ('target') for each key ('id'). Commented Jan 3, 2020 at 17:48
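
The distinction drawn in the comments above can be sketched as follows. The sample data is illustrative and only mirrors the pattern shown in the question: a plain count gives one number per id, while grouping on both columns gives one number per (id, target) pair.

```python
import pandas as pd

# Illustrative data following the pattern in the question
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# One number per id: how many non-null 'target' rows that id has
rows_per_id = df.groupby('id').count().reset_index()

# One number per (id, target) pair: how often each target value occurs
per_value = df.groupby(['id', 'target']).size()

print(rows_per_id)
print(per_value)
```

Combinations that never occur, such as id '00' with target 1, are simply absent from the second result.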

3 Answers

Here's one way to do it:

    df = (df
          .groupby('id')
          .apply(lambda f: f['target'].value_counts().to_frame())
          .unstack()
          .reset_index())

    df.columns = ['id', 0, 1]

    print(df)

       id    0    1
    0   0  3.0  NaN
    1   1  1.0  3.0
    2   2  2.0  2.0
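
A possible variant of the same idea, offered as a sketch rather than as this answer's method: calling value_counts directly on the grouped 'target' column avoids the apply and the manual column renaming. The sample data is illustrative, and the names counts and result are chosen here for the example.

```python
import pandas as pd

# Illustrative data following the pattern in the question
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

counts = (
    df.groupby('id')['target']
      .value_counts()   # Series indexed by (id, target)
      .unstack()        # target values become columns; absent pairs become NaN
)

# Move 'id' back into a regular column and drop the 'target' axis label
result = counts.reset_index()
result.columns.name = None

print(result)
```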

You can do a simple .pivot_table() with 'size' as the aggfunc:

    import pandas as pd

    d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
         'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
    df = pd.DataFrame(d)

    # Count the rows for every (id, target) combination
    print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

    target    0    1
    id
    00      3.0  NaN
    01      1.0  3.0
    02      2.0  2.0
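
The pivoted table keeps 'id' as the index and labels the column axis 'target'. If the flat layout from the question (with 'id' as an ordinary column) is wanted, one possible follow-up step is sketched below; the names table and flat are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the answer above
d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

table = df.pivot_table(index='id', columns='target', aggfunc='size')

# Turn the 'id' index into a regular column and drop the 'target' axis label
flat = table.reset_index().rename_axis(None, axis=1)

print(flat)
```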

You can use pandas' crosstab function to achieve this. pd.crosstab computes a frequency table of two (or more) factors.

    import pandas as pd
    import numpy as np

    d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
         'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
    df = pd.DataFrame(d)

    print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

Prints (shown here before the final .replace(0, np.nan) step, which turns the 0 into NaN):

    target  0  1
    id
    00      3  0
    01      1  3
    02      2  2
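
To make the role of the two steps in this snippet explicit, here is a small sketch that keeps them apart. crosstab on its own fills combinations that never occur with 0, and the trailing replace converts those zeros to NaN. The names ct and with_nan are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above
d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# Step 1: integer frequency table; absent combinations are counted as 0
ct = pd.crosstab(index=df['id'], columns=df['target'])

# Step 2: mark absent combinations as missing instead of zero
with_nan = ct.replace(0, np.nan)

print(ct)
print(with_nan)
```

Whether 0 or NaN is the better marker depends on how the table is used afterwards; a zero count and a missing value mean the same thing here only because every cell is a count.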
