Pandas: GroupBy to DataFrame

Question

This refers to a very popular question about converting a groupby result to a DataFrame. I do not think the use case in that question is the most useful one, so here is mine:

Suppose you have what could be a hierarchical dataset in a flattened form, e.g.

       key  val
    0  'a'    2
    1  'a'    1
    2  'b'    3
    3  'b'    4

What I want to do is convert that DataFrame to this structure:

       'a'  'b'
    0    2    3
    1    1    4

I thought this would be as simple as pd.DataFrame(df.groupby('key').groups) but it isn't.

How can I make this transformation?

What if there are two ('a', 2) pairs? What should be the expected result?
– Tai, Commented Jan 14, 2018 at 19:41
Thanks. I have updated my answer. I suggest you update your example to help people understand your problem; you can take the data from my answer.
– Tai, Commented Jan 14, 2018 at 20:04
This is a pandas pivot_table in disguise; it allows counts of duplicate entries. See "How to aggregate unique count with pandas pivot_table" and "Difference between groupby and pivot_table for pandas dataframes".
– smci, Commented Nov 18, 2023 at 0:53

BENY · Accepted Answer · 2018-01-14 17:22:46Z

7

    df.assign(index=df.groupby('key').cumcount()).pivot(index='index', columns='key', values='val')
    Out[369]:
    key    'a'  'b'
    index
    0        2    3
    1        1    4
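The one-liner above can be unpacked into three steps. This is only a sketch: the keys are written as plain "a" and "b" rather than with the extra quotes shown in the question, and the names position, with_index and wide are made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question (assumption: plain string keys).
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# Step 1: number the rows inside each group, restarting at 0 for every key.
position = df.groupby("key").cumcount()

# Step 2: store that counter as a column named "index" so pivot can use it.
with_index = df.assign(index=position)

# Step 3: pivot with keyword arguments; each key becomes a column and the
# per-group counter becomes the row index.
wide = with_index.pivot(index="index", columns="key", values="val")
print(wide)
```

Without the counter there is nothing to tell pivot which row each value belongs to, which is why cumcount is needed before reshaping.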


8 Comments

SumNeuron Over a year ago

Could you please explain how this works, as well as why you chose to make this hierarchical by assigning a value to index?

BENY Over a year ago

@SumNeuron, when you want to pivot you need three parameters: index, columns and values. Your example df has no column that can serve as the index, so cumcount is used to create one; then we can pivot your sample df into the expected output.

SumNeuron Over a year ago

Is there a way to drop the index label?

BENY Over a year ago

@SumNeuron Add rename_axis(None, axis=1): pandas.pydata.org/pandas-docs/stable/generated/…

SumNeuron Over a year ago

@Wen that ends up dropping the label key and keeping index
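To illustrate the exchange above, here is a hedged sketch of clearing the axis names after the pivot. It assumes a reasonably recent pandas where rename_axis accepts index= and columns= keywords; the variable names are made up for illustration and the keys are written without the extra quotes.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})
wide = df.assign(index=df.groupby("key").cumcount()).pivot(
    index="index", columns="key", values="val"
)

# rename_axis(None, axis=1) clears only the column-axis name ("key");
# the row-axis name ("index") is left in place, as noted in the comment.
cols_cleared = wide.rename_axis(None, axis=1)

# Passing both keywords clears the row-axis and column-axis names together.
both_cleared = wide.rename_axis(index=None, columns=None)
print(both_cleared)
```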


MaxU - stand with Ukraine · Accepted Answer · 2018-01-14 17:14:48Z

what about the following approach?

    In [134]: pd.DataFrame(df.set_index('val').groupby('key').groups)
    Out[134]:
       a  b
    0  2  3
    1  1  4

Tai · Accepted Answer · 2018-01-14 20:14:37Z

I think this should work. Note that the example differs from the OP's: it contains duplicates.

    df = pd.DataFrame({'key': {0: "'a'", 1: "'a'", 2: "'b'", 3: "'b'", 4: "'a'"},
                       'val': {0: 2, 1: 1, 2: 3, 3: 4, 4: 2}})
    df_wanted = pd.DataFrame.from_dict(
        df.groupby("key")["val"].apply(list).to_dict(), orient='index'
    ).transpose()

       'a'  'b'
    0  2.0  3.0
    1  1.0  4.0
    2  2.0  NaN

df.groupby("key")["val"].apply(list).to_dict() creates the dictionary {"'a'": [2, 1, 2], "'b'": [3, 4]}. We then convert that dictionary to a DataFrame.

We use the DataFrame.from_dict function. Because the lists in the dictionary have different lengths, we pass the extra argument orient='index', so that each key becomes a row and shorter lists are padded with NaN, and then call transpose() at the end to turn the keys into columns.
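The intermediate results of those two steps can be inspected one at a time. This is a minimal sketch using the same duplicate-containing data, with the keys written as plain "a" and "b" and with illustrative variable names that are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "a"], "val": [2, 1, 3, 4, 2]})

# Collect the values of each group into a list, keyed by group name.
lists_by_key = df.groupby("key")["val"].apply(list).to_dict()

# orient="index" makes every key a row, so lists of unequal length are
# allowed; missing positions are filled with NaN.
rows = pd.DataFrame.from_dict(lists_by_key, orient="index")

# Transposing turns the keys into columns, which is the shape asked for.
wide = rows.transpose()
print(wide)
```

The NaN padding is also why the values come out as floats in the printed result.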

Reference

Creating dataframe from a dictionary where entries have different lengths

SumNeuron · Accepted Answer · 2018-01-14 17:11:18Z

I'm new to Pandas but this seems to work:

    gb = df.groupby('key')
    k = 'val'
    pd.DataFrame(
        [gb.get_group(x)[k].tolist() for x in gb.groups],
        index=[x for x in gb.groups]
    ).transpose()

Scott Boston · Accepted Answer · 2018-01-14 19:08:03Z

Let's use set_index and unstack with cumcount:

    df.set_index([df.groupby('key').cumcount(), 'key'])['val']\
      .unstack().rename_axis(None, axis=1)

Output:

       'a'  'b'
    0    2    3
    1    1    4
