
I am referring to this very popular question about converting a groupby result to a DataFrame. Unfortunately, I do not think that particular use case is the most useful one, so here is mine:

Suppose you have what could be a hierarchical dataset in a flattened form, e.g.

        key  val
    0   'a'    2
    1   'a'    1
    2   'b'    3
    3   'b'    4

What I want to do is convert that DataFrame into this structure:

        'a'  'b'
    0     2    3
    1     1    4

I thought this would be as simple as pd.DataFrame(df.groupby('key').groups), but it isn't.

How can I make this transformation?
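For context on why the one-liner above falls short, here is a minimal sketch. It assumes plain string keys "a"/"b" (the question's keys appear to include literal quote characters). GroupBy.groups maps each key to the index labels of its rows, not to the row values, so the resulting DataFrame holds the labels 0..3 instead of 2, 1, 3, 4.

```python
import pandas as pd

# Sample data from the question, with plain string keys (an assumption).
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# .groups maps each key to the *index labels* of its rows, not their values.
groups = df.groupby("key").groups

# Building a DataFrame from it therefore shows row labels, not "val" entries.
wrong = pd.DataFrame(groups)
print(wrong)
```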

  • What if there are two ('a', 2) pairs? What should be the expected result? Commented Jan 14, 2018 at 19:41
  • @Tai then the value 2 should appear twice. Commented Jan 14, 2018 at 19:55
  • Thanks. Updated my answer :P I suggest you update your example to help people understand your problem. You can take the data from my answer. Commented Jan 14, 2018 at 20:04
  • This is a pandas pivot_table in disguise; pivot_table also allows counting duplicate entries. See "How to aggregate unique count with pandas pivot_table" and "Difference between groupby and pivot_table for pandas dataframes". Commented Nov 18, 2023 at 0:53

5 Answers

    df.assign(index=df.groupby('key').cumcount()).pivot(index='index', columns='key', values='val')

    Out[369]:
    key    'a'  'b'
    index
    0        2    3
    1        1    4

Comments

Could you please explain how this works, as well as why you chose to make this hierarchical by assigning a value to index?
@SumNeuron, to pivot you need three parameters: index, columns and values. Your example df has no column to use as the index, so cumcount is used to create one; then we can pivot your sample df into the expected output.
Is there a way to drop the index label?
@SumNeuron add rename_axis(None, axis=1): pandas.pydata.org/pandas-docs/stable/generated/…
@Wen that ends up dropping the key label but keeping the index label.
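A sketch of the answer's two steps, assuming plain string keys: cumcount supplies the row position that pivot needs as its index, and chaining rename_axis on both axes drops both the "index" and "key" names, which is what the last two comments are after.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# cumcount numbers the rows within each key (0, 1, 0, 1); this becomes the
# row label that pivot needs.
pos = df.groupby("key").cumcount()

out = (
    df.assign(index=pos)
      .pivot(index="index", columns="key", values="val")
      .rename_axis(None, axis=0)  # drop the "index" name on the rows
      .rename_axis(None, axis=1)  # drop the "key" name on the columns
)
print(out)
```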

What about the following approach?

    In [134]: pd.DataFrame(df.set_index('val').groupby('key').groups)
    Out[134]:
       a  b
    0  2  3
    1  1  4
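Why this seems to work: after set_index('val') the row labels are the values themselves, so groups (key to row labels) becomes key to values. A minimal sketch with plain string keys (an assumption). It also checks a caveat: when the groups have different sizes, the per-key Index objects have different lengths and the DataFrame constructor raises ValueError.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# With "val" as the index, .groups maps each key to its values.
groups = df.set_index("val").groupby("key").groups
out = pd.DataFrame(groups)
print(out)

# Uneven group sizes: the arrays differ in length, so construction fails.
uneven = pd.DataFrame({"key": ["a", "a", "b", "a"], "val": [2, 1, 3, 4]})
try:
    pd.DataFrame(uneven.set_index("val").groupby("key").groups)
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```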


I think this should work. Note that the example differs from the OP's: it contains duplicates.

    import pandas as pd

    df = pd.DataFrame({'key': {0: "'a'", 1: "'a'", 2: "'b'", 3: "'b'", 4: "'a'"},
                       'val': {0: 2, 1: 1, 2: 3, 3: 4, 4: 2}})
    df_wanted = pd.DataFrame.from_dict(
        df.groupby("key")["val"].apply(list).to_dict(),
        orient='index'
    ).transpose()

       'a'  'b'
    0  2.0  3.0
    1  1.0  4.0
    2  2.0  NaN

df.groupby("key")["val"].apply(list).to_dict() creates the dictionary {"'a'": [2, 1, 2], "'b'": [3, 4]}. Then we convert the dictionary to a DataFrame.

We use the DataFrame.from_dict function. Because the lists in the dictionary have different lengths, we need to pass the extra argument orient='index' and then call transpose() at the end.

Reference

Creating dataframe from a dictionary where entries have different lengths
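A variant of the same idea, sketched on the same duplicate-containing data with plain string keys (an assumption): wrapping each list in a pd.Series lets the DataFrame constructor align the columns on a shared index and pad the shorter one with NaN, so neither orient='index' nor transpose() is needed.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "a"], "val": [2, 1, 3, 4, 2]})

lists = df.groupby("key")["val"].apply(list).to_dict()

# Each Series gets a 0..n-1 index; the constructor takes the union of those
# indexes and fills the missing positions of shorter columns with NaN.
df_wanted = pd.DataFrame({k: pd.Series(v) for k, v in lists.items()})
print(df_wanted)
```

One side effect of this choice: only the padded column becomes float, while 'a' keeps its integer dtype.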


I'm new to pandas, but this seems to work:

    gb = df.groupby('key')
    k = 'val'
    pd.DataFrame(
        [gb.get_group(x)[k].tolist() for x in gb.groups],
        index=[x for x in gb.groups]
    ).transpose()
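A slightly shorter sketch of the same loop, assuming plain string keys and groups of equal size: iterating a GroupBy directly yields (key, sub-DataFrame) pairs, so get_group is not needed. With unequal group sizes, pd.DataFrame raises ValueError for a dict of lists of different lengths.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# Each iteration gives a key and the rows belonging to it.
cols = {key: grp["val"].tolist() for key, grp in df.groupby("key")}
out = pd.DataFrame(cols)
print(out)
```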


Let's use set_index and unstack with cumcount:

    df.set_index([df.groupby('key').cumcount(), 'key'])['val'] \
      .unstack().rename_axis(None, axis=1)

Output:

        'a'  'b'
    0     2    3
    1     1    4
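To see how this handles the duplicate case raised in the question's comments, here is a sketch on data with a repeated ('a', 2) pair (plain string keys assumed). Positions without a value come out as NaN, and because unstack builds one block from the Series, all columns become float.

```python
import pandas as pd

# Data with a duplicated ('a', 2) pair.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "a"], "val": [2, 1, 3, 4, 2]})

# Position of each row within its key: 0, 1, 0, 1, 2.
pos = df.groupby("key").cumcount()

out = (
    df.set_index([pos, "key"])["val"]  # MultiIndex (position, key)
      .unstack()                       # move "key" into the columns
      .rename_axis(None, axis=1)       # drop the "key" column-axis name
)
print(out)
```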
