How to get unique values from multiple columns in a pandas groupby

Question

Starting from this dataframe df:

df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns. For one column I can do:

g = df.groupby('c')['l1'].unique()

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')['l1','l2'].unique()

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

I know I can get the unique values for the two columns with (among others):

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)

Is there a way to apply this method to the groupby in order to get something like:

c
1       [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object

Is there a way to have the output as distinct columns instead of one cell holding a list?
– saving_space, Commented Oct 9, 2020 at 4:45
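One possible way to get distinct columns, sketched under the assumption that the pooled unique values per group are first collected as lists (the names `pooled` and `wide` are illustrative, not from the thread). Groups with fewer unique values end up padded with missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# Pool the unique values of l1 and l2 per group (one list per group).
pooled = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x.to_numpy())))

# Spread each list over its own columns; shorter lists are padded with missing values.
wide = pd.DataFrame(pooled.tolist(), index=pooled.index)
print(wide)
```

The resulting columns are simply numbered 0, 1, 2, ... since the pooled values have no natural column names.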

Yaakov Bressler · Accepted Answer · 2020-01-23 22:30:44Z

70

Alternatively, you can use agg:

g = df.groupby('c')[['l1','l2']].agg(['unique'])
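A self-contained sketch of the agg approach, assuming a recent pandas in which several columns are selected with a list rather than a bare tuple. Note that it collects the unique values per column, not pooled across l1 and l2; the name `per_column` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# Select both columns with a list, then collect unique values per group and per column.
per_column = df.groupby('c')[['l1', 'l2']].agg(['unique'])

# The result has two-level columns: ('l1', 'unique') and ('l2', 'unique').
print(per_column)
```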


6 Comments

CodeMaster Over a year ago

how would you combine 'unique' and let's say '.join' in the same agg?

Yaakov Bressler Over a year ago

You can write a custom function and apply it the same way. For example, define f = lambda arr: ','.join(np.unique(arr)), then call .agg([f]) or, if you want to label it, .agg([('MyName', f)]).
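A minimal sketch of that idea, using a named function instead of a lambda so the intent is easier to read. The function name `join_unique` is hypothetical, and the function is passed directly to agg rather than inside a list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

def join_unique(values):
    """Return the sorted unique values of one column as a comma-separated string."""
    return ','.join(np.unique(values))

# agg calls join_unique once per group and per selected column.
joined = df.groupby('c')[['l1', 'l2']].agg(join_unique)
print(joined)
```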

josepmaria Over a year ago

@YaakovBressler how do you actually get the resulting values in order?

Yaakov Bressler Over a year ago

You could sort the data at any point! For best performance, sort after the aggregation: df.groupby(...).agg().sort_values(). More context and options are in the question "pandas groupby, then sort within groups". @josepmaria
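If the goal is to have the values inside each group's result in order, one option (an assumption about what was being asked, not something stated in the thread) is to sort each array after the aggregation. Series.unique keeps order of first appearance, so the sort is applied per group afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# unique() preserves order of appearance within each group ...
appearance_order = df.groupby('c')['l1'].unique()

# ... so sort each group's values after the aggregation.
sorted_values = appearance_order.map(sorted)
print(sorted_values)
```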

Philip Ciunkiewicz Over a year ago

Visiting this in 2023, this is the correct answer. While you CAN use apply, this approach with agg is much more readable and flexible.


score 63 · Accepted Answer · 2019-09-15 21:56:48Z

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))
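A runnable version of the apply approach on the question's data, assuming a recent pandas where the two columns are selected with a list. Each group is passed to the lambda as a two-column frame, and np.unique flattens it, so the values of l1 and l2 are pooled and returned sorted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# For each group, flatten the l1/l2 values and keep the sorted unique ones.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x.to_numpy())))
print(g)
```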

Ch3steR · Accepted Answer · 2021-02-27 16:25:53Z

18

One more alternative is to use GroupBy.agg with set

df.groupby('c').agg(set)

       l1      l2
c
1  {a, b}  {d, b}
2  {c, b}  {e, f}
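The set-based result is still per column. A sketch of how the two sets could be merged into the pooled result the question asks for, using a plain set union per group (the names `sets` and `pooled` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# One set of unique values per group and per column.
sets = df.groupby('c').agg(set)

# Union the l1 and l2 sets of each group and sort for a stable order.
pooled = {
    key: sorted(l1 | l2)
    for key, l1, l2 in zip(sets.index, sets['l1'], sets['l2'])
}
print(pooled)
```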


2 Comments

Yaakov Bressler Over a year ago

You might get into trouble with this when the values in l1 and l2 aren't hashable (e.g. timestamps). Otherwise, solid solution.

anapaulagomes Over a year ago

Beautiful solution but it doesn't work for nan.

Mykola Zotko · Accepted Answer · 2023-10-04 16:15:22Z

A shorter version without the lambda function:

df.groupby('c').apply(np.unique)
# or
df.groupby('c')[['l1','l2']].apply(np.unique)

Output:

c
1       [a, b, d]
2    [b, c, e, f]
dtype: object
