Count number of unique values in a column, grouped by multiple other columns [duplicate]

Question

I am trying to figure out how to use the Pandas to find the number of unique values in a DataFrame that has been grouped by two of its columns. My sample dataset looks like this:

df = pd.DataFrame( {"Header1" : [0,1,2,3,0,1,2,3], "Header2" : [0,1,0,1,0,1,0,1], "values" : [1,2,3,4,1,3,2,1]} )

I would love to be able to transform this to return a DataFrame that looks like this

output = pd.DataFrame( {"Header1" : [0,1,2,3], "Header2" : [0,1,0,1], "unique values" : [1,2,2,2]} )

So far what I have tried is using groupby and nunique:

pd_series = df.groupby(['Header1', 'Header2'])['values'].nunique()

This returns the right answer but in a multi-indexed series data format that is very tricky to convert to a usable DataFrame. I've spent quite a lot of time trying to figure out how to correctly format the output with no luck. Instead of generating a DataFrame with the correct set of columns, pd_series.to_frame() produces a DataFrame with a single column named "values" with one row that contains the entire series object.

So far I am resorting to copy-pasting the results from nunique() into a new DataFrame. There must be a better way. Does anyone have any suggestions for how to do better with this?

I think you just need to set as_index=False, i.e. df.groupby(['Header1', 'Header2'], as_index=False)['values'].nunique(), no? — fsimonjetz
– fsimonjetz, Commented May 19, 2022 at 22:02
If you want to handle the rename as well you can set the name attribute of Series.reset_index as in this answer or this answer new_df = df.groupby(['Header1', 'Header2'])['values'].nunique().reset_index(name="unique values") — Henry Ecker
– Henry Ecker ♦, Commented May 19, 2022 at 22:26

cammyalex · Accepted Answer · 2022-05-19 22:18:22Z

Removing the bracket notation select of 'values' and adding .reset_index() to the end of your command will give your desired result in dataframe format.

df.groupby(['Header1', 'Header2']).nunique().reset_index()

Collectives™ on Stack Overflow

Count number of unique values in a column, grouped by multiple other columns [duplicate]

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related