How to convert a groupby multi-index into new columns in pandas?

Question

Here I have a DataFrame like below:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame()
>>> df["user_id"] = [1,1,1,2,2,3,4,4,4,4]
>>> df["cate"] = ["a","b","c","b","c","a","a","b","c","d"]
>>> df["prob"] = [np.random.rand() for _ in range(len(df["user_id"]))]

I want to convert the prob of each cate into a new column for each user (user_id), so that every user gets a single row with one prob column per cate.

The only solution I have found is a for loop, which is very slow when I have tens of thousands of users:

user_ids = list(set(df["user_id"]))
cates = list(set(df["cate"]))
user_probs = pd.DataFrame()
for uid in user_ids:
    # one single-row frame per user
    d = pd.DataFrame({'user_id': [uid]})
    for c in cates:
        ratio = df[(df["user_id"] == uid) & (df["cate"] == c)]["prob"]
        # 0 when the user has no row for this cate
        ratio = 0 if len(ratio) == 0 else float(ratio.iloc[0])
        d["cate_" + c + "_prob"] = ratio
    user_probs = pd.concat([user_probs, d])

So, does pandas have a built-in method to solve this problem? Thank you very much!

Vaishali · Accepted Answer · 2017-05-11 02:35:31Z

Pivot would work perfectly well here

df.pivot(index='user_id', columns='cate', values='prob').reset_index().fillna(0)

You get

cate  user_id         a         b         c         d
0           1  0.853583  0.161935  0.388652  0.000000
1           2  0.000000  0.554185  0.177939  0.000000
2           3  0.700654  0.000000  0.000000  0.000000
3           4  0.781307  0.634584  0.861808  0.130701
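The loop in the question names its columns cate_a_prob, cate_b_prob, and so on, while the pivot output uses the bare cate values. The following is a minimal sketch of how the pivoted frame could be given those names. It assumes fixed prob values in place of the random ones (purely for reproducibility), and the variable name wide is illustrative only.

```python
import pandas as pd

# Fixed illustrative values; the question generates prob randomly.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "cate": ["a", "b", "c", "b", "c", "a", "a", "b", "c", "d"],
    "prob": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
})

# One row per user_id, one column per cate; missing pairs become 0.
wide = df.pivot(index="user_id", columns="cate", values="prob").fillna(0)

# Rename a -> cate_a_prob etc. while user_id is still the index,
# so the user_id column itself is not renamed.
wide = wide.add_prefix("cate_").add_suffix("_prob")

# Drop the leftover "cate" label on the column axis, then restore user_id.
wide.columns.name = None
wide = wide.reset_index()

print(wide)
```

Renaming before reset_index is a deliberate ordering choice here: once user_id becomes a regular column, add_prefix and add_suffix would rename it too.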

Another way using set_index

df.set_index(['user_id', 'cate']).prob.unstack(fill_value = 0).reset_index()

You get the same result
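To check that claim on your own data, a small comparison along these lines may help. It is only a sketch: the prob values are fixed stand-ins for the random ones in the question, and the names via_pivot and via_unstack are hypothetical.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values; the question generates prob randomly.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "cate": ["a", "b", "c", "b", "c", "a", "a", "b", "c", "d"],
    "prob": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
})

# Approach 1: pivot, then fill the missing (user_id, cate) pairs with 0.
via_pivot = (
    df.pivot(index="user_id", columns="cate", values="prob")
    .fillna(0)
    .reset_index()
)

# Approach 2: move both keys into the index, then unstack cate into columns.
via_unstack = (
    df.set_index(["user_id", "cate"])["prob"]
    .unstack(fill_value=0)
    .reset_index()
)

# Compare column labels and cell values of the two results.
print(list(via_pivot.columns) == list(via_unstack.columns))
print(np.allclose(via_pivot.to_numpy(), via_unstack.to_numpy()))
```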
