
I have a DataFrame like the one below:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame()
>>> df["user_id"] = [1,1,1,2,2,3,4,4,4,4]
>>> df["cate"] = ["a","b","c","b","c","a","a","b","c","d"]
>>> df["prob"] = [np.random.rand() for _ in range(len(df["user_id"]))]


I want to turn the prob of each cate into a new column for each user (user_id): one row per user_id, with columns cate_a_prob, cate_b_prob, and so on, and 0 where a user has no row for that category.

The only solution I have found is a for loop, which is very slow with tens of thousands of users:

user_ids = list(set(df["user_id"]))
cates = list(set(df["cate"]))
user_probs = pd.DataFrame()
for uid in user_ids:
    d = pd.DataFrame({'user_id': [uid]})
    for c in cates:
        ratio = df[(df["user_id"] == uid) & (df["cate"] == c)]["prob"]
        # Use 0 when the user has no row for this category
        ratio = 0 if len(ratio) == 0 else float(ratio.iloc[0])
        d["cate_" + c + "_prob"] = ratio
    user_probs = pd.concat([user_probs, d])

So, does pandas have a built-in method to solve this problem? Thank you very much!

1 Answer

Pivot would work perfectly well here

df.pivot(index='user_id', columns='cate', values='prob').reset_index().fillna(0)

You get

cate  user_id         a         b         c         d
0           1  0.853583  0.161935  0.388652  0.000000
1           2  0.000000  0.554185  0.177939  0.000000
2           3  0.700654  0.000000  0.000000  0.000000
3           4  0.781307  0.634584  0.861808  0.130701
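One caveat worth a quick sketch: pivot expects each (user_id, cate) pair to appear at most once. The data below is hypothetical (it is not from the question) and repeats one pair on purpose. Assuming that averaging the repeated values is acceptable, pivot_table with an aggregation function is one possible fallback.

```python
import pandas as pd

# Hypothetical data: user 1 has two rows for category "a".
dup = pd.DataFrame({
    "user_id": [1, 1, 2],
    "cate": ["a", "a", "b"],
    "prob": [0.2, 0.4, 0.5],
})

# pivot cannot place two values in the same cell, so it raises ValueError.
try:
    dup.pivot(index="user_id", columns="cate", values="prob")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates repeated pairs; "mean" is an assumption here,
# "sum" or "max" may fit other data better.
wide_dup = dup.pivot_table(
    index="user_id", columns="cate", values="prob",
    aggfunc="mean", fill_value=0,
).reset_index()
print(wide_dup)
```

If the pairs are guaranteed to be unique, as in the question's sample, plain pivot is enough.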

Another way, using set_index and unstack:

df.set_index(['user_id', 'cate']).prob.unstack(fill_value=0).reset_index()

You get the same result
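Both approaches leave the columns named a, b, c, d, while the loop in the question builds names like cate_a_prob. A minimal sketch of getting those names, assuming the same sample data (the fixed seed and the variable name wide are only for illustration):

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "cate": ["a", "b", "c", "b", "c", "a", "a", "b", "c", "d"],
})
df["prob"] = rng.random(len(df))

wide = (
    df.pivot(index="user_id", columns="cate", values="prob")
      .fillna(0)                   # users without a category get 0
      .add_prefix("cate_")         # a -> cate_a
      .add_suffix("_prob")         # cate_a -> cate_a_prob
      .rename_axis(None, axis=1)   # drop the leftover "cate" label on the columns
      .reset_index()               # turn user_id back into a regular column
)
print(wide)
```

The prefix and suffix are applied before reset_index so that user_id itself is not renamed.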


1 Comment

Awesome! Thank you very much!
