I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6]
})

>>> df
  category  value
0      one      2
1      one      4
2      one      3
3      one      2
4      two      5
5      two      6
6      two      5
7    three      7
8    three      8
9    three      6

I want to calculate a new column called normalized by computing the median of each category (or any other groupby operation) and subtracting it (or applying any other simple operation) from the corresponding values in the original, non-grouped DataFrame. Written as explicit loops, this is what I mean:

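new_column = []
# Groupby equivalent
for cat in df["category"].unique():
    curr_df = df[df["category"] == cat]
    # Median of the numeric column only; the string column has no median
    curr_median = curr_df["value"].median()
    # Calculation on groupby components
    for val in curr_df["value"]:
        normalized = val - curr_median
        new_column.append(normalized)
# Lines up with df only because each category's rows are contiguous here
df["normalized"] = new_column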

Which results in the following DataFrame:

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
    "normalized": [-0.5, 1.5, 0.5, -0.5, 0.0, 1.0, 0.0, 0.0, 1.0, -1.0]
})

>>> df
  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0

How could I write this in a nicer, pandas way? Thanks in advance :)

1 Answer

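transform is your friend. I think of it as an apply that keeps the original DataFrame's shape: it computes the result per group and returns it aligned to the original index, so it can be combined directly with the ungrouped column. You can use this: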

df["normalized"] = df.value - df.groupby("category").value.transform("median") 

output:

  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0
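
Since the question also asks about "any other groupby operation", here is a small sketch of how the same pattern generalizes. It contrasts agg (one row per group) with transform (one row per original row) and shows a callable passed to transform. The names per_group, per_row and the centered column are only illustrative and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
})

grouped = df.groupby("category")["value"]

# agg collapses each group to a single row, indexed by category,
# so it does not line up with df row by row.
per_group = grouped.agg("median")

# transform broadcasts each group's result back onto the original index.
per_row = grouped.transform("median")

# Any other reduction can be swapped in, e.g. centering on the group mean
# ("centered" is a hypothetical column name for this illustration).
df["centered"] = df["value"] - grouped.transform("mean")

# A callable receives each group's Series, so the subtraction itself
# can also be done inside transform.
df["normalized"] = grouped.transform(lambda s: s - s.median())
```

Passing the aggregation by name ("median", "mean") is usually the faster option on large frames; the lambda form is slower but handles operations that have no built-in name.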