7

Currently, when I need to add a constant column to an existing DataFrame, I do the following. It does not seem very elegant to me (in particular, the part where I multiply a one-element list by the length of the DataFrame). Are there better ways of doing this?

    import pandas as pd

    testdf = pd.DataFrame({'categories': ['bats', 'balls', 'paddles'],
                           'skus': [50, 5000, 32],
                           'sales': [500, 700, 90]})

    testdf['avg_sales_per_sku'] = [testdf.sales.sum() / testdf.skus.sum()] * len(testdf)

2 Answers

19

You can fill the column implicitly by giving only one number.

    testdf['avg_sales_per_sku'] = testdf.sales.sum() / testdf.skus.sum()

From the documentation:

When inserting a scalar value, it will naturally be propagated to fill the column
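As a minimal sketch of that behaviour, the snippet below rebuilds the question's `testdf`, assigns the scalar directly, and checks that the result matches the explicit list from the question. The names `ratio`, `same_as_list`, `with_copy`, and `avg_sales_per_sku_copy` are made up for illustration. `DataFrame.assign` is shown only as an optional alternative for when you would rather get a new DataFrame back than modify the original in place.

```python
import pandas as pd

testdf = pd.DataFrame({
    'categories': ['bats', 'balls', 'paddles'],
    'skus': [50, 5000, 32],
    'sales': [500, 700, 90],
})

# One number for the whole frame: total sales divided by total SKUs.
ratio = testdf['sales'].sum() / testdf['skus'].sum()

# Scalar assignment: pandas broadcasts the single value to every row.
testdf['avg_sales_per_sku'] = ratio

# The explicit list from the question holds the same values.
same_as_list = testdf['avg_sales_per_sku'].tolist() == [ratio] * len(testdf)

# assign() returns a new DataFrame and leaves testdf itself unchanged.
# 'avg_sales_per_sku_copy' is a hypothetical column name for this sketch.
with_copy = testdf.assign(avg_sales_per_sku_copy=ratio)
```

Both forms rely on the same scalar broadcasting, so no `len(testdf)` bookkeeping is needed in either case.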


2

It seems confusing to me to mix the categorical average with the aggregate average. You could also use:

    testdf['avg_sales_per_sku'] = testdf.sales / testdf.skus
    testdf['avg_agg_sales_per_agg_sku'] = testdf.sales.sum() / float(testdf.skus.sum())  # float is for Python2

    >>> testdf
      categories  sales  skus  avg_sales_per_sku  avg_agg_sales_per_agg_sku
    0       bats    500    50            10.0000                   0.253837
    1      balls    700  5000             0.1400                   0.253837
    2    paddles     90    32             2.8125                   0.253837
