Summing given columns by row in DataFrame

Question

I have a DataFrame defined by:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'activity': ['A1', 'A2', 'A2'],
    'prep_hours': [None, None, 1],
    'delivery_hours': [10, 10, 15],
})

I want to create a column total_hours that holds the row-wise sum of all columns matching the pattern *_hours.

For the time being, I simply add the columns I want into a new column:

df.fillna(0, inplace=True)
df['total_hours'] = df['prep_hours'] + df['delivery_hours']

But this does not scale easily. For the sake of the example I only have 2 columns called *_hours, but the real DataFrame contains more than 30 columns that should be added.

Is there a smarter way to do it?

jezrael · Accepted Answer · 2020-04-02 09:54:24Z

Use DataFrame.filter with the like parameter, then sum along axis=1. Converting missing values to 0 first is not necessary, because sum skips them by default:

df["total_hours"] = df.filter(like='_hours').sum(axis=1)
print (df)
   id activity  prep_hours  delivery_hours  total_hours
0   1       A1         NaN              10         10.0
1   2       A2         NaN              10         10.0
2   3       A2         1.0              15         16.0
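To illustrate why the fillna step can be dropped, here is a minimal sketch of how sum treats missing values. The demo frame and its all-missing third row are made up for illustration and are not part of the question's data; min_count is an optional argument of sum that you may or may not need.

```python
import pandas as pd

# Hypothetical data: the last row has no hours recorded at all.
demo = pd.DataFrame({
    'prep_hours': [None, 1.0, None],
    'delivery_hours': [10.0, 15.0, None],
})

hours = demo.filter(like='_hours')

# Default behaviour: NaN is skipped, so an all-missing row sums to 0.
total_default = hours.sum(axis=1)

# With min_count=1 a row needs at least one non-missing value;
# otherwise the result stays NaN instead of becoming 0.
total_strict = hours.sum(axis=1, min_count=1)

print(total_default)
print(total_strict)
```

Whether an all-missing row should read 0 or NaN depends on what the total is used for, so treat min_count as a choice rather than a requirement.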

Dominique Paul · Answer · 2020-04-02 09:53:44Z

You could select a subset of the columns of your original DataFrame using a list comprehension and sum them row by row like this:

columns = [col for col in df.columns if "_hours" in col]
df["total_hours"] = df[columns].sum(axis=1)

If the pattern you are looking for is more complicated you could use regex matching as well :)
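As a rough sketch of the regex variant, assuming the rule is "column name ends with _hours": the after_hours_flag column below is hypothetical and only there to show a name that contains _hours without ending in it.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'prep_hours': [None, None, 1],
    'delivery_hours': [10, 10, 15],
    'after_hours_flag': [0, 1, 0],  # hypothetical column that should NOT be summed
})

# Anchor the pattern at the end of the name so that only *_hours matches.
pattern = re.compile(r'_hours$')
columns = [col for col in df.columns if pattern.search(col)]
total = df[columns].sum(axis=1)

# DataFrame.filter accepts the same pattern through its regex parameter.
total_filter = df.filter(regex=r'_hours$').sum(axis=1)

print(columns)
print(total)
```

One thing to watch for: total_hours itself ends in _hours, so if you assign the result to that column and run the selection again, the old total would be picked up as well. Selecting the columns once before assigning avoids that.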

Thanks for the answer. I accepted the other answer because it is substantially shorter and @jezrael answered before you. However, I really like your answer because it seems more reusable in other contexts.
