
I have a DataFrame defined by:

```python
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'activity': ['A1', 'A2', 'A2'], 'prep_hours': [None, None, 1], 'delivery_hours': [10, 10, 15]})
```

I want to create a column `total_hours` that is the sum of all columns matching the pattern `*_hours`.

For the time being, I simply add the columns I want into a new column:

```python
df.fillna(0, inplace=True)
df['total_hours'] = df['prep_hours'] + df['delivery_hours']
```
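The `fillna(0)` step matters for this approach because pandas arithmetic propagates NaN: with `+`, any row where `prep_hours` is missing gets a NaN total. A minimal sketch comparing `+` with a row-wise `sum` on the same data, without the `fillna` step:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'activity': ['A1', 'A2', 'A2'],
                   'prep_hours': [None, None, 1],
                   'delivery_hours': [10, 10, 15]})

# Element-wise "+" propagates NaN: rows 0 and 1 have no prep_hours
plus_total = df['prep_hours'] + df['delivery_hours']
print(plus_total.tolist())  # [nan, nan, 16.0]

# Row-wise sum skips NaN by default (skipna=True)
sum_total = df[['prep_hours', 'delivery_hours']].sum(axis=1)
print(sum_total.tolist())  # [10.0, 10.0, 16.0]
```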

But this manual approach does not scale. In this example there are only two `*_hours` columns, but the real DataFrame has more than 30 columns that need to be summed.

Is there a smarter way to do it?

2 Answers


Use `DataFrame.filter` with the `like` parameter, then `sum(axis=1)`. Converting missing values to 0 is not necessary, because `sum` skips NaN by default:

```python
df["total_hours"] = df.filter(like='_hours').sum(axis=1)
print(df)
   id activity  prep_hours  delivery_hours  total_hours
0   1       A1         NaN              10         10.0
1   2       A2         NaN              10         10.0
2   3       A2         1.0              15         16.0
```
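One caveat worth checking: `like='_hours'` is a plain substring match, so once `total_hours` exists, running the same line again would include `total_hours` in its own sum. A minimal sketch that drops any existing total first and anchors the pattern with `regex` (the `$` anchor also skips names that merely contain `_hours` in the middle):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'activity': ['A1', 'A2', 'A2'],
                   'prep_hours': [None, None, 1],
                   'delivery_hours': [10, 10, 15]})

for _ in range(2):  # run twice to show the total does not double
    hours = (df.drop(columns='total_hours', errors='ignore')  # exclude our own output column
               .filter(regex=r'_hours$'))                     # names ending in "_hours"
    df['total_hours'] = hours.sum(axis=1)

print(df['total_hours'].tolist())  # [10.0, 10.0, 16.0]
```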



You could select a subset of the original DataFrame using a list comprehension and sum the values row-wise (horizontally), like this:

```python
columns = [col for col in df.columns if "_hours" in col]
df["total_hours"] = df[columns].sum(axis=1)
```
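Since the question states the pattern as the shell-style glob `*_hours`, the standard-library `fnmatch` module can apply that glob directly inside the same list comprehension. A sketch of this variant, using `fnmatchcase` so matching is case-sensitive on every platform:

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'activity': ['A1', 'A2', 'A2'],
                   'prep_hours': [None, None, 1],
                   'delivery_hours': [10, 10, 15]})

# Keep only column names that match the glob "*_hours"
columns = [col for col in df.columns if fnmatch.fnmatchcase(col, '*_hours')]
print(columns)  # ['prep_hours', 'delivery_hours']

df['total_hours'] = df[columns].sum(axis=1)
```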

If the pattern is more complicated, you could use regex matching as well :)
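For example, a sketch assuming a hypothetical extra column `setup_hrs` that uses a different suffix; `re.fullmatch` requires the whole column name to match, so the pattern only picks names ending in `_hours` or `_hrs`:

```python
import re

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'activity': ['A1', 'A2', 'A2'],
                   'prep_hours': [None, None, 1],
                   'delivery_hours': [10, 10, 15],
                   'setup_hrs': [2, 0, 1]})  # hypothetical column, for illustration only

# Whole name must be word characters followed by "_hours" or "_hrs"
pattern = re.compile(r'\w+_(?:hours|hrs)')
columns = [col for col in df.columns if pattern.fullmatch(col)]
print(columns)  # ['prep_hours', 'delivery_hours', 'setup_hrs']

df['total_hours'] = df[columns].sum(axis=1)
print(df['total_hours'].tolist())  # [12.0, 10.0, 17.0]
```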

2 Comments

Thanks for the answer. I accepted the other answer because it is substantially shorter and @jezrael answered before you. However, I really like your answer because it seems more reusable in other contexts.
No worries! Very nice of you to take the time to comment.
