Create new column from specific rows in pandas dataframe

Question

I have a CSV file in which each property row is followed by a variable number of rows, one for each room in that property. I want to create a column that, for each property, sums the floor areas of its rooms to give a gross floor area. The unstructured layout of the data is making this difficult to achieve in pandas. Here is an example of the table I have at the moment:

    id  ba  store_desc      floor_area
    0   1   Toy Shop        NaN
    1   2   Retail Zone A   29.42
    2   2   Retail Zone B   31.29
    3   1   Grocery Store   NaN
    4   2   Retail Zone A   68.00
    5   2   Outside Garden  83.50
    6   2   Office          7.30

Here is the table I am trying to create:

    id  ba  store_desc     floor_area  gross_floor_area
    0   1   Toy Shop       NaN         60.71
    3   1   Grocery Store  NaN         158.8

Does anybody have any pointers on how to achieve this result? I'm totally lost.

Sam

BENY · Accepted Answer · 2017-10-25 21:56:37Z

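IIUC (if I understand correctly), you can label each block of rows with a running count of the NaN rows and sum floor_area within each block: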

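    # Property rows are the ones with no floor_area; copy to avoid SettingWithCopyWarning
    df1 = df[df['floor_area'].isnull()].copy()
    # A running count of NaN rows labels each property block; sum the rooms per block
    df1['gross_floor_area'] = df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values
    df1
    Out[463]:
       id  ba    store_desc  floor_area  gross_floor_area
    0   0   1       ToyShop         NaN             60.71
    3   3   1  GroceryStore         NaN            158.80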
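For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same running-count idea. The sample frame is rebuilt from the table in the question, and the names `block`, `totals` and `result` are my own. It is a variant rather than the exact code above: it uses `transform("sum")` so the totals line up by index instead of by position, and it assumes the first row of the data is a property row.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B", "Grocery Store",
                   "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Each NaN floor_area starts a new property, so a running count of NaNs
# labels every row with the property block it belongs to.
block = df["floor_area"].isnull().cumsum().rename("block")

# transform("sum") returns one total per row, aligned on the original index
# (NaN values are skipped by sum).
totals = df.groupby(block)["floor_area"].transform("sum").round(2)

# Attach the totals, then keep only the property rows.
result = df.assign(gross_floor_area=totals)[df["floor_area"].isnull()]
print(result)
```

Because the totals are aligned on the index, this version should not depend on the number of groups matching the number of property rows, which the `.values` assignment does.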

Nathan H · Accepted Answer · 2017-10-25 22:06:11Z

I first made a temporary column named category holding the store_desc of each property row, forward filled it down over the room rows, grouped by that column to get the sum, and then mapped the totals back to the matching store_desc values.

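    # Property rows have no floor_area; copy their names into a temporary column
    df['category'] = df[df.floor_area.isnull()]['store_desc']
    # Carry each property name down over its room rows
    df['category'] = df['category'].ffill()
    # Sum floor_area per property and map the totals back onto the property rows
    df['gross_floor_area'] = df.store_desc.map(df.groupby('category')['floor_area'].sum())
    df.drop('category', axis=1, inplace=True)
    df[df.gross_floor_area.notnull()]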
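A minimal runnable sketch of this forward-fill approach, for reference. The sample frame is rebuilt from the table in the question, and `category`, `area_by_property` and `properties` are illustrative names. It keeps the category as a standalone Series instead of a temporary column, and it assumes that property names are unique and never coincide with a room name.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B", "Grocery Store",
                   "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Keep the description only on property rows (NaN elsewhere),
# then carry it down over the room rows that follow.
category = df["store_desc"].where(df["floor_area"].isnull()).ffill()

# Sum the room areas per property name.
area_by_property = df.groupby(category)["floor_area"].sum()

# Room names are not in area_by_property's index, so they map to NaN;
# only property rows receive a total.
df["gross_floor_area"] = df["store_desc"].map(area_by_property).round(2)

properties = df[df["gross_floor_area"].notnull()]
print(properties)
```

If two properties could share a name, grouping by a running count of the NaN rows would likely be the safer choice, since this version would merge their totals.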
