
I have a CSV file in which each property row is followed by a variable number of rows describing the rooms in that property. I want to create a column that, for each property, sums the floor areas of its rooms to give a gross floor area. The unstructured layout of the data makes this difficult to do in pandas. Here is an example of the table I have at the moment:

    id  ba  store_desc      floor_area
    0   1   Toy Shop        NaN
    1   2   Retail Zone A   29.42
    2   2   Retail Zone B   31.29
    3   1   Grocery Store   NaN
    4   2   Retail Zone A   68.00
    5   2   Outside Garden  83.50
    6   2   Office          7.30

Here is the table I am trying to create:

    id  ba  store_desc     floor_area  gross_floor_area
    0   1   Toy Shop       NaN         60.71
    3   1   Grocery Store  NaN         158.8

Does anybody have any pointers on how to achieve this result? I'm totally lost.

Sam

2 Answers


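IIUC, every row with a missing floor_area starts a new property, so the running count of those rows can serve as the group key: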

    # .copy() avoids a SettingWithCopyWarning when the new column is assigned
    df1 = df[df['floor_area'].isnull()].copy()
    df1['gross_floor_area'] = df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values
    df1
    Out[463]:
       id  ba    store_desc  floor_area  gross_floor_area
    0   0   1       ToyShop         NaN             60.71
    3   3   1  GroceryStore         NaN            158.80
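To make the grouping key easier to follow, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and exposes the intermediate steps. The DataFrame construction and the names `is_property`, `group_key`, `totals` and `result` are my own assumptions for illustration. It also assumes the first row is a property row; otherwise the number of groups would not match the number of property rows.

```python
import numpy as np
import pandas as pd

# Sample data from the question (construction assumed for illustration)
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# True on property rows, False on room rows
is_property = df["floor_area"].isnull()

# The running count of property rows: it steps up at each property
# and stays constant across the rooms that follow it
group_key = is_property.cumsum()

# Sum the room areas per group; NaN on the property row is skipped
totals = df.groupby(group_key)["floor_area"].sum()

# Keep only the property rows and attach the totals in order
result = df[is_property].copy()
result["gross_floor_area"] = totals.to_numpy()
print(result)
```

Printing `group_key` on its own is a quick way to check that the rooms have been attached to the property you expect.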


I first made a temporary column named category holding the store_desc of each property row, forward-filled it so every room carries its property's name, grouped by that column to get the sums, and then mapped the sums back onto the matching store_desc values.

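    # Label each property row with its own name, then carry the label down to its rooms
    df['category'] = df.loc[df['floor_area'].isnull(), 'store_desc']
    df['category'] = df['category'].ffill()
    # Sum the room areas per property and map the totals back onto the property rows
    df['gross_floor_area'] = df['store_desc'].map(df.groupby('category')['floor_area'].sum())
    df = df.drop(columns='category')
    df[df['gross_floor_area'].notnull()]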
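One caveat with mapping by name: if two properties share the same store_desc, their rooms fall into a single group and both properties receive the combined total. A possible variant, sketched below under the same assumptions about the sample data, forward-fills the index of each property row instead of its name. The names `is_property` and `property_idx` are hypothetical and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (construction assumed for illustration)
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

is_property = df["floor_area"].isnull()

# Keep the index label on property rows only, then carry it down to the rooms,
# so each room is tagged with the index of the property it belongs to
property_idx = df.index.to_series().where(is_property).ffill()

# transform("sum") broadcasts each group's total back to every row in the group
df["gross_floor_area"] = df.groupby(property_idx)["floor_area"].transform("sum")

# Only the property rows are wanted in the final table
result = df.loc[is_property]
print(result)
```

Because the key is the row index rather than the name, two properties with identical descriptions would still be summed separately.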
