4

Let's say, I have the below data frame as sample.

 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 joe 22 1 200 ****************************** 

Desired output:

 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack NaN 0 -190 3 joe 22 1 200 ****************************** 

I also posted the sample data frame below, so you can test it easily.

df1 = pd.DataFrame({ "name":["frank", "jack", "joe"], "age": [12, 33, 22], "status": [1,0, 1], "price": [100,190, 200] }) 

As you can see, I want to insert a new row based on the above row, which status is 0, it's meant to be a transaction failure. to statistics more easily, I want to generate a new row below that one. I also want its price to be negative number. and since I don't care about whole column of the new row, so I want the other column be NaN, just like 'age' here in my desired output.

What I've tried so far.

import pandas as pd import numpy as np df1 = pd.DataFrame({ "name":["frank", "jack", "joe"], "age": [12, 33, 22], "status": [1,0, 1], "price": [100,190, 200] }) df2_list = [] for i, row in df1.iterrows(): if row["status"] == 0: origin_row = row.to_dict() new_row = ({ "name": origin_row.get("name"), #"age": origin_row.get("age"), "age": np.NaN, "status": origin_row.get("status"), "price": -origin_row.get("price"), }) df2_list.append(new_row) df2 = pd.DataFrame(df2_list) # concat df1 and df2 and sort it . df3 = pd.concat([df1, df2], ignore_index=True) df4 = df3.sort_values(['name', 'price'], ascending=[True, False]) print(df4) 

I have a loop, and check if it hit my condition status==0, and append it on my tmp list, and ...but it's too many code. I want to know is there any good way, I mean more Pythonic code or pandas has already got some function can it ?

2
  • 1
    Are names unique? Commented Feb 28, 2019 at 3:05
  • @coldspeed yes sir Commented Feb 28, 2019 at 3:05

1 Answer 1

3

Use numpy.repeat to add rows, and Series.duplicated to set the price.

df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns) df2.loc[df2.name.duplicated(), 'price'] *= -1 df2 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack 33 0 -190 3 joe 22 1 200 

If you need to mask NaNs in the age column as well, you can do that with Series.mask.

df2.age.mask(df2.name.duplicated()) 0 12 1 33 2 NaN 3 22 Name: age, dtype: object 

Full code.

df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns) isdup = df2.name.duplicated() df2.loc[isdup, 'price'] *= -1 df2['age'] = df2['age'].mask(isdup) df2 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack NaN 0 -190 3 joe 22 1 200 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.