how to add new row to pandas dataframe based on other columns?

Question

Let's say, I have the below data frame as sample.

 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 joe 22 1 200 ******************************

Desired output:

 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack NaN 0 -190 3 joe 22 1 200 ******************************

I also posted the sample data frame below, so you can test it easily.

df1 = pd.DataFrame({ "name":["frank", "jack", "joe"], "age": [12, 33, 22], "status": [1,0, 1], "price": [100,190, 200] })

As you can see, I want to insert a new row based on the above row, which status is 0, it's meant to be a transaction failure. to statistics more easily, I want to generate a new row below that one. I also want its price to be negative number. and since I don't care about whole column of the new row, so I want the other column be NaN, just like 'age' here in my desired output.

What I've tried so far.

import pandas as pd import numpy as np df1 = pd.DataFrame({ "name":["frank", "jack", "joe"], "age": [12, 33, 22], "status": [1,0, 1], "price": [100,190, 200] }) df2_list = [] for i, row in df1.iterrows(): if row["status"] == 0: origin_row = row.to_dict() new_row = ({ "name": origin_row.get("name"), #"age": origin_row.get("age"), "age": np.NaN, "status": origin_row.get("status"), "price": -origin_row.get("price"), }) df2_list.append(new_row) df2 = pd.DataFrame(df2_list) # concat df1 and df2 and sort it . df3 = pd.concat([df1, df2], ignore_index=True) df4 = df3.sort_values(['name', 'price'], ascending=[True, False]) print(df4)

I have a loop, and check if it hit my condition status==0, and append it on my tmp list, and ...but it's too many code. I want to know is there any good way, I mean more Pythonic code or pandas has already got some function can it ?

Are names unique?

cs95
– cs95

2019-02-28 03:05:15 +00:00
Commented Feb 28, 2019 at 3:05 — cs95
– cs95, Commented Feb 28, 2019 at 3:05
@coldspeed yes sir

Frank AK
– Frank AK

2019-02-28 03:05:42 +00:00
Commented Feb 28, 2019 at 3:05 — Frank AK
– Frank AK, Commented Feb 28, 2019 at 3:05

cs95 · Accepted Answer · 2019-02-28 03:08:37Z

Use numpy.repeat to add rows, and Series.duplicated to set the price.

df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns) df2.loc[df2.name.duplicated(), 'price'] *= -1 df2 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack 33 0 -190 3 joe 22 1 200

If you need to mask NaNs in the age column as well, you can do that with Series.mask.

df2.age.mask(df2.name.duplicated()) 0 12 1 33 2 NaN 3 22 Name: age, dtype: object

Full code.

df2 = pd.DataFrame(df.values.repeat(df.status.eq(0)+1, axis=0), columns=df.columns) isdup = df2.name.duplicated() df2.loc[isdup, 'price'] *= -1 df2['age'] = df2['age'].mask(isdup) df2 name age status price 0 frank 12 1 100 1 jack 33 0 190 2 jack NaN 0 -190 3 joe 22 1 200

Collectives™ on Stack Overflow

how to add new row to pandas dataframe based on other columns?

1 Answer 1

Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Related