2 dataframes, same number of columns, different number of rows comparing and replacing values [duplicate]

Question

I think this is an easy question, and I know roughly where to look (merge, join, loc, iloc, or one of these functions), but I have not figured it out yet. Here is a simplified example of what I want to do. df1 and df2 have the same columns but a different number of rows. I want to find the rows where column "t1" has the same value in both dataframes, and then replace the values in column "c1" of df1 with the values of column "c1" of df2 for those rows. I also tried the where and replace functions, but I am pretty sure I need merge or join. Thank you.

    import pandas as pd

    df1 = pd.DataFrame()
    df2 = pd.DataFrame()

    # close price
    df1.at[0,"c1"]=0
    df1.at[1,"c1"]=0
    df1.at[2,"c1"]=0
    df1.at[3,"c1"]=0
    df1.at[4,"c1"]=0
    df1.at[5,"c1"]=0
    df1.at[6,"c1"]=0
    df1.at[7,"c1"]=0
    df2.at[0,"c1"]=20
    df2.at[1,"c1"]=26
    df2.at[3,"c1"]=23
    df2.at[4,"c1"]=21

    # time stamp
    df1.at[0,"t1"]=3
    df1.at[1,"t1"]=4
    df1.at[2,"t1"]=5
    df1.at[3,"t1"]=6
    df1.at[4,"t1"]=7
    df1.at[5,"t1"]=8
    df1.at[6,"t1"]=9
    df1.at[7,"t1"]=10
    df2.at[0,"t1"]=5
    df2.at[1,"t1"]=6
    df2.at[3,"t1"]=7
    df2.at[4,"t1"]=8

They look like:

    >>> df1
         c1    t1
    0   0.0   3.0
    1   0.0   4.0
    2   0.0   5.0
    3   0.0   6.0
    4   0.0   7.0
    5   0.0   8.0
    6   0.0   9.0
    7   0.0  10.0
    >>> df2
         c1   t1
    0  20.0  5.0
    1  26.0  6.0
    3  23.0  7.0
    4  21.0  8.0

So I want df1 to look like the frame shown below. At the rows where the value for "t1" is the same for both df1 and df2 I want to replace the values in column "c1" in df1 with the values from df2.

    >>> df1
         c1    t1
    0   0.0   3.0
    1   0.0   4.0
    2  20.0   5.0
    3  26.0   6.0
    4  23.0   7.0
    5  21.0   8.0
    6   0.0   9.0
    7   0.0  10.0

T C Molenaar · Accepted Answer · 2022-10-10 09:40:53Z

You can use pd.merge for this:

    df1 = df1.merge(df2, on=['t1'], how='left')

Which results in:

       c1_x    t1  c1_y
    0   0.0   3.0   NaN
    1   0.0   4.0   NaN
    2   0.0   5.0  20.0
    3   0.0   6.0  26.0
    4   0.0   7.0  23.0
    5   0.0   8.0  21.0
    6   0.0   9.0   NaN
    7   0.0  10.0   NaN

It adds a new column c1_y, which holds the values merged in from df2 (the original column is renamed c1_x). To create the desired output, we only need to do the following:

    df1['c1'] = df1.c1_y.fillna(df1.c1_x)
    df1 = df1[['c1', 't1']]

Output:

         c1    t1
    0   0.0   3.0
    1   0.0   4.0
    2  20.0   5.0
    3  26.0   6.0
    4  23.0   7.0
    5  21.0   8.0
    6   0.0   9.0
    7   0.0  10.0
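As an alternative to the merge, the same replacement can be sketched with `Series.map`, which avoids the temporary `c1_x`/`c1_y` columns. This is not part of the answer above; it is a different technique and it assumes each `t1` value appears at most once in df2 (the name `lookup` is only illustrative). The frames are rebuilt from the values in the question so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frames from the question.
df1 = pd.DataFrame({
    "c1": [0.0] * 8,
    "t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})
df2 = pd.DataFrame(
    {"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
    index=[0, 1, 3, 4],
)

# Lookup table: t1 -> c1 taken from df2 (assumes t1 is unique in df2).
lookup = df2.set_index("t1")["c1"]

# Map each t1 in df1 to df2's c1; unmatched rows become NaN
# and are filled with df1's original c1.
df1["c1"] = df1["t1"].map(lookup).fillna(df1["c1"])

print(df1)
```

Because `map` returns a Series aligned with df1's own index, no row reordering or column selection is needed afterwards.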

I wanted to post a reply with an example, but it seems I can only add small comments. It works even better than I hoped: I was creating this "dummy" column "c1" in df1 filled with zeros, but that is not necessary. Too bad I cannot add my expanded example. I tested it on my real data and it works great.
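The point in this comment can be illustrated with a minimal sketch: if df1 has no "c1" column to begin with, the left merge brings in df2's "c1" directly, with no `_x`/`_y` suffixes. The expanded example mentioned in the comment was never posted, so the final `fillna(0.0)` is only an assumption about how unmatched rows might be handled.

```python
import pandas as pd

# df1 carries only the time stamps; no placeholder c1 column.
df1 = pd.DataFrame({"t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
df2 = pd.DataFrame(
    {"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
    index=[0, 1, 3, 4],
)

# With no name clash, the merged column keeps its plain name "c1".
merged = df1.merge(df2, on="t1", how="left")

# Rows of df1 without a match in df2 hold NaN; filling them with 0.0
# is an assumption made here to reproduce the frame from the question.
merged["c1"] = merged["c1"].fillna(0.0)

print(merged)
```

The resulting columns come out in the order `t1`, `c1`; select `merged[["c1", "t1"]]` if the original column order matters.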

flyakite · Accepted Answer · 2022-10-10 09:53:40Z

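Simply use merge:

    # A left join keeps df1's rows and order, so res lines up with df1's index.
    res = pd.merge(df1, df2, on='t1', how='left')
    # Take df2's c1 where t1 matched; otherwise keep df1's original c1.
    df1['c1'] = res['c1_y'].fillna(df1['c1'])
    print(df1)
    ### output:
    ###     c1    t1
    ###0   0.0   3.0
    ###1   0.0   4.0
    ###2  20.0   5.0
    ###3  26.0   6.0
    ###4  23.0   7.0
    ###5  21.0   8.0
    ###6   0.0   9.0
    ###7   0.0  10.0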
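Assigning a merge result back to df1 by index only works if the merge returns exactly one row per row of df1. If df2 contained a repeated `t1`, a left merge would return extra rows and the index alignment would silently break. A hedged sketch of one way to guard against that uses the `validate` argument of `merge`; the frame `df2_dup` is hypothetical and not part of the question.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({"c1": [0.0, 0.0, 0.0], "t1": [5.0, 6.0, 7.0]})

# Hypothetical df2 in which t1 == 5.0 appears twice.
df2_dup = pd.DataFrame({"c1": [20.0, 99.0], "t1": [5.0, 5.0]})

# Without a check, the left merge yields more rows than df1 has.
unchecked = df1.merge(df2_dup, on="t1", how="left")
print(len(df1), len(unchecked))

# validate="many_to_one" asks pandas to verify that the merge keys
# are unique in the right frame and to raise MergeError otherwise.
try:
    df1.merge(df2_dup, on="t1", how="left", validate="many_to_one")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True

print(duplicate_detected)
```

With the data from the question every `t1` in df2 is unique, so the check would pass and the merge would behave as shown in the answers.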
