I am trying to merge several new DataFrames into a main one. Suppose the main DataFrame is:

           key1      key2
    0  0.365803  0.259112
    1  0.086869  0.589834
    2  0.269619  0.183644
    3  0.755826  0.045187
    4  0.204009  0.669371

I want to merge the following two datasets into the main one.
New data1:

           key1      key2 new feature
    0  0.365803  0.259112       info1

New data2:

           key1      key2 new feature
    0  0.204009  0.669371       info2

Expected result:

           key1      key2 new feature
    0  0.365803  0.259112       info1
    1  0.776945  0.780978         NaN
    2  0.275891  0.114998         NaN
    3  0.667057  0.373029         NaN
    4  0.204009  0.669371       info2

What I tried:

    test = test.merge(data1, left_on=['key1', 'key2'], right_on=['key1', 'key2'], how='left')
    test = test.merge(data2, left_on=['key1', 'key2'], right_on=['key1', 'key2'], how='left')

This works for the first merge but not for the second: instead of filling a single column, the second merge adds a suffixed duplicate. The result I get:

           key1      key2 new feature_x new feature_y
    0  0.365803  0.259112         info1           NaN
    1  0.776945  0.780978           NaN           NaN
    2  0.275891  0.114998           NaN           NaN
    3  0.667057  0.373029           NaN           NaN
    4  0.204009  0.669371           NaN         info2

Thanks for your help!

3 Answers

First combine both new DataFrames with concat (or append), and then merge once:

    dat = pd.concat([data1, data2], ignore_index=True)

Or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):

    dat = data1.append(data2, ignore_index=True)
    print (dat)
           key1      key2 new feature
    0  0.365803  0.259112       info1
    1  0.204009  0.669371       info2

    # if the join columns have the same names in both frames, the `on` parameter alone is enough
    df = test.merge(dat, on=['key1', 'key2'], how='left')
    print (df)
           key1      key2 new feature
    0  0.365803  0.259112       info1
    1  0.086869  0.589834         NaN
    2  0.269619  0.183644         NaN
    3  0.755826  0.045187         NaN
    4  0.204009  0.669371       info2
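For reference, the concat-then-merge approach can be put together as a self-contained sketch. The frames are rebuilt here from the values shown in the question; constructing them with literal dictionaries is an assumption about how the data was created, not something the question states.

```python
import pandas as pd

# Main frame, rebuilt from the key values shown in the question.
test = pd.DataFrame({
    "key1": [0.365803, 0.086869, 0.269619, 0.755826, 0.204009],
    "key2": [0.259112, 0.589834, 0.183644, 0.045187, 0.669371],
})
data1 = pd.DataFrame({"key1": [0.365803], "key2": [0.259112], "new feature": ["info1"]})
data2 = pd.DataFrame({"key1": [0.204009], "key2": [0.669371], "new feature": ["info2"]})

# Stack the new rows first so that "new feature" exists only once.
dat = pd.concat([data1, data2], ignore_index=True)

# A single left merge keeps every row of the main frame and adds one column.
df = test.merge(dat, on=["key1", "key2"], how="left")
print(df)
```

Because only one merge brings in "new feature", pandas has no name clash to resolve and adds no `_x`/`_y` suffixes. Merging on float keys relies on exact equality, which holds here because the values are written identically in all three frames.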

You can use pd.DataFrame.update instead:

    # create new column and set index
    # (update matches on column name, so data1 and data2 must use the same name, here 'newfeature')
    res = test.assign(newfeature=None).set_index(['key1', 'key2'])

    # update with new data sequentially
    res.update(data1.set_index(['key1', 'key2']))
    res.update(data2.set_index(['key1', 'key2']))

    # reset index to recover columns
    res = res.reset_index()

    print(res)
           key1      key2 newfeature
    0  0.365803  0.259112      info1
    1  0.086869  0.589834       None
    2  0.269619  0.183644       None
    3  0.755826  0.045187       None
    4  0.204009  0.669371      info2
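A runnable sketch of the update approach, using the column name "new feature" from the question rather than "newfeature". The frames are rebuilt from the values in the question, and the `keys` variable is only a convenience introduced here.

```python
import pandas as pd

keys = ["key1", "key2"]  # helper name introduced for this sketch

test = pd.DataFrame({
    "key1": [0.365803, 0.086869, 0.269619, 0.755826, 0.204009],
    "key2": [0.259112, 0.589834, 0.183644, 0.045187, 0.669371],
})
data1 = pd.DataFrame({"key1": [0.365803], "key2": [0.259112], "new feature": ["info1"]})
data2 = pd.DataFrame({"key1": [0.204009], "key2": [0.669371], "new feature": ["info2"]})

# Add an empty column with the same name as in the new frames, then index by the keys.
res = test.assign(**{"new feature": None}).set_index(keys)

# update() works in place, aligns on index and column name, and copies only
# the non-missing values of the other frame.
for new in (data1, data2):
    res.update(new.set_index(keys))

# Turn the keys back into ordinary columns.
res = res.reset_index()
print(res)
```

Since update() modifies the frame in place and ignores missing values in the other frame, later updates do not erase what earlier ones filled in, so the loop extends to any number of new frames.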

You can also give both DataFrames the same index and use a simple loc assignment, which aligns the values on that index:

    df = df.set_index(["key1", "key2"])
    df2 = df2.set_index(["key1", "key2"])

Then:

    df.loc[:, "new_feature"] = df2['new_feature']
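A sketch of the index-alignment idea applied to both new frames from the question. Two things here are assumptions beyond the answer: the new rows are concatenated first (assigning the second frame after the first would overwrite the column and replace info1 with NaN), and plain column assignment is used, which aligns on the index in the same way. The names `main`, `new`, and `result` are introduced only for this example.

```python
import pandas as pd

keys = ["key1", "key2"]

main = pd.DataFrame({
    "key1": [0.365803, 0.086869, 0.269619, 0.755826, 0.204009],
    "key2": [0.259112, 0.589834, 0.183644, 0.045187, 0.669371],
}).set_index(keys)
data1 = pd.DataFrame({"key1": [0.365803], "key2": [0.259112], "new_feature": ["info1"]})
data2 = pd.DataFrame({"key1": [0.204009], "key2": [0.669371], "new_feature": ["info2"]})

# Combine the new rows and give them the same (key1, key2) index as the main frame.
new = pd.concat([data1, data2], ignore_index=True).set_index(keys)

# Assignment aligns on the shared index; rows without a match are left missing.
main["new_feature"] = new["new_feature"]

# Recover key1 and key2 as ordinary columns.
result = main.reset_index()
print(result)
```

This relies on the key pairs being unique in the new data; duplicated pairs would make the alignment ambiguous.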
