Hi everyone! I have two dataframes that I need to merge, but only partially, in order to get a third one:

The reference is column [1] in both DF1 and DF2. The new column in DF1 must take its values from column [3] of DF2 to create DF3, but only for the rows whose column [1] value also appears in DF1. How can I achieve this? I've tried merge and melt, but it doesn't work because I don't know how to handle the different lengths of the reference columns when matching.

DF1

          0    1           2        3         4         5      6
0  12345678   40  10.610,000  1294822  22345679  HCTFCILE  16000
1  12345678  100   8.196,001  1294822  22345679  HCTFCILE  10000
2  12345678  110   1.062,000  1294822  22345679  HCTFCILE   1000
3  12345678  130   2.850,000  1294822  22345679  HCTFCILE  12000

DF2

         0    1   2      3
0  1294822   10  DM  13500
1  1294822   20  DM  33500
2  1294822   30  DM  18300
3  1294822   40  DM  22200
4  1294822   90  DM  16200
5  1294822  100  DM  24500
6  1294822  110  DM  27800
7  1294822  120  DM  15500
8  1294822  130  DM  13400

Expected Result DF3:

          0    1           2        3         4         5      6      7
0  12345678   40  10.610,000  1294822  22345679  HCTFCILE  16000  22200
1  12345678  100   8.196,001  1294822  22345679  HCTFCILE  10000  24500
2  12345678  110   1.062,000  1294822  22345679  HCTFCILE   1000  27800
3  12345678  130   2.850,000  1294822  22345679  HCTFCILE  12000  13400

Thank you for your help in advance. (:

  • Try this one-liner: df1['6']=df2['3'][df1['3'] == df2['1']] (Commented Jul 11, 2022 at 23:39)

1 Answer

First, select the DF2 rows whose column 1 values also appear in column 1 of DF1 (called DF in the code below), using the isin method:

DF3 = DF2[DF2[1].isin(DF[1].values)] 

Then, as long as DF and DF3 are sorted by column 1 in the same order, we can reset the index of DF3 so that its rows line up with those of DF:

DF3 = DF3.reset_index(drop=True) 
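
Because the concatenation pairs rows purely by position, it may be worth confirming the sort-order assumption before relying on it. The following is only a minimal sketch: it reuses the DF and DF2 names from this answer with trimmed copies of the data, and keys_aligned is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Trimmed stand-ins for the frames in this answer (only what the check needs).
DF = pd.DataFrame([[12345678, 40], [12345678, 100],
                   [12345678, 110], [12345678, 130]])
DF2 = pd.DataFrame([[1294822, 10, 'DM', 13500], [1294822, 40, 'DM', 22200],
                    [1294822, 100, 'DM', 24500], [1294822, 110, 'DM', 27800],
                    [1294822, 130, 'DM', 13400]])

# Same filter and index reset as in the steps above.
DF3 = DF2[DF2[1].isin(DF[1].values)].reset_index(drop=True)

# Positional concatenation is only safe if both key columns
# hold the same values in the same row order.
keys_aligned = DF[1].tolist() == DF3[1].tolist()
```

If keys_aligned is False (for example, because DF has repeated or differently ordered keys), concatenating by position would attach values to the wrong rows.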

Then concatenate DF with column 3 of DF3:

DF3 = pd.concat([DF, DF3[3]], axis=1) 

Here is the complete code:

import pandas as pd

DF = pd.DataFrame([
    [12345678, 40, 10.610000, 1294822, 22345679, 'HCTFCILE', 16000],
    [12345678, 100, 8.196001, 1294822, 22345679, 'HCTFCILE', 10000],
    [12345678, 110, 1.062000, 1294822, 22345679, 'HCTFCILE', 1000],
    [12345678, 130, 2.850000, 1294822, 22345679, 'HCTFCILE', 12000],
])
DF2 = pd.DataFrame([
    [1294822, 10, 'DM', 13500],
    [1294822, 20, 'DM', 33500],
    [1294822, 30, 'DM', 18300],
    [1294822, 40, 'DM', 22200],
    [1294822, 90, 'DM', 16200],
    [1294822, 100, 'DM', 24500],
    [1294822, 110, 'DM', 27800],
    [1294822, 120, 'DM', 15500],
    [1294822, 130, 'DM', 13400],
])

# Keep only the DF2 rows whose column 1 value also appears in DF column 1
DF3 = DF2[DF2[1].isin(DF[1].values)]
# Renumber the rows from 0 so they line up positionally with DF
DF3 = DF3.reset_index(drop=True)
# Append column 3 of the filtered rows to DF
DF3 = pd.concat([DF, DF3[3]], axis=1)
# Relabel the 8 resulting columns as 0-7 (the appended column would otherwise be a second "3")
DF3.columns = list(range(8))

Output:

          0    1          2        3         4         5      6      7
0  12345678   40  10.610000  1294822  22345679  HCTFCILE  16000  22200
1  12345678  100   8.196001  1294822  22345679  HCTFCILE  10000  24500
2  12345678  110   1.062000  1294822  22345679  HCTFCILE   1000  27800
3  12345678  130   2.850000  1294822  22345679  HCTFCILE  12000  13400
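
If the two frames cannot be guaranteed to share the same sort order, a key-based left merge is one possible alternative to the positional approach above. This is a sketch rather than part of the original answer: it uses the same DF and DF2 data, while lookup and merged are hypothetical names chosen for illustration.

```python
import pandas as pd

DF = pd.DataFrame([
    [12345678, 40, 10.610000, 1294822, 22345679, 'HCTFCILE', 16000],
    [12345678, 100, 8.196001, 1294822, 22345679, 'HCTFCILE', 10000],
    [12345678, 110, 1.062000, 1294822, 22345679, 'HCTFCILE', 1000],
    [12345678, 130, 2.850000, 1294822, 22345679, 'HCTFCILE', 12000],
])
DF2 = pd.DataFrame([
    [1294822, 10, 'DM', 13500], [1294822, 20, 'DM', 33500],
    [1294822, 30, 'DM', 18300], [1294822, 40, 'DM', 22200],
    [1294822, 90, 'DM', 16200], [1294822, 100, 'DM', 24500],
    [1294822, 110, 'DM', 27800], [1294822, 120, 'DM', 15500],
    [1294822, 130, 'DM', 13400],
])

# Keep only the key (column 1) and the value to bring over (column 3),
# renaming the value to 7 so it does not clash with DF's own column 3.
lookup = DF2[[1, 3]].rename(columns={3: 7})

# A left merge keeps every DF row in DF's order and matches on the key
# itself, so neither frame needs to be sorted; unmatched keys get NaN.
merged = DF.merge(lookup, on=1, how='left')
```

With this data, merged has columns 0 to 7, with the matched DF2 values in column 7. If column 1 of DF2 could contain repeated keys, the merge would duplicate DF rows; passing validate='many_to_one' to merge makes pandas raise an error in that case instead.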