how to add a column to a dataframe using another dataframe with columns of reference of different lengths python

Question

Hi everyone! I have two dataframes that I need to merge, but only partially, to get a third one.

The reference value is column [1] in both DF1 and DF2. The new column in DF1 must take the values of column [3] of DF2 to create DF3, but only those corresponding to the values of column [1] of DF1. How can I achieve this? I've tried merge and melt, but it doesn't work, since I don't know how to account for the different lengths of the reference columns when matching.

DF1

          0    1           2        3         4         5      6
0  12345678   40  10.610,000  1294822  22345679  HCTFCILE  16000
1  12345678  100   8.196,001  1294822  22345679  HCTFCILE  10000
2  12345678  110   1.062,000  1294822  22345679  HCTFCILE   1000
3  12345678  130   2.850,000  1294822  22345679  HCTFCILE  12000

DF2

         0    1   2      3
0  1294822   10  DM  13500
1  1294822   20  DM  33500
2  1294822   30  DM  18300
3  1294822   40  DM  22200
4  1294822   90  DM  16200
5  1294822  100  DM  24500
6  1294822  110  DM  27800
7  1294822  120  DM  15500
8  1294822  130  DM  13400

Expected Result DF3:

          0    1           2        3         4         5      6      7
0  12345678   40  10.610,000  1294822  22345679  HCTFCILE  16000  22200
1  12345678  100   8.196,001  1294822  22345679  HCTFCILE  10000  24500
2  12345678  110   1.062,000  1294822  22345679  HCTFCILE   1000  27800
3  12345678  130   2.850,000  1294822  22345679  HCTFCILE  12000  13400

Thank you for your help in advance. (:

Try this one-liner: df1['6']=df2['3'][df1['3'] == df2['1']]

– smcrowley, Commented Jul 11, 2022 at 23:39

Daniel Illenberger · Accepted Answer · 2022-07-12 00:38:23Z

First, select the DF2 rows whose column 1 values also appear in column 1 of DF1 (called DF in the code) by using the isin method, like so:

DF3 = DF2[DF2[1].isin(DF[1].values)]
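To make the filtering step concrete, here is a minimal sketch of what isin returns, using only the key values from the question. The names df1_keys, df2_keys, mask and matching are made up for illustration.

```python
import pandas as pd

# Key columns only (column 1 of each dataframe in the question).
df1_keys = pd.Series([40, 100, 110, 130])
df2_keys = pd.Series([10, 20, 30, 40, 90, 100, 110, 120, 130])

# isin returns a boolean mask: True where a DF2 key also occurs in DF1.
mask = df2_keys.isin(df1_keys)

# Indexing with the mask keeps only the matching rows.
# Note that the kept rows retain their original index labels.
matching = df2_keys[mask]
print(matching)
```

The surviving rows keep the labels they had in DF2, which is why the next step resets the index.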

Then, as long as DF and DF3 are sorted by column 1 in the same order, we can reset the index of DF3 so that its rows line up with those of DF:

DF3 = DF3.reset_index(drop=True)

Then concatenate DF with column 3 of DF3:

DF3 = pd.concat([DF, DF3[3]], axis=1)
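The index reset matters because pd.concat with axis=1 matches rows by index label, not by position. A small sketch of the difference, with toy data and hypothetical names (left, right, misaligned, aligned):

```python
import pandas as pd

# Left frame has the default labels 0, 1.
left = pd.DataFrame({"key": [40, 100]})

# A filtered column keeps the labels of the rows it came from (3 and 5 here).
right = pd.Series([22200, 24500], index=[3, 5], name="value")

# Labels do not overlap, so concat produces four rows padded with NaN.
misaligned = pd.concat([left, right], axis=1)

# After reset_index(drop=True) the labels are 0, 1 again and the rows pair up.
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
```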

Here is the complete code:

import pandas as pd

DF = pd.DataFrame([[12345678, 40, 10.610000, 1294822, 22345679, 'HCTFCILE', 16000],
                   [12345678, 100, 8.196001, 1294822, 22345679, 'HCTFCILE', 10000],
                   [12345678, 110, 1.062000, 1294822, 22345679, 'HCTFCILE', 1000],
                   [12345678, 130, 2.850000, 1294822, 22345679, 'HCTFCILE', 12000]])

DF2 = pd.DataFrame([[1294822, 10, 'DM', 13500],
                    [1294822, 20, 'DM', 33500],
                    [1294822, 30, 'DM', 18300],
                    [1294822, 40, 'DM', 22200],
                    [1294822, 90, 'DM', 16200],
                    [1294822, 100, 'DM', 24500],
                    [1294822, 110, 'DM', 27800],
                    [1294822, 120, 'DM', 15500],
                    [1294822, 130, 'DM', 13400]])

# Keep only the DF2 rows whose column 1 value also appears in DF column 1
DF3 = DF2[DF2[1].isin(DF[1].values)]
# Reset the index so the rows line up with DF by position
DF3 = DF3.reset_index(drop=True)
# Append column 3 of the filtered DF2 to DF
DF3 = pd.concat([DF, DF3[3]], axis=1)
# Renumber the columns 0..7 (must come after the concat, when there are 8 columns)
DF3.columns = list(range(8))

output:

          0    1          2        3         4         5      6      7
0  12345678   40  10.610000  1294822  22345679  HCTFCILE  16000  22200
1  12345678  100   8.196001  1294822  22345679  HCTFCILE  10000  24500
2  12345678  110   1.062000  1294822  22345679  HCTFCILE   1000  27800
3  12345678  130   2.850000  1294822  22345679  HCTFCILE  12000  13400
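Since the question mentions trying merge, here is a hedged sketch of a merge-based alternative that does not depend on the two dataframes being sorted the same way. It is not the method used in this answer. The column names (id, pos, qty, ref, unit, value) are invented for readability, and df1 holds only a subset of the question's columns.

```python
import pandas as pd

# Subset of DF1 from the question, with hypothetical column names.
df1 = pd.DataFrame(
    [[12345678, 40, 16000],
     [12345678, 100, 10000],
     [12345678, 110, 1000],
     [12345678, 130, 12000]],
    columns=["id", "pos", "qty"],
)

# DF2 from the question, with hypothetical column names.
df2 = pd.DataFrame(
    [[1294822, 10, "DM", 13500], [1294822, 20, "DM", 33500],
     [1294822, 30, "DM", 18300], [1294822, 40, "DM", 22200],
     [1294822, 90, "DM", 16200], [1294822, 100, "DM", 24500],
     [1294822, 110, "DM", 27800], [1294822, 120, "DM", 15500],
     [1294822, 130, "DM", 13400]],
    columns=["ref", "pos", "unit", "value"],
)

# Take only the key and the value to bring over, then left-merge on the key.
# how="left" keeps every df1 row in its original order; unmatched keys get NaN.
lookup = df2[["pos", "value"]]
df3 = df1.merge(lookup, on="pos", how="left")
print(df3)
```

A left merge tolerates the different lengths because DF2 rows with no match in DF1 are simply dropped. This sketch assumes each pos value occurs only once in DF2; duplicate keys there would duplicate rows in the result.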
