Joining Two Dataframes while combining two identical columns into one

Question

I am trying to join two dataframes by a buying group ID key. The first dataframe contains the buying group code, customer ID and start/end dates for each customer on a buying group.
The head looks like:

Buying_group, Customer_ID, BG_Start_Dt, BG_End_Dt 123 0001 2020-1-1 2025-1-1 123 0002 2021-1-1 2024-1-1 132 0033 2019-1-1 2023-1-1 132 0044 2019-1-1 2023-1-1

The other contains the buying group code, a contract number, customer ID and start/end dates for the contract.
The head looks like:

Buying_group, Contract, ContractStDt, ContractEnDt, Customer_ID 123 1 2020-1-1 2025-1-1 0001 123 2 2021-1-1 2024-1-1 0002 156 4 2019-1-1 2023-1-1 0003 156 4 2019-1-1 2023-1-1 0004

I am trying to join each df so I have a master df that combines the buying group dates from the first df. The problem I am running in to is when I do an outer or inner join of the two df's by the Buying_group key, that the resulting df has two columns for Customer ID:

Customer_IDx, Customer_IDy

And the dataframe looks like this:

Buying_group, Contract, ContractStDt, ContractEnDt, BG_Start_Dt, BG_End_Dt Customer_IDx, Customer_IDy 123 1 2020-1-1 2025-1-1 2020-1-1 2025-1-1 0001 0001 123 2 2021-1-1 2024-1-1 2021-1-1 2024-1-1 0002 0002 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 null 0003 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 null 0004 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0033 null 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0044 null

The problem here is that each row corresponds to one unique contract for each customer and the merged dataframe needs to have a row for each unique customer ID. I'm really new to these SQL type joins and am wondering if there is a way to do this. Combine both customer ID columns for each joining DF into one column while preserving their unique rows. I tried doing both inner and outer joins but its the same result.

The ideal dataframe should look like this:

Buying_group, Contract, ContractStDt, ContractEnDt, BG_Start_Dt, BG_End_Dt, Customer_ID 123 1 2020-1-1 2025-1-1 2020-1-1 2025-1-1 0001 123 2 2021-1-1 2024-1-1 2021-1-1 2024-1-1 0002 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 0003 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 0004 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0033 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0044

Please provide a sample of the dataframes so that it would be easier to resolve. — Rishin Rahim
– Rishin Rahim, Commented May 17, 2021 at 19:05
Done, added better formatting to make the problem easier to read. — Austin Rock
– Austin Rock, Commented May 17, 2021 at 19:39
i think you are merging with two columns. try this result=two.merge(one,on=['Buying_group','Customer_ID'],how='left') columns_order=['Buying_group', 'Contract', 'ContractStDt', 'ContractEnDt', 'BG_Start_Dt', 'BG_End_Dt','Customer_ID'] result = result.reindex(columns=columns_order) — simpleApp
– simpleApp, Commented May 17, 2021 at 20:16

Rishin Rahim · Accepted Answer · 2021-05-17 19:54:37Z

Try this

df_3 = pd.concat([df1,df2]).drop_duplicates(subset=['Buying_group','Customer_ID'], keep="last").reset_index(drop=True) df_3

Output

 Buying_group Customer_ID BG_Start_Dt BG_End_Dt contract 0 132 00033 2019-1-1 2023-1-1 NaN 1 132 00044 2019-1-1 2023-1-1 NaN 2 123 0001 2020-1-1 2025-1-1 1.0 3 123 0002 2021-1-1 2024-1-1 2.0 4 156 0003 2019-1-1 2023-1-1 4.0 5 156 0004 2019-1-1 2023-1-1 4.0

Collectives™ on Stack Overflow

Joining Two Dataframes while combining two identical columns into one

1 Answer 1

Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Related