1

I am trying to join two dataframes by a buying group ID key. The first dataframe contains the buying group code, customer ID and start/end dates for each customer on a buying group.
The head looks like:

Buying_group, Customer_ID, BG_Start_Dt, BG_End_Dt 123 0001 2020-1-1 2025-1-1 123 0002 2021-1-1 2024-1-1 132 0033 2019-1-1 2023-1-1 132 0044 2019-1-1 2023-1-1 

The other contains the buying group code, a contract number, customer ID and start/end dates for the contract.
The head looks like:

Buying_group, Contract, ContractStDt, ContractEnDt, Customer_ID 123 1 2020-1-1 2025-1-1 0001 123 2 2021-1-1 2024-1-1 0002 156 4 2019-1-1 2023-1-1 0003 156 4 2019-1-1 2023-1-1 0004 

I am trying to join each df so I have a master df that combines the buying group dates from the first df. The problem I am running in to is when I do an outer or inner join of the two df's by the Buying_group key, that the resulting df has two columns for Customer ID:

Customer_IDx, Customer_IDy

And the dataframe looks like this:

Buying_group, Contract, ContractStDt, ContractEnDt, BG_Start_Dt, BG_End_Dt Customer_IDx, Customer_IDy 123 1 2020-1-1 2025-1-1 2020-1-1 2025-1-1 0001 0001 123 2 2021-1-1 2024-1-1 2021-1-1 2024-1-1 0002 0002 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 null 0003 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 null 0004 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0033 null 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0044 null 

The problem here is that each row corresponds to one unique contract for each customer and the merged dataframe needs to have a row for each unique customer ID. I'm really new to these SQL type joins and am wondering if there is a way to do this. Combine both customer ID columns for each joining DF into one column while preserving their unique rows. I tried doing both inner and outer joins but its the same result.

The ideal dataframe should look like this:

Buying_group, Contract, ContractStDt, ContractEnDt, BG_Start_Dt, BG_End_Dt, Customer_ID 123 1 2020-1-1 2025-1-1 2020-1-1 2025-1-1 0001 123 2 2021-1-1 2024-1-1 2021-1-1 2024-1-1 0002 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 0003 156 4 2019-1-1 2023-1-1 2022-1-1 2029-1-1 0004 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0033 132 5 2019-1-1 2023-1-1 2019-1-1 2023-1-1 0044 
3
  • Please provide a sample of the dataframes so that it would be easier to resolve. Commented May 17, 2021 at 19:05
  • Done, added better formatting to make the problem easier to read. Commented May 17, 2021 at 19:39
  • 1
    i think you are merging with two columns. try this result=two.merge(one,on=['Buying_group','Customer_ID'],how='left') columns_order=['Buying_group', 'Contract', 'ContractStDt', 'ContractEnDt', 'BG_Start_Dt', 'BG_End_Dt','Customer_ID'] result = result.reindex(columns=columns_order) Commented May 17, 2021 at 20:16

1 Answer 1

1

Try this

df_3 = pd.concat([df1,df2]).drop_duplicates(subset=['Buying_group','Customer_ID'], keep="last").reset_index(drop=True) df_3 

Output

 Buying_group Customer_ID BG_Start_Dt BG_End_Dt contract 0 132 00033 2019-1-1 2023-1-1 NaN 1 132 00044 2019-1-1 2023-1-1 NaN 2 123 0001 2020-1-1 2025-1-1 1.0 3 123 0002 2021-1-1 2024-1-1 2.0 4 156 0003 2019-1-1 2023-1-1 4.0 5 156 0004 2019-1-1 2023-1-1 4.0 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.