I am trying to combine two DataFrames that both contain a column of repeated values, but the number of repeats per value differs between them.
    import pandas as pd

    df1 = pd.DataFrame({'col1': [1, 1, 2, 2, 3, 3, 3],
                        'col2': [1.1, 1.3, 2.1, 2.3, 3.1, 3.3, 3.5]})
    df2 = pd.DataFrame({'col1': [1, 2, 2, 3, 3, 3],
                        'col2': [1.2, 2.2, 2.4, 3.2, 3.4, 3.6]})

    df1
       col1  col2
    0     1   1.1
    1     1   1.3
    2     2   2.1
    3     2   2.3
    4     3   3.1
    5     3   3.3
    6     3   3.5

    df2
       col1  col2
    0     1   1.2
    1     2   2.2
    2     2   2.4
    3     3   3.2
    4     3   3.4
    5     3   3.6

The desired output would be, for example:
    desired_result = pd.DataFrame({'col1': [1, 1, 2, 2, 3, 3, 3],
                                   'col2_x': [1.1, 1.3, 2.1, 2.3, 3.1, 3.3, 3.5],
                                   # float('nan') rather than the string 'NaN', so the column stays numeric
                                   'col2_y': [1.2, float('nan'), 2.2, 2.4, 3.2, 3.4, 3.6]})

    desired_result
       col1  col2_x  col2_y
    0     1     1.1     1.2
    1     1     1.3     NaN
    2     2     2.1     2.2
    3     2     2.3     2.4
    4     3     3.1     3.2
    5     3     3.3     3.4
    6     3     3.5     3.6

The problem is the ambiguity in how to combine the two DataFrames on col1: the column contains repeated values, so a direct one-to-one match is not possible (and also not necessary).
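One direction that might resolve the ambiguity, sketched here under the assumption that rows should simply be paired by their order of appearance within each col1 value: number the repeats with `groupby(...).cumcount()` and merge on both col1 and that counter. The helper column name `occurrence` and the intermediate names `df1_keyed` and `df2_keyed` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 1, 2, 2, 3, 3, 3],
                    'col2': [1.1, 1.3, 2.1, 2.3, 3.1, 3.3, 3.5]})
df2 = pd.DataFrame({'col1': [1, 2, 2, 3, 3, 3],
                    'col2': [1.2, 2.2, 2.4, 3.2, 3.4, 3.6]})

# Hypothetical helper column: 0-based position of each row within its col1 group.
# Together with col1 it forms a key that is unique within each frame.
df1_keyed = df1.assign(occurrence=df1.groupby('col1').cumcount())
df2_keyed = df2.assign(occurrence=df2.groupby('col1').cumcount())

# Left merge keeps every row of df1; repeats with no partner in df2 get NaN.
# The overlapping 'col2' columns receive the default suffixes _x and _y.
result = (
    df1_keyed
    .merge(df2_keyed, on=['col1', 'occurrence'], how='left')
    .drop(columns='occurrence')
)
print(result)
```

With `how='left'` only the rows of df1 are kept; if df2 could have more repeats than df1 for some value, `how='outer'` would presumably be the better choice.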