I have two DataFrames and need to combine them on the pair of key columns (idA, idB), then compute the sum of their Distance column. Note that the pair (idA, idB) is equivalent to (idB, idA), so the Distance values for both orderings have to be summed together.
In [1]: df1 = pd.DataFrame({'idA': ['1', '2', '3', '2'],
   ...:                     'idB': ['1', '4', '8', '1'],
   ...:                     'Distance': ['0.727273', '0.827273', '0.127273', '0.927273']},
   ...:                    index=[0, 1, 2, 3])
   ...:

In [2]: df2 = pd.DataFrame({'idA': ['1', '5', '2', '5'],
   ...:                     'idB': ['2', '1', '4', '7'],
   ...:                     'Distance': ['0.11', '0.1', '3.0', '0.8']},
   ...:                    index=[4, 5, 6, 7])

The output has to be this way:
   Sum_Distance idA idB
0      0.727273   1   1
1      3.827273   2   4   <-- 2,4 = 3.0 + 2,4 = 0.827273
2      0.127273   3   8
3      1.037273   2   1   <-- 2,1 = 0.927273 + 1,2 = 0.11
4      0.1        5   1
5      0.8        5   7

How can I do this using Pandas or Spark?
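For reference, here is a minimal pandas sketch of one possible approach, not necessarily the best one. It assumes that stacking the two frames and grouping on an order-independent key is acceptable in place of a true join. The helper columns `lo` and `hi` and the name `combined` are made up for illustration. The sketch also assumes the Distance strings can be converted to floats, and that each output row should keep the (idA, idB) orientation of the first occurrence, as in the expected output.

```python
import pandas as pd

df1 = pd.DataFrame({'idA': ['1', '2', '3', '2'],
                    'idB': ['1', '4', '8', '1'],
                    'Distance': ['0.727273', '0.827273', '0.127273', '0.927273']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'idA': ['1', '5', '2', '5'],
                    'idB': ['2', '1', '4', '7'],
                    'Distance': ['0.11', '0.1', '3.0', '0.8']},
                   index=[4, 5, 6, 7])

# Stack both frames; Distance is stored as strings, so convert before summing.
combined = pd.concat([df1, df2], ignore_index=True)
combined['Distance'] = combined['Distance'].astype(float)

# Build an order-independent key: lo is the smaller id, hi the larger one.
# The ids are strings, so the comparison is lexicographic; any consistent
# ordering works because the key is only used for grouping.
a, b = combined['idA'], combined['idB']
swap = a > b
combined['lo'] = a.where(~swap, b)
combined['hi'] = b.where(~swap, a)

# Sum Distance per unordered pair, keeping the first-seen idA/idB orientation
# and the order in which pairs first appear (sort=False).
result = (combined.groupby(['lo', 'hi'], sort=False)
          .agg(Sum_Distance=('Distance', 'sum'),
               idA=('idA', 'first'),
               idB=('idB', 'first'))
          .reset_index(drop=True))

print(result)
```

On the sample data this should give the six rows of the expected output, with (2,1) and (1,2) merged into one row. In Spark, the same idea could presumably be expressed by unioning the two DataFrames and grouping on `least(idA, idB)` and `greatest(idA, idB)`, but I have not tested that variant here.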