
Please refer to the following dataframes.

I want to get the rows whose col2 values do not match after matching on col1 in both dataframes. I am trying the following, but it does not produce the expected result because the DataFrame join seems to form a Cartesian product.

val dfs = Seq((1,1),(1,2),(1,3),(2,6)).toDF("col1","col2")
val dft = Seq((1,1),(1,2),(1,4)).toDF("col1","col2")
dfs.join(dft,"col1").filter(dfs("col2").notEqual(dft("col2"))).show

In the above case I expect the join and filter to return (1,3). But it seems to join every row with a given col1 in dfs to every row with the same col1 in dft, thus producing unwanted results.

Is a Cartesian product like the following normal behaviour for a DataFrame join, or am I missing some setting? How can I get (1,3) as output?

scala> dft.join(dfs,dft("col1")===dfs("col1")).show
+----+----+----+----+
|col1|col2|col1|col2|
+----+----+----+----+
|   1|   1|   1|   3|
|   1|   1|   1|   2|
|   1|   1|   1|   1|
|   1|   2|   1|   3|
|   1|   2|   1|   2|
|   1|   2|   1|   1|
|   1|   4|   1|   3|
|   1|   4|   1|   2|
|   1|   4|   1|   1|
+----+----+----+----+

Thanks chetab

  • Using "leftouter" gets close, but removing the extra records in "col1" with except or intersect would, I guess, be costlier:
    dfs.join(dft,dfs("col1")===dft("col1") && dfs("col2")===dft("col2"),"leftouter")
    +----+----+----+----+
    |col1|col2|col1|col2|
    +----+----+----+----+
    |   1|   1|   1|   1|
    |   1|   2|   1|   2|
    |   1|   3|null|null|
    |   2|   6|null|null|
    +----+----+----+----+
    Commented Oct 29, 2016 at 17:33
  • Did you resolve this issue? I'm also looking for the same solution. Commented Dec 6, 2017 at 19:52

1 Answer


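This is not a Cartesian product. You join on col1, so the output contains every combination of rows with a matching col1: the three rows with col1 = 1 in one dataframe each pair with the three rows with col1 = 1 in the other, which gives nine rows. The result is correct.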
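One possible way to get (1,3), sketched here under some assumptions rather than as a definitive solution: treat a "mismatch" as a row of dfs whose (col1, col2) pair does not occur in dft, but whose col1 does occur there. That reading is an assumption about the intent of the question. The sketch uses an anti join on both columns followed by a semi join on col1; the object name `MismatchExample`, the app name, and the local master setting are placeholders, and the "left_anti" and "left_semi" join types assume Spark 2.0 or later.

```scala
import org.apache.spark.sql.SparkSession

object MismatchExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mismatch-example")
      .getOrCreate()
    import spark.implicits._

    val dfs = Seq((1, 1), (1, 2), (1, 3), (2, 6)).toDF("col1", "col2")
    val dft = Seq((1, 1), (1, 2), (1, 4)).toDF("col1", "col2")

    // Step 1: keep rows of dfs that have no exact (col1, col2) match in dft.
    // For the sample data this leaves (1,3) and (2,6).
    val unmatched = dfs.join(dft, Seq("col1", "col2"), "left_anti")

    // Step 2: keep only rows whose col1 does appear in dft.
    // This drops (2,6), because dft has no row with col1 = 2.
    val result = unmatched.join(dft.select("col1").distinct(), Seq("col1"), "left_semi")

    result.show() // for the sample data, the single row (1,3)

    spark.stop()
  }
}
```

Both joins return only columns from dfs, so no except or intersect step is needed afterwards. Whether this is cheaper than the "leftouter" approach from the comments depends on the data and is not measured here.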
