I have two DataFrames like the ones below:
+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730391384        |1       |I|!|       |Japan        |
|192730391384        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

When I perform a union of the two DataFrames above, I get duplicate rows. Here is my output:
+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730391384        |1       |I|!|       |Japan        |
|192730391384        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

val dfToSave = dfMainOutput.union(insertdf)

I was under the impression that union removes duplicate rows and unionAll keeps them, but I have to call distinct after union to get rid of the duplicates. Can someone please explain this?
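
For reference, here is a minimal, self-contained sketch that reproduces what I am seeing, together with the distinct workaround I am using for now. The column types (strings and an integer), the local SparkSession setup, and the object name are assumptions for illustration only; my real DataFrames are built differently.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to make the sketch runnable.
object UnionDistinctSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to reproduce the behaviour.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-distinct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same column names as in the tables above; the types are assumed.
    val columns = Seq("UniqueFundamentalSet", "Taxonomy", "FFAction|!|", "DataPartition")

    val dfMainOutput = Seq(
      ("192730241374", 1, "I|!|", "Japan"),
      ("192730241374", 2, "I|!|", "Japan"),
      ("192730241373", 1, "I|!|", "Japan"),
      ("192730241373", 2, "I|!|", "Japan")
    ).toDF(columns: _*)

    val insertdf = Seq(
      ("192730241374", 1, "I|!|", "Japan"),
      ("192730241374", 2, "I|!|", "Japan"),
      ("192730391384", 1, "I|!|", "Japan"),
      ("192730391384", 2, "I|!|", "Japan"),
      ("192730241373", 1, "I|!|", "Japan"),
      ("192730241373", 2, "I|!|", "Japan")
    ).toDF(columns: _*)

    // Plain union: rows present in both inputs show up twice.
    val withDuplicates = dfMainOutput.union(insertdf)
    println(withDuplicates.count()) // 10

    // Workaround: drop rows that are identical across all columns.
    val dfToSave = withDuplicates.distinct()
    println(dfToSave.count()) // 6

    spark.stop()
  }
}
```

With distinct added, dfToSave contains the six unique rows I expect, but I would still like to understand why union alone does not deduplicate.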