I have two DataFrames like the ones below:

    +--------------------+--------+-----------+-------------+
    |UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
    +--------------------+--------+-----------+-------------+
    |192730241374        |1       |I|!|       |Japan        |
    |192730241374        |2       |I|!|       |Japan        |
    |192730241373        |1       |I|!|       |Japan        |
    |192730241373        |2       |I|!|       |Japan        |
    +--------------------+--------+-----------+-------------+

    +--------------------+--------+-----------+-------------+
    |UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
    +--------------------+--------+-----------+-------------+
    |192730241374        |1       |I|!|       |Japan        |
    |192730241374        |2       |I|!|       |Japan        |
    |192730391384        |1       |I|!|       |Japan        |
    |192730391384        |2       |I|!|       |Japan        |
    |192730241373        |1       |I|!|       |Japan        |
    |192730241373        |2       |I|!|       |Japan        |
    +--------------------+--------+-----------+-------------+

When I perform a union of the two DataFrames above, I get duplicate rows. Here is my output, followed by the code that produces it:

    +--------------------+--------+-----------+-------------+
    |UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
    +--------------------+--------+-----------+-------------+
    |192730241374        |1       |I|!|       |Japan        |
    |192730241374        |2       |I|!|       |Japan        |
    |192730241373        |1       |I|!|       |Japan        |
    |192730241373        |2       |I|!|       |Japan        |
    |192730241374        |1       |I|!|       |Japan        |
    |192730241374        |2       |I|!|       |Japan        |
    |192730391384        |1       |I|!|       |Japan        |
    |192730391384        |2       |I|!|       |Japan        |
    |192730241373        |1       |I|!|       |Japan        |
    |192730241373        |2       |I|!|       |Japan        |
    +--------------------+--------+-----------+-------------+

    val dfToSave = dfMainOutput.union(insertdf)

I was under the impression that union removes duplicate rows and unionAll keeps them. Instead, I have to call distinct after union. Can someone please explain this?

1 Answer

Your impression was wrong. As stated in the official documentation:

Returns a new Dataset containing union of rows in this Dataset and another Dataset.

This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
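
To make that concrete, here is a minimal sketch that rebuilds the two DataFrames from the question and compares `union` with `union` followed by `distinct`. It assumes a local `SparkSession` created just for the demo (in spark-shell, `spark` already exists) and that the column types are `Long`, `Int`, `String`, `String`. The name `withDuplicates` is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-demo")
  .getOrCreate()
import spark.implicits._

// Column names copied from the question's tables.
val cols = Seq("UniqueFundamentalSet", "Taxonomy", "FFAction|!|", "DataPartition")

val dfMainOutput = Seq(
  (192730241374L, 1, "I|!|", "Japan"),
  (192730241374L, 2, "I|!|", "Japan"),
  (192730241373L, 1, "I|!|", "Japan"),
  (192730241373L, 2, "I|!|", "Japan")
).toDF(cols: _*)

val insertdf = Seq(
  (192730241374L, 1, "I|!|", "Japan"),
  (192730241374L, 2, "I|!|", "Japan"),
  (192730391384L, 1, "I|!|", "Japan"),
  (192730391384L, 2, "I|!|", "Japan"),
  (192730241373L, 1, "I|!|", "Japan"),
  (192730241373L, 2, "I|!|", "Japan")
).toDF(cols: _*)

// union behaves like SQL UNION ALL: every row from both inputs is kept.
val withDuplicates = dfMainOutput.union(insertdf)

// distinct afterwards gives SQL-style UNION semantics.
val dfToSave = withDuplicates.distinct()

println(withDuplicates.count()) // 10 (4 + 6 rows)
println(dfToSave.count())       // 6 unique rows
```

With the sample data, four of the ten unioned rows are exact copies of rows already present, which is why `distinct` leaves six.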

Also as standard in SQL, this function resolves columns by position (not by name).
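
The positional behaviour matters once the two DataFrames list their columns in different orders. The sketch below is illustrative only: the DataFrames `a` and `b` are hypothetical, a local `SparkSession` is assumed, and `unionByName` (which is not mentioned in the quoted passage) requires Spark 2.3 or later.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-by-position-demo")
  .getOrCreate()
import spark.implicits._

// Same two columns, declared in opposite order (hypothetical data).
val a = Seq(("Japan", "I")).toDF("DataPartition", "FFAction")
val b = Seq(("I", "Japan")).toDF("FFAction", "DataPartition")

// union matches columns by position, so b's FFAction value lands
// under a's DataPartition column and the two rows differ.
val byPosition = a.union(b)

// unionByName matches columns by name, so both rows line up.
val byName = a.unionByName(b)

byPosition.show()
byName.show()
```

If the two inputs are known to share the same column order, as in the question, plain `union` followed by `distinct` should be sufficient.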
