
How do I perform a DISTINCT operation on a single column after a UNION is performed?

T1
ID  Value
1   1
2   2
3   3

T2
ID  Value
1   2
4   4
5   5


I am trying to return the table:

ID  Value
1   1
2   2
3   3
4   4
5   5

I tried:

SELECT DISTINCT ID, Value FROM (SELECT * FROM T1 UNION SELECT * FROM T2) AS T3

This does not seem to work.
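
A minimal reproduction may make the problem concrete. This sketch assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the unspecified database, and assumes ID and Value are integer columns holding the sample rows above. The ORDER BY is added only to make the output stable.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# The query from the question (ORDER BY added for a deterministic result).
rows = conn.execute("""
    SELECT DISTINCT ID, Value
    FROM (SELECT * FROM T1 UNION SELECT * FROM T2) AS T3
    ORDER BY ID, Value
""").fetchall()

# DISTINCT compares whole rows, so (1, 1) and (1, 2) are both kept.
print(rows)  # [(1, 1), (1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]
```

Six rows come back instead of five, because no two rows are identical in both columns.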

  • "This does not seem to work." - in what way? Commented Jan 9, 2012 at 0:39
  • You are not giving us all the details: should the Value kept for a repeated ID always be the first one, the min value, the max value, a random value...? Either way, DISTINCT applies to all the fields, not just one field. Commented Jan 9, 2012 at 0:50

4 Answers


Why are you using a sub-query? This will work:

SELECT * FROM T1 UNION SELECT * FROM T2 

UNION removes duplicate rows (UNION ALL does not).
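
The difference between UNION and UNION ALL can be checked directly. This is a sketch assuming SQLite via Python's sqlite3 module and the question's sample rows; `row_count` is a hypothetical helper introduced only for this illustration. Unioning T1 with itself shows the deduplication; unioning T1 with T2 shows that, on this data, both forms return the same six rows, which is the point raised in the comments below.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

def row_count(query):
    """Hypothetical helper: number of rows the given query returns."""
    return conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]

# UNION drops rows that are identical in every column; UNION ALL keeps them.
print(row_count("SELECT * FROM T1 UNION SELECT * FROM T1"))      # 3
print(row_count("SELECT * FROM T1 UNION ALL SELECT * FROM T1"))  # 6

# T1 and T2 share no identical rows, so both forms return 6 rows and ID 1
# still appears twice: (1, 1) from T1 and (1, 2) from T2.
print(row_count("SELECT * FROM T1 UNION SELECT * FROM T2"))      # 6
print(row_count("SELECT * FROM T1 UNION ALL SELECT * FROM T2"))  # 6
```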


3 Comments

The point was, the OP wanted something like a "one-field DISTINCT", and there's no such concept.
If you UNION records [1, 1] and [1, 2], you will get both in the result set. OP wanted no repeats from the first column. Obviously this answer was helpful to a lot of people, but I don't think it answers what was asked.
@user7733611 Actually, you're right now that I examine OP's example data. This query is the refactored equivalent of OP's query.

As far as I can tell, there's no "one-column distinct": DISTINCT is always applied to a whole record (unless used within an aggregate, as in count(distinct name)). The reason is that SQL cannot guess which values of Value to keep for you and which to drop. That's something you need to define yourself.
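
A small sketch of the two cases, assuming SQLite via Python's sqlite3 module and the question's sample rows: inside an aggregate, DISTINCT looks at a single column, while SELECT DISTINCT deduplicates on ID alone only when ID is the sole selected column, at which point Value is gone.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")
union_all = "SELECT * FROM T1 UNION ALL SELECT * FROM T2"

# DISTINCT inside an aggregate considers only the named column.
total_rows, distinct_ids = conn.execute(
    f"SELECT COUNT(*), COUNT(DISTINCT ID) FROM ({union_all})"
).fetchone()
print(total_rows, distinct_ids)  # 6 5

# SELECT DISTINCT on ID alone gives one row per ID, but Value is lost:
# the query never has to decide between Value 1 and Value 2 for ID 1.
ids = conn.execute(f"SELECT DISTINCT ID FROM ({union_all}) ORDER BY ID").fetchall()
print(ids)  # [(1,), (2,), (3,), (4,), (5,)]
```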

Try using GROUP BY to ensure ID is not repeated, and any aggregate (here MIN, as in your example it was the minimum that survived) to select a particular value of Value:

SELECT ID, min(Value) FROM (SELECT * FROM T1 UNION ALL SELECT * FROM T2) AS T3 GROUP BY ID 

This should be exactly what you need. It's not the same query, and there's no DISTINCT, but it returns what's shown in the example.
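
To see that the choice of aggregate is what decides the surviving Value, here is a sketch running the GROUP BY query with both MIN and MAX. It assumes SQLite via Python's sqlite3 module and the question's sample rows; the `{agg}` placeholder is an illustration device, and only the two fixed names MIN and MAX are ever substituted into it.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# The answer's query with the aggregate left as a placeholder
# (filled only with the literal names MIN or MAX below).
query = """
    SELECT ID, {agg}(Value) AS Value
    FROM (SELECT * FROM T1 UNION ALL SELECT * FROM T2) AS T3
    GROUP BY ID
    ORDER BY ID
"""
with_min = conn.execute(query.format(agg="MIN")).fetchall()
with_max = conn.execute(query.format(agg="MAX")).fetchall()

print(with_min)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
print(with_max)  # [(1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]
```

MIN reproduces the desired table; MAX is equally valid SQL but keeps Value 2 for ID 1, which is why the rule has to be stated explicitly.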

6 Comments

you sure that's the same query?
I'd suggest using UNION ALL in the subquery as there is no point in doing a DISTINCT twice.
@MitchWheat I'm sure it's not—but it's a query which would return what's shown in the example.
@MitchWheat: It isn't, but it'll do what the OP specifically said he wanted in his "I'm trying to return the table" table.
On that size data set, I'm not sure it's 100% valid.

I think this is what you meant:

SELECT * FROM T1 UNION SELECT * FROM T2 WHERE (ID NOT IN (SELECT ID FROM T1));
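
A sketch of this approach, assuming SQLite via Python's sqlite3 module and the question's sample rows: every T1 row is kept and only T2 rows with a new ID are added, so T1 "wins" for ID 1. Swapping the tables gives T2 priority instead. Results are sorted in Python only for stable comparison. One general caveat: if the subquery's ID column contains a NULL, NOT IN filters out every row; NOT EXISTS does not have that problem.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# Keep all of T1, then add only the T2 rows whose ID is not already in T1.
prefer_t1 = sorted(conn.execute("""
    SELECT * FROM T1
    UNION
    SELECT * FROM T2 WHERE ID NOT IN (SELECT ID FROM T1)
""").fetchall())

# Swapping the roles of the tables gives T2 priority instead.
prefer_t2 = sorted(conn.execute("""
    SELECT * FROM T2
    UNION
    SELECT * FROM T1 WHERE ID NOT IN (SELECT ID FROM T2)
""").fetchall())

print(prefer_t1)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
print(prefer_t2)  # [(1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]
```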

1 Comment

I really think this should be the accepted answer to the question. It lets you prioritize which table gets values chosen from instead of doing a MIN() with a GROUP BY. Depends on how OP wanted to choose the Value.

Even though this thread is old, this might be a working solution to the OP's question, although it could be considered dirty.

We select all tuples from the first table, then add (UNION) the tuples from the second table, limited to those that do not have the specific field matched in the first table.

SELECT * FROM T1 UNION SELECT * FROM T2 WHERE ( Value NOT IN (SELECT Value FROM T1) ); 
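
Note that this query filters on Value rather than ID. A sketch comparing it with the same idea keyed on ID, assuming SQLite via Python's sqlite3 module and the question's sample rows, shows both give the requested table here; the row (1, 2) is dropped because Value 2 occurs in T1's row (2, 2), not because ID 1 is already present. Which column is intended is not stated, so on other data the two versions may differ.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; columns are integers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Value INTEGER);
    CREATE TABLE T2 (ID INTEGER, Value INTEGER);
    INSERT INTO T1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO T2 VALUES (1, 2), (4, 4), (5, 5);
""")

# This answer's query: T2 rows are filtered on Value.
by_value = sorted(conn.execute("""
    SELECT * FROM T1
    UNION
    SELECT * FROM T2 WHERE Value NOT IN (SELECT Value FROM T1)
""").fetchall())

# The same idea filtered on ID, for comparison.
by_id = sorted(conn.execute("""
    SELECT * FROM T1
    UNION
    SELECT * FROM T2 WHERE ID NOT IN (SELECT ID FROM T1)
""").fetchall())

print(by_value)           # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
print(by_value == by_id)  # True on this sample data
```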
