2

I have a query that returns a result set similar to the one below (in reality it is far bigger, thousands of rows):

 A | B | C | D -----|----|----|----- 1 NULL | d0 | d0 | NULL 2 NULL | d0 | d1 | NULL 3 NULL | d0 | d2 | a0 4 d0 | d1 | d1 | NULL 5 d0 | d2 | d2 | a0 

Two of the rows are considered duplicates, 1 and 2, because A, B and D are the same. To eliminate this, I could use SELECT DISTINCT A, B, D but then I do not get column C in my result set. Column C is necessary information for rows 3, 4 and 5.

So how do I come from the result set above to this one (the result appearing in C4 can also be NULL instead of d1):

 A | B | C | D -----|----|------|----- 1 NULL | d0 | NULL | NULL 3 NULL | d0 | d2 | a0 4 d0 | d1 | d1 | NULL 5 d0 | d2 | d2 | a0 
2
  • A, B and D are the columns that define uniqueness? Commented Apr 8, 2009 at 11:06
  • And column C can be ignored for duplicates? Commented Apr 8, 2009 at 11:06

5 Answers 5

5
DECLARE @YourTable TABLE ( A VARCHAR(2) , B VARCHAR(2) , C VARCHAR(2) , D VARCHAR(2)) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0') INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL) INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0') SELECT A, B, C = MIN(C), D FROM @YourTable GROUP BY A, B, D 

SELECT A, B, CASE WHEN MIN(C) = MAX(C) THEN MIN(C) ELSE NULL END, D FROM @YourTable GROUP BY A, B, D 

SELECT A, B, CASE WHEN MIN(COALESCE(C, 'dx')) = MAX(COALESCE(C, 'dx')) THEN MIN(C) ELSE NULL END, D FROM @YourTable GROUP BY A, B, D 
Sign up to request clarification or add additional context in comments.

3 Comments

+1 - GROUP BY exists to combine "similar" rows that share the same value of a subset of columns, and thus is exactly what the OP is asking for.
If column C is the same for rows 1 and 2, or one value is NULL, then this does not give NULL. It works if the 2 values of column C are different and both NOT NULL
Clever solution. Thanks. I made my example a little too simple so I had a hard time applying it to the real situation but it works now.
2

Use Dense_Rank() to partition by A, B, and D
(Thanks Lieven, for the temp table query, I had to use it for demo to be consistent ;))

According to MSDN,

The rank of a row is one plus the number of distinct ranks that come before the row in question

Partitioning by A, B, C and then sorting by A, B, C, D will give you the rank of 1 for the first distinct value where uniqueness is defined by A, B, D. That is where filtering by 1 came from.

where DenseRank = 1

Here is the result

alt text

Here is the code:

DECLARE @YourTable TABLE ( A VARCHAR(2) , B VARCHAR(2) , C VARCHAR(2) , D VARCHAR(2)) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL) INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0') INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL) INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0') ;with DistinctTable as ( select *, DenseRank = Dense_Rank() over (Partition By A, B, D order by A, B, C, D) from @YourTable ) select A, B, C, D from DistinctTable where DenseRank = 1 

Comments

0

A subquery perhaps?

SELECT A,B,C,D FROM table1 WHERE EXISTS ( SELECT DISTINCT A,B,D FROM table1 );

1 Comment

Tested this? Just gives 5 rows: the exists will either give all rows or no rows)
0

The fact you have NULLs in A and D compicates matters for any EXISTS.

Any MIN/MAX solution on C may not give you NULL as I think you want. Otherwise, use MIN(C) and a simple group by.

You have to extract the unique keys first (A, B, D), then use that to determine extract the rows again and work out what to do with C

DECLARE @TheTable TABLE ( A varchar(2) NULL, B varchar(2) NULL, C varchar(2) NULL, D varchar(2) NULL ) INSERT INTO @TheTable VALUES (NULL, 'd0', 'd0', NULL) INSERT INTO @TheTable VALUES (NULL, 'd0', 'd1', NULL) INSERT INTO @TheTable VALUES (NULL, 'd0', 'd2', 'a0') INSERT INTO @TheTable VALUES ('d0', 'd1', 'd1', NULL) INSERT INTO @TheTable VALUES ('d0', 'd2', 'd2', 'a0') SELECT DISTINCT T.A, T.B, CASE Number WHEN 1 THEN T.C ELSE NULL END, T.D FROM (SELECT COUNT(*) AS Number, A, B, D FROM @TheTable GROUP BY A, B, D ) UQ JOIN @TheTable T ON ISNULL(T.A, '') = ISNULL(UQ.A, '') AND ISNULL(T.B, '') = ISNULL(UQ.B, '') AND ISNULL(T.D, '') = ISNULL(UQ.D, '') 

Comments

0

if you have an unique id in the table, then i would go for something like this:

 SELECT A,B,C,D FROM table WHERE id IN (SELECT DISTINCT A,B,D) 

The problem is that you would always get the first value of C, not the first one with an value.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.