Eliminate partial duplicate rows from result set

Question

I have a query that returns a result set similar to the one below (in reality it is far bigger, thousands of rows):

    A    | B  | C  | D
    -----|----|----|-----
  1 NULL | d0 | d0 | NULL
  2 NULL | d0 | d1 | NULL
  3 NULL | d0 | d2 | a0
  4 d0   | d1 | d1 | NULL
  5 d0   | d2 | d2 | a0

Two of the rows are considered duplicates, 1 and 2, because A, B and D are the same. To eliminate this, I could use SELECT DISTINCT A, B, D but then I do not get column C in my result set. Column C is necessary information for rows 3, 4 and 5.

So how do I get from the result set above to this one (the value of C in row 4 can also be NULL instead of d1):

    A    | B  | C    | D
    -----|----|------|-----
  1 NULL | d0 | NULL | NULL
  3 NULL | d0 | d2   | a0
  4 d0   | d1 | d1   | NULL
  5 d0   | d2 | d2   | a0

A, B and D are the columns that define uniqueness?

– gbn, Commented Apr 8, 2009 at 11:06
And column C can be ignored for duplicates?

– gbn, Commented Apr 8, 2009 at 11:06

Lieven Keersmaekers · Accepted Answer · 2009-04-08 11:55:22Z

    DECLARE @YourTable TABLE (
        A VARCHAR(2),
        B VARCHAR(2),
        C VARCHAR(2),
        D VARCHAR(2))

    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL)
    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL)
    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0')
    INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL)
    INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0')

    SELECT A, B, C = MIN(C), D
    FROM @YourTable
    GROUP BY A, B, D

    SELECT A, B,
           CASE WHEN MIN(C) = MAX(C) THEN MIN(C) ELSE NULL END,
           D
    FROM @YourTable
    GROUP BY A, B, D

    SELECT A, B,
           CASE WHEN MIN(COALESCE(C, 'dx')) = MAX(COALESCE(C, 'dx')) THEN MIN(C) ELSE NULL END,
           D
    FROM @YourTable
    GROUP BY A, B, D

+1 - GROUP BY exists to combine "similar" rows that share the same value of a subset of columns, and thus is exactly what the OP is asking for.
If column C is the same for rows 1 and 2, or one value is NULL, then this does not give NULL. It works if the 2 values of column C are different and both NOT NULL
Clever solution. Thanks. I made my example a little too simple so I had a hard time applying it to the real situation but it works now.
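
The NULL behaviour discussed in the comments can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so it only illustrates the GROUP BY logic; the table name `t` and the helper `dedupe` are made up for the example, and `'dx'` is the sentinel from the answer, assumed never to occur as a real value of C.

```python
import sqlite3


def dedupe(rows, c_expr):
    """Group rows by (A, B, D) and compute C with the given SQL expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (A TEXT, B TEXT, C TEXT, D TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    sql = (
        f"SELECT A, B, {c_expr} AS C, D "
        "FROM t GROUP BY A, B, D ORDER BY A, B, D"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Second query from the answer: keep C only if the group has one distinct value.
PLAIN = "CASE WHEN MIN(C) = MAX(C) THEN MIN(C) ELSE NULL END"
# Third query: treat NULL as the sentinel 'dx' so a NULL counts as a different value.
NULL_AWARE = (
    "CASE WHEN MIN(COALESCE(C, 'dx')) = MAX(COALESCE(C, 'dx')) "
    "THEN MIN(C) ELSE NULL END"
)

# Sample rows from the question (None is stored as NULL).
question_rows = [
    (None, "d0", "d0", None),
    (None, "d0", "d1", None),
    (None, "d0", "d2", "a0"),
    ("d0", "d1", "d1", None),
    ("d0", "d2", "d2", "a0"),
]
result = dedupe(question_rows, PLAIN)

# Edge case from the comment: one C in the group is NULL, the other is not.
edge_rows = [(None, "d0", None, None), (None, "d0", "d1", None)]
edge_plain = dedupe(edge_rows, PLAIN)            # MIN/MAX skip NULL, so C stays 'd1'
edge_null_aware = dedupe(edge_rows, NULL_AWARE)  # sentinel makes MIN != MAX, so C is NULL

for row in result:
    print(row)
```

On the question's data the second query already yields the desired four rows; the sentinel variant only matters when a group mixes NULL and non-NULL values of C.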

Community · Accepted Answer · 2017-02-08 14:11:23Z

Use Dense_Rank() to partition by A, B, and D
(Thanks Lieven, for the temp table query, I had to use it for demo to be consistent ;))

According to MSDN,

The rank of a row is one plus the number of distinct ranks that come before the row in question

Partitioning by A, B, D and then sorting by A, B, C, D gives a rank of 1 to the first distinct value of C within each partition, where uniqueness is defined by A, B, D. That is where filtering by 1 came from.

where DenseRank = 1


Here is the code:

    DECLARE @YourTable TABLE (
        A VARCHAR(2),
        B VARCHAR(2),
        C VARCHAR(2),
        D VARCHAR(2))

    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd0', NULL)
    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd1', NULL)
    INSERT INTO @YourTable VALUES (NULL, 'd0', 'd2', 'a0')
    INSERT INTO @YourTable VALUES ('d0', 'd1', 'd1', NULL)
    INSERT INTO @YourTable VALUES ('d0', 'd2', 'd2', 'a0')

    ;with DistinctTable as (
        select *,
               DenseRank = Dense_Rank() over (Partition By A, B, D order by A, B, C, D)
        from @YourTable
    )
    select A, B, C, D
    from DistinctTable
    where DenseRank = 1
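
To see which rows the rank filter keeps, here is a rough equivalent using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first version with window functions), and the table name `t` is invented for the sketch. Note that this approach keeps the lowest C of each group (`d0` for row 1) rather than replacing it with NULL.

```python
import sqlite3

# Sample rows from the question (None is stored as NULL).
rows = [
    (None, "d0", "d0", None),
    (None, "d0", "d1", None),
    (None, "d0", "d2", "a0"),
    ("d0", "d1", "d1", None),
    ("d0", "d2", "d2", "a0"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Rank rows inside each (A, B, D) partition and keep only rank 1.
query = """
    WITH DistinctTable AS (
        SELECT *,
               DENSE_RANK() OVER (PARTITION BY A, B, D
                                  ORDER BY A, B, C, D) AS DenseRank
        FROM t
    )
    SELECT A, B, C, D
    FROM DistinctTable
    WHERE DenseRank = 1
    ORDER BY A, B, D
"""
kept = conn.execute(query).fetchall()
conn.close()

for row in kept:
    print(row)
```

Because DENSE_RANK gives tied rows the same rank, two rows in a group with an identical C would both survive the filter.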

Speedy · Accepted Answer · 2009-04-08 10:52:35Z

A subquery perhaps?

SELECT A,B,C,D FROM table1 WHERE EXISTS ( SELECT DISTINCT A,B,D FROM table1 );

Tested this? It just gives 5 rows: the EXISTS will either give all rows or no rows.
– gbn
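
The comment's point can be reproduced with a short sketch using Python's built-in sqlite3 module (the table name `table1` follows the answer; everything else is illustrative). The subquery never refers to the outer row, so EXISTS is true for every row whenever the table is non-empty.

```python
import sqlite3

# Sample rows from the question (None is stored as NULL).
rows = [
    (None, "d0", "d0", None),
    (None, "d0", "d1", None),
    (None, "d0", "d2", "a0"),
    ("d0", "d1", "d1", None),
    ("d0", "d2", "d2", "a0"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT A, B, C, D
    FROM table1
    WHERE EXISTS (SELECT DISTINCT A, B, D FROM table1)
"""

# Uncorrelated EXISTS: the condition is the same for every outer row.
with_data = conn.execute(query).fetchall()

# With an empty table the subquery returns nothing, so no rows pass.
conn.execute("DELETE FROM table1")
without_data = conn.execute(query).fetchall()
conn.close()

print(len(with_data), len(without_data))
```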

gbn · Accepted Answer · 2009-04-08 11:15:27Z

The fact that you have NULLs in A and D complicates matters for any EXISTS.

Any MIN/MAX solution on C may not give you NULL as I think you want. Otherwise, use MIN(C) and a simple group by.

You have to extract the unique keys (A, B, D) first, then use them to extract the rows again and work out what to do with C.

    DECLARE @TheTable TABLE (
        A varchar(2) NULL,
        B varchar(2) NULL,
        C varchar(2) NULL,
        D varchar(2) NULL
    )

    INSERT INTO @TheTable VALUES (NULL, 'd0', 'd0', NULL)
    INSERT INTO @TheTable VALUES (NULL, 'd0', 'd1', NULL)
    INSERT INTO @TheTable VALUES (NULL, 'd0', 'd2', 'a0')
    INSERT INTO @TheTable VALUES ('d0', 'd1', 'd1', NULL)
    INSERT INTO @TheTable VALUES ('d0', 'd2', 'd2', 'a0')

    SELECT DISTINCT
        T.A,
        T.B,
        CASE Number WHEN 1 THEN T.C ELSE NULL END,
        T.D
    FROM
        (SELECT COUNT(*) AS Number, A, B, D
         FROM @TheTable
         GROUP BY A, B, D
        ) UQ
        JOIN @TheTable T
          ON ISNULL(T.A, '') = ISNULL(UQ.A, '')
         AND ISNULL(T.B, '') = ISNULL(UQ.B, '')
         AND ISNULL(T.D, '') = ISNULL(UQ.D, '')
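
Why the join needs the ISNULL wrappers can be shown with a sketch in Python's built-in sqlite3 module. SQLite has no ISNULL(x, y), so COALESCE stands in for it here; the table name `the_table` and the helper `run_join` are invented for the example, and `''` is assumed never to be a real value in A, B or D.

```python
import sqlite3

# Sample rows from the question (None is stored as NULL).
rows = [
    (None, "d0", "d0", None),
    (None, "d0", "d1", None),
    (None, "d0", "d2", "a0"),
    ("d0", "d1", "d1", None),
    ("d0", "d2", "d2", "a0"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)


def run_join(join_condition):
    """Count rows per (A, B, D) key, join back, and blank C for repeated keys."""
    sql = f"""
        SELECT DISTINCT
            T.A, T.B,
            CASE UQ.Number WHEN 1 THEN T.C ELSE NULL END AS C,
            T.D
        FROM (SELECT COUNT(*) AS Number, A, B, D
              FROM the_table
              GROUP BY A, B, D) UQ
        JOIN the_table T ON {join_condition}
        ORDER BY 1, 2, 4
    """
    return conn.execute(sql).fetchall()


# Plain equality: NULL = NULL is not true, so rows with NULL keys never match.
plain = run_join("T.A = UQ.A AND T.B = UQ.B AND T.D = UQ.D")

# NULL-safe comparison, mirroring the ISNULL(..., '') wrappers in the answer.
null_safe = run_join(
    "COALESCE(T.A, '') = COALESCE(UQ.A, '') "
    "AND COALESCE(T.B, '') = COALESCE(UQ.B, '') "
    "AND COALESCE(T.D, '') = COALESCE(UQ.D, '')"
)
conn.close()

print(plain)
print(null_safe)
```

With plain equality only row 5 survives, since it is the only row with no NULL in A or D.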

JK. · Accepted Answer · 2012-05-12 06:25:16Z

If you have a unique id in the table, then I would go for something like this:

    SELECT A, B, C, D FROM table1 WHERE id IN (SELECT MIN(id) FROM table1 GROUP BY A, B, D)

The problem is that you would always get the first value of C, not the first one with a value.
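
One possible reading of this suggestion is to keep the row with the smallest id in each (A, B, D) group. The sketch below tries that with Python's built-in sqlite3 module; the `id` column, its values 1 to 5, and the table name `t` are assumptions added for illustration, since the question's table has no id.

```python
import sqlite3

# Sample rows from the question (None is stored as NULL).
rows = [
    (None, "d0", "d0", None),
    (None, "d0", "d1", None),
    (None, "d0", "d2", "a0"),
    ("d0", "d1", "d1", None),
    ("d0", "d2", "d2", "a0"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, A TEXT, B TEXT, C TEXT, D TEXT)"
)
# Hypothetical ids 1..5, assigned in the order the question lists the rows.
conn.executemany(
    "INSERT INTO t (id, A, B, C, D) VALUES (?, ?, ?, ?, ?)",
    [(i, *row) for i, row in enumerate(rows, start=1)],
)

# Keep one row per (A, B, D) group: the one with the smallest id.
query = """
    SELECT id, A, B, C, D
    FROM t
    WHERE id IN (SELECT MIN(id) FROM t GROUP BY A, B, D)
    ORDER BY id
"""
kept = conn.execute(query).fetchall()
conn.close()

for row in kept:
    print(row)
```

Row 1 keeps `d0` in C simply because it has the lowest id in its group, which is the limitation the answer describes.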
