1

I have a database table as follows:

    ProductDetails
    -----------------
    ProductDetailsID     int
    ProductIdentifier    VARCHAR (20)
    ProductID            int
    ProductFile          VARCHAR(255)
    ProductAvailability  char(2)
    RightsCountry        varchar(MAX)
    Deleted              bit

A recent bug in the platform let a large number of duplicates in, so I can have multiple ProductDetails entries that are identical EXCEPT for the ProductDetailsID (PK) and ProductFile (which is NULL on the duplicates; for some reason they were inserted without the files).

I need to write a T-SQL script that finds these duplicates with a view to deleting them (after examination).

I have found this online, which is great. It gives me the ProductIdentifier with several records, and the number of duplicates.

    SELECT pd.ProductIdentifier,
           COUNT(pd.ProductIdentifier) AS NumOccurrences
    FROM dbo.ProductDetails pd
    GROUP BY pd.ProductIdentifier
    HAVING ( COUNT(pd.ProductIdentifier) > 1 )

The thing is, some of these records should remain. I need to select the ProductDetail records that have duplicate ProductIdentifiers, where at least 1 of the duplicates has a FileName and all other columns are exactly the same. For example, if I have a dataset as follows:

    ProductDetailsID | ProductIdentifier | ProductID | ProductFile  | ProductAvailability | RightsCountry              | Deleted
    123              | 567890            | 12        | filename.png | 20                  | AU CX CC CK HM NZ NU NF TK | 0
    124              | 567890            | 12        | (NULL)       | 20                  | AU CX CC CK HM NZ NU NF TK | 0
    125              | 567890            | 12        | (NULL)       | 20                  | AU CX CC CK HM NZ NU NF TK | 0

I need to return ProductDetailsID 124 and 125 as these are for deletion. I'd appreciate any guidance or links to examples or any help at all!

2
  • "at least 1 of the duplicates has a FileName" - what if more than 1 of the duplicates has a filename? Will they always match (or, if the rows match on all values except the filename, are they not considered duplicates)? Commented Apr 22, 2014 at 13:56
  • Hi @Damien_The_Unbeliever There could be more than 1 of the duplicates with a filename. I don't want to delete (select) any record with a filename. Commented Apr 22, 2014 at 14:08

3 Answers

1

This works and is a cheeky way of abusing the partitioned window functions:

    declare @ProductDetails table(
        ProductDetailsID int not null,
        ProductIdentifier VARCHAR(20) not null,
        ProductID int not null,
        ProductFile VARCHAR(255) null,
        ProductAvailability char(2) not null,
        RightsCountry varchar(MAX) not null,
        Deleted bit not null
    )

    insert into @ProductDetails(ProductDetailsID,ProductIdentifier,ProductID,
        ProductFile,ProductAvailability,RightsCountry,Deleted) values
    (123,567890,12,'filename.png',20,'AU CX CC CK HM NZ NU NF TK',0),
    (124,567890,12,NULL,20,'AU CX CC CK HM NZ NU NF TK',0),
    (125,567890,12,NULL,20,'AU CX CC CK HM NZ NU NF TK',0)

    ;With FillInFileNames as (
        select *,
            MAX(ProductFile) OVER (PARTITION BY ProductIdentifier,ProductID,
                ProductAvailability,RightsCountry,Deleted) as AnyFileName
        from @ProductDetails
    )
    select *
    from FillInFileNames
    where ProductFile is null
      and AnyFileName is not null

It relies on the fact that an aggregate such as MAX ignores NULLs, so it returns NULL only when every input value in the partition is NULL.

Result:

    ProductDetailsID ProductIdentifier ProductID ProductFile ProductAvailability RightsCountry              Deleted AnyFileName
    ---------------- ----------------- --------- ----------- ------------------- -------------------------- ------- ------------
    124              567890            12        NULL        20                  AU CX CC CK HM NZ NU NF TK 0       filename.png
    125              567890            12        NULL        20                  AU CX CC CK HM NZ NU NF TK 0       filename.png
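The NULL handling that the query depends on can be checked in isolation. This is a minimal sketch; the table variable `@t` and its columns `grp` and `val` are made up for illustration and are not part of the question's schema.

```sql
-- Hypothetical two-group sample: group 1 has one non-NULL value, group 2 has none.
DECLARE @t TABLE (grp int NOT NULL, val varchar(10) NULL);

INSERT INTO @t (grp, val)
VALUES (1, 'a'), (1, NULL), (2, NULL), (2, NULL);

-- MAX ignores NULL inputs, so every row in group 1 gets 'a',
-- while every row in group 2 gets NULL (no non-NULL value to pick).
SELECT grp,
       val,
       MAX(val) OVER (PARTITION BY grp) AS max_val
FROM @t;
```

Only rows whose own value is NULL while `max_val` is not NULL would pass the filter used above, which is why groups with no file at all are left untouched.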

It may also be instructive for the OP to observe that the top of my script isn't much more than the table information and sample data provided in their question - except mine is actually runnable.

It might be worth considering writing your samples in such a style in the future, because that way it can be immediately copy & pasted from your question into a query window.
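Since the question mentions deleting the rows after examination, the same CTE could also feed a DELETE. The following is only a sketch under assumptions: `#ProductDetails` is a temporary stand-in for the real `dbo.ProductDetails` table, and the transaction is rolled back so the result can be inspected before anything is made permanent.

```sql
-- Assumption: #ProductDetails is a scratch copy standing in for dbo.ProductDetails.
CREATE TABLE #ProductDetails (
    ProductDetailsID    int          NOT NULL PRIMARY KEY,
    ProductIdentifier   varchar(20)  NOT NULL,
    ProductID           int          NOT NULL,
    ProductFile         varchar(255) NULL,
    ProductAvailability char(2)      NOT NULL,
    RightsCountry       varchar(max) NOT NULL,
    Deleted             bit          NOT NULL
);

INSERT INTO #ProductDetails (ProductDetailsID, ProductIdentifier, ProductID,
    ProductFile, ProductAvailability, RightsCountry, Deleted)
VALUES
    (123, '567890', 12, 'filename.png', '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (124, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (125, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0);

BEGIN TRANSACTION;

-- Same partitioning as the SELECT version: rows without a file are removed
-- only when an otherwise identical row has one.
WITH FillInFileNames AS (
    SELECT ProductDetailsID,
           ProductFile,
           MAX(ProductFile) OVER (PARTITION BY ProductIdentifier, ProductID,
               ProductAvailability, RightsCountry, Deleted) AS AnyFileName
    FROM #ProductDetails
)
DELETE pd
FROM #ProductDetails AS pd
INNER JOIN FillInFileNames AS f
        ON f.ProductDetailsID = pd.ProductDetailsID
WHERE f.ProductFile IS NULL
  AND f.AnyFileName IS NOT NULL;

-- Inspect what is left; with the sample data only row 123 remains.
SELECT * FROM #ProductDetails;

-- Undo the delete; switch to COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;

DROP TABLE #ProductDetails;
```

Joining back on the primary key keeps the DELETE aimed at exactly the rows the SELECT version would have listed.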

1 Comment

Thanks Damien, will take your advice onboard for future questions :) Thanks for the good answer !
1

    create view rows_to_delete as
    select *
    from (
        select *,
               row_number() over(partition by ProductIdentifier
                                 order by ProductFile desc, ProductDetailsID) as rn
        from t  -- "t" is a placeholder for the table, e.g. dbo.ProductDetails
    ) x
    where rn > 1
      and ProductFile is null

2 Comments

The OP has since clarified that there may be multiple rows with non-NULL ProductFile values and all of them should be preserved.
Unfortunately, that clarification came only after I had posted the answer. OK, I've added another condition to account for it.
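The view above partitions by ProductIdentifier alone. If rows should count as duplicates only when every other column matches as well, as the question describes, the partition could be widened. This is a sketch under that assumption; `#ProductDetails` and row 126 are made up for illustration.

```sql
-- Assumption: #ProductDetails is a scratch stand-in for the real table.
CREATE TABLE #ProductDetails (
    ProductDetailsID    int          NOT NULL PRIMARY KEY,
    ProductIdentifier   varchar(20)  NOT NULL,
    ProductID           int          NOT NULL,
    ProductFile         varchar(255) NULL,
    ProductAvailability char(2)      NOT NULL,
    RightsCountry       varchar(max) NOT NULL,
    Deleted             bit          NOT NULL
);

INSERT INTO #ProductDetails (ProductDetailsID, ProductIdentifier, ProductID,
    ProductFile, ProductAvailability, RightsCountry, Deleted)
VALUES
    (123, '567890', 12, 'filename.png', '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (124, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (125, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    -- Hypothetical row: same identifier, different RightsCountry, so not a duplicate.
    (126, '567890', 12, NULL,           '20', 'AU',                         0);

-- NULLs sort last under ORDER BY ... DESC in SQL Server, so a row with a
-- file name, if the group has one, always receives rn = 1.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProductIdentifier, ProductID,
                                  ProductAvailability, RightsCountry, Deleted
                              ORDER BY ProductFile DESC, ProductDetailsID) AS rn
    FROM #ProductDetails
) x
WHERE rn > 1
  AND ProductFile IS NULL;   -- returns rows 124 and 125, not 126

DROP TABLE #ProductDetails;
```

One difference from the MAX-based approach is worth keeping in mind: in a group where no row has a file name, this version still returns all but one of the rows, whereas the MAX-based query returns none of them.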
0

The following query first counts the file names in each group of otherwise identical records, and then returns the records without a file whose fields match a group that has at least one.

    WITH counts AS (
        -- One row per group of otherwise identical records;
        -- COUNT(ProductFile) counts only the non-NULL file names.
        SELECT ProductIdentifier
             , ProductID
             , ProductAvailability
             , RightsCountry
             , Deleted
             , COUNT(ProductFile) AS cnt_files
        FROM dbo.ProductDetails
        GROUP BY ProductIdentifier
               , ProductID
               , ProductAvailability
               , RightsCountry
               , Deleted
    )
    SELECT pd.*
    FROM dbo.ProductDetails pd
    INNER JOIN counts c
            ON pd.ProductIdentifier = c.ProductIdentifier
           AND pd.ProductID = c.ProductID
           AND pd.ProductAvailability = c.ProductAvailability
           AND pd.RightsCountry = c.RightsCountry
           AND pd.Deleted = c.Deleted
    WHERE pd.ProductFile IS NULL
      AND c.cnt_files > 0;
