I have a database table as follows:

    ProductDetails
    -----------------
    ProductDetailsID     int
    ProductIdentifier    VARCHAR(20)
    ProductID            int
    ProductFile          VARCHAR(255)
    ProductAvailability  char(2)
    RightsCountry        varchar(MAX)
    Deleted              bit

A recent bug in the platform let in a large number of duplicates. As a result, I can have multiple ProductDetails entries that are identical EXCEPT for the ProductDetailsID (PK) and ProductFile (this is NULL in the duplicates; for some reason they were inserted without their files).
I need to write a T-SQL script that finds these duplicates with a view to deleting them (after examination).
I have found this online, which is great. It gives me the ProductIdentifier with several records, and the number of duplicates.

    SELECT pd.ProductIdentifier,
           COUNT(pd.ProductIdentifier) AS NumOccurrences
    FROM dbo.ProductDetails pd
    GROUP BY pd.ProductIdentifier
    HAVING ( COUNT(pd.ProductIdentifier) > 1 )

The thing is, some of these records should remain. I need to select the ProductDetails records that have duplicate ProductIdentifiers, where at least one of the duplicates has a ProductFile value and all other columns are exactly the same. For example, if I have a dataset as follows:

    ProductDetailsID | ProductIdentifier | ProductID | ProductFile  | ProductAvailability | RightsCountry              | Deleted
    123              | 567890            | 12        | filename.png | 20                  | AU CX CC CK HM NZ NU NF TK | 0
    124              | 567890            | 12        | (NULL)       | 20                  | AU CX CC CK HM NZ NU NF TK | 0
    125              | 567890            | 12        | (NULL)       | 20                  | AU CX CC CK HM NZ NU NF TK | 0

I need to return ProductDetailsID 124 and 125 as these are for deletion. I'd appreciate any guidance or links to examples or any help at all!
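
To make the requirement concrete, here is a rough sketch of the kind of query I have in mind. It is not a confirmed solution. It uses a table variable, @ProductDetails, loaded with the sample rows above so that it does not touch the real table; that variable and the aliases d and k are just for illustration. It also assumes that none of the compared columns are NULL, since = never matches NULL to NULL.

```sql
-- Stand-in for dbo.ProductDetails (illustrative only), loaded with the sample rows.
DECLARE @ProductDetails TABLE (
    ProductDetailsID    int PRIMARY KEY,
    ProductIdentifier   varchar(20),
    ProductID           int,
    ProductFile         varchar(255),
    ProductAvailability char(2),
    RightsCountry       varchar(MAX),
    Deleted             bit
);

INSERT INTO @ProductDetails
    (ProductDetailsID, ProductIdentifier, ProductID, ProductFile,
     ProductAvailability, RightsCountry, Deleted)
VALUES
    (123, '567890', 12, 'filename.png', '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (124, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0),
    (125, '567890', 12, NULL,           '20', 'AU CX CC CK HM NZ NU NF TK', 0);

-- Candidate rows for deletion: rows without a file (d) for which another
-- row (k) exists that has a file and matches on every other non-key column.
-- Assumption: the compared columns are never NULL.
SELECT d.ProductDetailsID
FROM @ProductDetails AS d
WHERE d.ProductFile IS NULL
  AND EXISTS (
        SELECT 1
        FROM @ProductDetails AS k
        WHERE k.ProductFile IS NOT NULL
          AND k.ProductDetailsID    <> d.ProductDetailsID
          AND k.ProductIdentifier   =  d.ProductIdentifier
          AND k.ProductID           =  d.ProductID
          AND k.ProductAvailability =  d.ProductAvailability
          AND k.RightsCountry       =  d.RightsCountry
          AND k.Deleted             =  d.Deleted
      )
ORDER BY d.ProductDetailsID;
-- Returns 124 and 125 for the sample rows; 123 is kept because it has a file.
```

Since this is only a SELECT, the results can be examined before anything is deleted. Groups where every row has a NULL ProductFile are not returned at all, which seems right for my case because there would be no "good" row to keep.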