I've got a table that has duplicate data that needs to be cleaned up. Consider the following example:
CREATE TABLE #StackOverFlow ( [ctrc_num] int, [Ctrc_name] varchar(6), [docu] bit, [adj] bit, new bit, [some_date] datetime ); INSERT INTO #StackOverFlow ([ctrc_num], [Ctrc_name], [docu], [adj], [new], [some_date]) VALUES (12345, 'John R', null, null, 1, '2023-12-11 09:05:13.003'), (12345, 'John R', 1, null, 0, '2023-12-11 09:05:12.987'), (12345, 'John R', null, null, 1, '2023-12-11 09:05:12.947'), (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:13.003'), (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:12.987'), (56789, 'Sam S', 1, null, 0, '2023-12-11 09:05:12.947'), (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:13.003'), (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.987'), (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.947'); This gives me:
[ctrc_num] [Ctrc_name] [docu] [adj] [new] [some_date] ----------------------------------------------------------------------- 12345 John R NULL NULL 1 2023-12-11 09:05:13.003 12345 John R 1 NULL 0 2023-12-11 09:05:12.987 12345 John R NULL NULL 1 2023-12-11 09:05:12.947 56789 Sam S NULL NULL 1 2023-12-11 09:05:13.003 56789 Sam S NULL NULL 1 2023-12-11 09:05:12.987 56789 Sam S 1 NULL 0 2023-12-11 09:05:12.947 78945 Pat P NULL NULL 1 2023-12-11 09:05:13.003 78945 Pat P NULL NULL 1 2023-12-11 09:05:12.987 78945 Pat P NULL NULL 1 2023-12-11 09:05:12.947 What I need to do is delete from the table duplicates. If new is 0, delete the records where new is 1. If all records have new = 1 keep the newest record and delete the older ones.
The result should look like this:
[ctrc_num] [Ctrc_name] [docu] [adj] [new] [some_date] ----------------------------------------------------------------------- 12345 John R 1 NULL 0 2023-12-11 09:05:12.987 56789 Sam S 1 NULL 0 2023-12-11 09:05:12.947 78945 Pat P NULL NULL 1 2023-12-11 09:05:13.003 I've tried ROW_NUMBER:
;WITH RankedByDate AS ( SELECT ctrc_num, Ctrc_name, docu, adj, new, some_date, ROW_NUMBER() OVER (PARTITION BY Ctrc_num, Ctrc_name, [docu],[adj], [new] ORDER BY some_date DESC) AS rNum FROM #StackOverFlow ) SELECT * FROM RankedByDate This separates the ones with new = 0, but I still have the ones with new = 1 that are ordered.
Grouping gives me the records that are duplicated but no way to delete the ones needed to be deleted:
SELECT [ctrc_num] ,[Ctrc_name] ,[docu] ,[adj] ,[new] FROM #StackOverFlow GROUP BY [ctrc_num] ,[Ctrc_name] ,[docu] ,[adj] ,[new] HAVING COUNT(*) > 1
[ctrc_num]and[Ctrc_name]?[ctrc_numand[Ctrc_name]