I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused by:
- The target table has a Primary Key based on two columns
- The source table may contain duplicate records that violate the target table's Primary Key constraint ("Violation of PRIMARY KEY constraint" is thrown)
I'm looking for a way to change my MERGE statement so that it either ignores duplicate records within the source table, or wraps the INSERT in a try/catch so that the remaining rows are still inserted when a few bad ones fail. Or maybe there's a better way to approach this problem?
Here's an example query that shows the problem. It adds 100k records to a temp table and then attempts to insert those records into the target table:
EDIT: In my original post I only included two fields in the example tables, which led people here on SO to suggest DISTINCT as a way to avoid duplicates in the MERGE statement. I should have mentioned that in my real-world problem the tables have 15 fields, and two of those 15 make up a CLUSTERED PRIMARY KEY. So the DISTINCT keyword doesn't work, because I need to SELECT all 15 fields and ignore duplicates based on only two of them.
I have updated the query below to include one more field, col4. I need to include col4 in the MERGE, but I only need to make sure that ONLY col2 and col3 are unique.
-- Create the source table
CREATE TABLE #tmp (
    col2 datetime NOT NULL,
    col3 int NOT NULL,
    col4 int
)
GO

-- Add a bunch of test data to the source table
-- For testing purposes, allow duplicate records to be added to this table
DECLARE @loopCount int = 100000
DECLARE @loopCounter int = 0
DECLARE @randDateOffset int
DECLARE @col2 datetime
DECLARE @col3 int
DECLARE @col4 int

WHILE (@loopCounter) < @loopCount
BEGIN
    SET @randDateOffset = RAND() * 100000
    SET @col2 = DATEADD(MI,@randDateOffset,GETDATE())
    SET @col3 = RAND() * 1000
    SET @col4 = RAND() * 10

    INSERT INTO #tmp (col2,col3,col4)
    VALUES (@col2,@col3,@col4);

    SET @loopCounter = @loopCounter + 1
END

-- Insert the source data into the target table
-- How do we make sure we don't attempt to INSERT a duplicate record? Or how can we
-- catch exceptions? Or?
MERGE INTO dbo.tbl1 AS tbl
USING (SELECT * FROM #tmp) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2,col3,col4)
    VALUES (src.col2,src.col3,src.col4);
GO
Use GROUP BY col2, col3 and select MIN(col4) AS col4 in the source query.
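The GROUP BY suggestion could look roughly like the sketch below. It assumes the `#tmp` and `dbo.tbl1` tables from the question, and that any one `col4` value per (`col2`, `col3`) pair is acceptable. The second statement is an alternative that is not from the original suggestion: it uses `ROW_NUMBER()` to keep one whole source row per key, which may suit the 15-column case better because all non-key values then come from the same row. The CTE name `ranked`, the column alias `rn`, and the `ORDER BY col4` tie-breaker are illustrative choices only.

```sql
-- Option 1: collapse duplicate keys in the source and take the smallest col4.
MERGE INTO dbo.tbl1 AS tbl
USING (
    SELECT col2, col3, MIN(col4) AS col4
    FROM #tmp
    GROUP BY col2, col3
) AS src
    ON tbl.col2 = src.col2 AND tbl.col3 = src.col3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);

-- Option 2 (run instead of option 1): keep exactly one full row per key.
-- The ORDER BY inside OVER() decides which duplicate survives; col4 is
-- only a placeholder tie-breaker here.
WITH ranked AS (
    SELECT col2, col3, col4,
           ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col4) AS rn
    FROM #tmp
)
MERGE INTO dbo.tbl1 AS tbl
USING (SELECT col2, col3, col4 FROM ranked WHERE rn = 1) AS src
    ON tbl.col2 = src.col2 AND tbl.col3 = src.col3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);
```

Either way the source handed to MERGE has at most one row per (col2, col3), so the statement should no longer attempt two inserts for the same key. With more non-key columns, option 1 would need an aggregate per column and could mix values from different rows, which is why option 2 may be the safer fit.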