Delete duplicate rows not based on primary key

Question

I have this table in my database:

tblAgencies
----------------------
AgencyID (PK)
VendorID
RegionID
Name
Zip

Long story short, I accidentally copied my entire table into itself, so every row in my table has a duplicate.

But because AgencyID is an auto-incrementing identity column, every copied row got a new, unique AgencyID, so I need to find duplicates based on all the other fields.

Does anyone know how I can do this?

If your PK is auto-incremented, can't you just run a DELETE where AgencyID > [the last good record]? – Jed, commented Nov 14, 2013 at 22:22
Use ROW_NUMBER over whatever columns make a duplicate, and delete where it = 2? – Andrew, commented Nov 14, 2013 at 22:22

Aaron Bertrand · Answer · 2013-11-14 22:42:11Z

This keeps the oldest (lowest) AgencyID in each group of duplicates and deletes the rest.

;WITH x AS
(
    SELECT *, rn = ROW_NUMBER() OVER
        (PARTITION BY VendorID, RegionID, Name, Zip ORDER BY AgencyID)
    FROM dbo.tblAgencies
)
DELETE x WHERE rn > 1;

Be careful, though; this may not work if other tables reference AgencyID and they've obtained any of your newer, erroneous values.
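To try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions); the sample rows and the helper names make_db and delete_dupes_row_number are hypothetical. SQLite cannot DELETE through a CTE the way T-SQL can, so the sketch feeds the rn > 1 rows into DELETE ... WHERE AgencyID IN (...) instead.

```python
import sqlite3


def make_db():
    """Hypothetical stand-in for tblAgencies in which every row is duplicated."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        [(1, 10, "Acme", "02134"), (2, 20, "Bolt", "10001"), (3, 30, "Core", "94103")],
    )
    # Reproduce the accident: copy the table into itself; the copies get IDs 4-6.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.commit()
    return con


def delete_dupes_row_number(con):
    """Keep the lowest AgencyID per (VendorID, RegionID, Name, Zip) group."""
    # Number the rows inside each group by AgencyID, then delete every row
    # whose number is above 1 (SQLite-compatible form of DELETE x WHERE rn > 1).
    con.execute(
        """
        DELETE FROM tblAgencies
        WHERE AgencyID IN (
            SELECT AgencyID FROM (
                SELECT AgencyID,
                       ROW_NUMBER() OVER (
                           PARTITION BY VendorID, RegionID, Name, Zip
                           ORDER BY AgencyID) AS rn
                FROM tblAgencies
            ) WHERE rn > 1
        )
        """
    )
    con.commit()


if __name__ == "__main__":
    con = make_db()
    delete_dupes_row_number(con)
    print(con.execute("SELECT * FROM tblAgencies ORDER BY AgencyID").fetchall())
```

Running it leaves only the original rows, AgencyID 1, 2 and 3.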

Gary Walker · Answer · 2013-11-14 22:22:09Z

The simplest solution: SELECT DISTINCT every column except AgencyID into a temp table, then empty the original table and reload it from the temp table.

This will only work if there are no foreign keys referring to the original table. – Szymon
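The reload approach can be sketched the same way, again with Python's sqlite3 standing in for SQL Server and with hypothetical sample rows and helper names (make_db, dedupe_by_reload, agencies_distinct). The DISTINCT has to leave AgencyID out, because every AgencyID is unique and SELECT DISTINCT * would keep every row. The reloaded rows also receive fresh AgencyID values, which is why the foreign-key warning above matters.

```python
import sqlite3


def make_db():
    """Hypothetical tblAgencies in which every row has been copied once."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        [(1, 10, "Acme", "02134"), (2, 20, "Bolt", "10001"), (3, 30, "Core", "94103")],
    )
    # Copy the table into itself; the copies get AgencyID 4-6.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.commit()
    return con


def dedupe_by_reload(con):
    """Copy the distinct non-key values aside, empty the table, reload it."""
    # AgencyID is deliberately excluded: including it would make every row distinct.
    con.execute(
        "CREATE TEMP TABLE agencies_distinct AS "
        "SELECT DISTINCT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.execute("DELETE FROM tblAgencies")
    # The reinserted rows get new AgencyID values from the autoincrement sequence.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM agencies_distinct"
    )
    con.execute("DROP TABLE agencies_distinct")
    con.commit()


if __name__ == "__main__":
    con = make_db()
    dedupe_by_reload(con)
    print(con.execute("SELECT * FROM tblAgencies ORDER BY AgencyID").fetchall())
```

In this SQLite sketch the three surviving rows come back as AgencyID 7, 8 and 9, so any table that stored the old IDs would now point at nothing.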

Szymon · Answer · 2013-11-14 22:24:11Z

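Provided that the combination of all the other columns was unique before the accidental copy, this query returns exactly the duplicates, i.e. every row that has an identical row with a lower AgencyID: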

select * from mytable t1
where exists (
    select * from mytable t2
    where t1.VendorID = t2.VendorID
      and t1.RegionID = t2.RegionID
      and t1.Name = t2.Name
      and t1.Zip = t2.Zip
      and t1.AgencyID > t2.AgencyID
)
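A quick way to confirm what the EXISTS query returns, sketched with Python's sqlite3 in place of SQL Server. The sketch uses the table name tblAgencies rather than mytable, and the sample rows and helper names (make_db, find_duplicates) are hypothetical. On a table that was copied into itself once, it should return exactly the copied rows, the higher AgencyID of each pair.

```python
import sqlite3


def make_db():
    """Hypothetical tblAgencies in which every row has been copied once."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        [(1, 10, "Acme", "02134"), (2, 20, "Bolt", "10001"), (3, 30, "Core", "94103")],
    )
    # Copy the table into itself; the copies get AgencyID 4-6.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.commit()
    return con


def find_duplicates(con):
    """Return AgencyIDs of rows that have an identical row with a lower AgencyID."""
    cur = con.execute(
        """
        SELECT t1.AgencyID
        FROM tblAgencies t1
        WHERE EXISTS (
            SELECT * FROM tblAgencies t2
            WHERE t1.VendorID = t2.VendorID
              AND t1.RegionID = t2.RegionID
              AND t1.Name = t2.Name
              AND t1.Zip = t2.Zip
              AND t1.AgencyID > t2.AgencyID)
        ORDER BY t1.AgencyID
        """
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    print(find_duplicates(make_db()))
```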

Diver · Answer · 2013-11-14 22:34:46Z

This should give you every row that duplicates another one, except the row with the minimum AgencyID in each group.

select * from tblAgencies
where AgencyID not in (
    select min(AgencyID) from tblAgencies
    group by VendorID, RegionID, Name, Zip
)
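The query above only selects the extra rows; replacing `select *` with `delete` turns it into the cleanup itself. The sketch below does that with Python's sqlite3 standing in for SQL Server; the sample rows and helper names (make_db, delete_all_but_min) are hypothetical. One row deliberately has a NULL Zip: GROUP BY puts NULLs in the same group, so that duplicate is removed as well, and the NOT IN list is safe because MIN(AgencyID) over a primary key is never NULL.

```python
import sqlite3


def make_db():
    """Hypothetical tblAgencies copied into itself; one row has no Zip."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        [(1, 10, "Acme", "02134"), (2, 20, "Bolt", None)],
    )
    # Copy the table into itself; the copies get AgencyID 3 and 4.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.commit()
    return con


def delete_all_but_min(con):
    """Delete every row that is not the lowest AgencyID of its group."""
    con.execute(
        """
        DELETE FROM tblAgencies
        WHERE AgencyID NOT IN (
            SELECT MIN(AgencyID) FROM tblAgencies
            GROUP BY VendorID, RegionID, Name, Zip)
        """
    )
    con.commit()


if __name__ == "__main__":
    con = make_db()
    delete_all_but_min(con)
    print(con.execute("SELECT * FROM tblAgencies ORDER BY AgencyID").fetchall())
```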

M.Ali · Answer · 2013-11-14 22:45:54Z

;with CTE AS (
    SELECT ID_Column,
           rn = ROW_NUMBER() OVER (PARTITION BY Column1, Column2, Column3...
                                   ORDER BY ID_Column ASC)
    FROM T
)
DELETE FROM CTE WHERE rn >= 2

user30410 · Answer · 2013-11-15 05:59:15Z

;with CTE AS (
    SELECT MAX(AgencyID) AgentID, VendorID, RegionID, Name, Zip
    FROM tblAgencies
    GROUP BY VendorID, RegionID, Name, Zip
    HAVING COUNT(*) > 1
)
DELETE FROM tblAgencies
WHERE EXISTS (SELECT 1 FROM CTE WHERE AgentID = tblAgencies.AgencyID)
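This query deletes only the highest AgencyID from each group that has more than one row, so it removes one duplicate per group per run. That is enough for the table in the question, where every row was copied exactly once, but a group with three or more copies needs repeated runs. The sketch below shows this with Python's sqlite3 standing in for SQL Server (SQLite also accepts a WITH clause in front of DELETE); the sample rows, with one value stored three times, and the helper names (make_db, count_rows, delete_max_per_group) are hypothetical.

```python
import sqlite3


def make_db():
    """Hypothetical table: 'Acme' stored twice (IDs 1-2), 'Bolt' three times (IDs 3-5)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    rows = [(1, 10, "Acme", "02134")] * 2 + [(2, 20, "Bolt", "10001")] * 3
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        rows,
    )
    con.commit()
    return con


def count_rows(con):
    return con.execute("SELECT COUNT(*) FROM tblAgencies").fetchone()[0]


def delete_max_per_group(con):
    """One pass of the MAX/HAVING query; returns how many rows it deleted."""
    before = count_rows(con)
    con.execute(
        """
        WITH CTE AS (
            SELECT MAX(AgencyID) AS AgentID, VendorID, RegionID, Name, Zip
            FROM tblAgencies
            GROUP BY VendorID, RegionID, Name, Zip
            HAVING COUNT(*) > 1)
        DELETE FROM tblAgencies
        WHERE EXISTS (SELECT 1 FROM CTE WHERE AgentID = tblAgencies.AgencyID)
        """
    )
    con.commit()
    return before - count_rows(con)


if __name__ == "__main__":
    con = make_db()
    # Repeat until a pass deletes nothing.
    while delete_max_per_group(con):
        pass
    print(con.execute("SELECT AgencyID, Name FROM tblAgencies ORDER BY AgencyID").fetchall())
```

The first pass removes IDs 2 and 5, the second removes ID 4, and the third deletes nothing, leaving IDs 1 and 3.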

Mathew Collins · Answer · 2013-11-15 15:00:27Z

There are lots of answers here that will give you what you want, but there's no need for a CTE or any grouping; the simplest way is just:

delete t1
from tblAgencies t1
join tblAgencies t2
  on t1.VendorId = t2.VendorId
 and t1.RegionId = t2.RegionId
 and t1.Name = t2.Name
 and t1.Zip = t2.Zip
 and t1.AgencyId > t2.AgencyId
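SQLite has no DELETE ... FROM ... JOIN syntax, so the sketch below, which uses Python's sqlite3 as a stand-in for SQL Server, expresses the same self-join as a correlated EXISTS. The sample rows and helper names (make_db, delete_later_copies) are hypothetical. One row has a NULL Zip, which exposes a limitation shared by both forms: t1.Zip = t2.Zip is never true when either side is NULL, so duplicates with a NULL in any compared column survive, whereas the ROW_NUMBER() and GROUP BY answers treat NULLs as equal.

```python
import sqlite3


def make_db():
    """Hypothetical tblAgencies copied into itself; 'Bolt' has no Zip."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblAgencies ("
        " AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
    )
    con.executemany(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
        [(1, 10, "Acme", "02134"), (2, 20, "Bolt", None)],
    )
    # Copy the table into itself; the copies get AgencyID 3 and 4.
    con.execute(
        "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
        "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
    )
    con.commit()
    return con


def delete_later_copies(con):
    """Delete each row that has an identical row with a lower AgencyID."""
    # Same conditions as the join; inside the subquery, the unaliased
    # name tblAgencies refers to the row being considered for deletion.
    con.execute(
        """
        DELETE FROM tblAgencies
        WHERE EXISTS (
            SELECT 1 FROM tblAgencies t2
            WHERE t2.VendorID = tblAgencies.VendorID
              AND t2.RegionID = tblAgencies.RegionID
              AND t2.Name = tblAgencies.Name
              AND t2.Zip = tblAgencies.Zip
              AND tblAgencies.AgencyID > t2.AgencyID)
        """
    )
    con.commit()


if __name__ == "__main__":
    con = make_db()
    delete_later_copies(con)
    print(con.execute("SELECT * FROM tblAgencies ORDER BY AgencyID").fetchall())
```

The Acme copy (ID 3) is removed, but the Bolt copy (ID 4) stays because its NULL Zip never compares equal.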

msi77 · Answer · 2013-11-16 09:54:34Z

Maybe this will help: How to delete duplicates in the presence of a primary key?
