Delete records with two identical column values in a table

Question

I have a table like this:

Location E-Mail Name Language Creation Time ----------------------------------------------------------------------- D [email protected] Max de 19:06:00 <-- A D [email protected] Peter en 19:10:00 D [email protected] Lisa en 17:39:00 E [email protected] Tom fr 05:00:00 <-- B E [email protected] Carl en 09:01:00 D [email protected] Max en 17:45:00 <-- A D [email protected] Mike de 17:45:00 <-- A E [email protected] Tom de 06:00:00 <-- B

I want to identify every record, where Location AND E-Mail are the same, delete the "duplicates" and keep only the newest record. If the creation time is the same (A), then it doesn't matter, which record is deleted.

The result-table should be:

Location E-Mail Name Language Creation Time ----------------------------------------------------------------------- D [email protected] Peter en 19:10:00 D [email protected] Lisa en 17:39:00 E [email protected] Tom fr 05:00:00 <-- B E [email protected] Carl en 09:01:00 D [email protected] Mike de 17:45:00 <-- A

Because the table is very long, the solution should be speedy. :-)

Thanks for every hint!

Cheers, Holger

@shadowjfaith: No primary key. I'm planning to use the combination of 'Location' and 'E-Mail' as primary key in the resulting table... — dynobo
– dynobo, Commented Jan 8, 2014 at 19:40

marc_s · Accepted Answer · 2014-01-08 19:27:54Z

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).

With this CTE, you can partition your data by some criteria - i.e. your (Location, EMail) - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by the descending CreationTime.

So try something like this:

;WITH NewestData AS ( SELECT Location, EMail, Name, Language, CreationTime, RowNum = ROW_NUMBER() OVER(PARTITION BY Location, EMail ORDER BY CreationTime DESC) FROM dbo.YourTableNameHere ) DELETE FROM NewestData WHERE RowNum > 1

Here, I am selecting only the "first" (newest) entry for each "partition" (i.e. for each (Location, EMail) tuple) - ordered by the CreationTime in a descending order.

So the newest entry for each (Location, EMail) tuple has a value of 1 for its RowNum, any "duplicates" have values larger than 1 - so I just delete away all those "duplicates" - and you're done!

Thanks a lot! Working with MS SQL-Server 2008, so should be fine! I test it and report back :-)

Collectives™ on Stack Overflow

Delete records with two identical column values in a table

1 Answer 1

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Related