1

I have a table like this:

Location E-Mail Name Language Creation Time ----------------------------------------------------------------------- D [email protected] Max de 19:06:00 <-- A D [email protected] Peter en 19:10:00 D [email protected] Lisa en 17:39:00 E [email protected] Tom fr 05:00:00 <-- B E [email protected] Carl en 09:01:00 D [email protected] Max en 17:45:00 <-- A D [email protected] Mike de 17:45:00 <-- A E [email protected] Tom de 06:00:00 <-- B 

I want to identify every record, where Location AND E-Mail are the same, delete the "duplicates" and keep only the newest record. If the creation time is the same (A), then it doesn't matter, which record is deleted.

The result-table should be:

Location E-Mail Name Language Creation Time ----------------------------------------------------------------------- D [email protected] Peter en 19:10:00 D [email protected] Lisa en 17:39:00 E [email protected] Tom fr 05:00:00 <-- B E [email protected] Carl en 09:01:00 D [email protected] Mike de 17:45:00 <-- A 

Because the table is very long, the solution should be speedy. :-)

Thanks for every hint!

Cheers, Holger

2
  • 1
    Do you have a primary key for the table? Commented Jan 8, 2014 at 19:25
  • @shadowjfaith: No primary key. I'm planning to use the combination of 'Location' and 'E-Mail' as primary key in the resulting table... Commented Jan 8, 2014 at 19:40

1 Answer 1

6

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).

With this CTE, you can partition your data by some criteria - i.e. your (Location, EMail) - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by the descending CreationTime.

So try something like this:

;WITH NewestData AS ( SELECT Location, EMail, Name, Language, CreationTime, RowNum = ROW_NUMBER() OVER(PARTITION BY Location, EMail ORDER BY CreationTime DESC) FROM dbo.YourTableNameHere ) DELETE FROM NewestData WHERE RowNum > 1 

Here, I am selecting only the "first" (newest) entry for each "partition" (i.e. for each (Location, EMail) tuple) - ordered by the CreationTime in a descending order.

So the newest entry for each (Location, EMail) tuple has a value of 1 for its RowNum, any "duplicates" have values larger than 1 - so I just delete away all those "duplicates" - and you're done!

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks a lot! Working with MS SQL-Server 2008, so should be fine! I test it and report back :-)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.