How to delete duplicate rows that are exactly the same in SQL Server [duplicate]

Question

I loaded some data into a SQL Server table from a .CSV file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.

Helow is an example of the situation:

select * from people where name in (select name from people group by name having count(name) > 1)

When I run this query, I get these results:

The goal is to keep one row and remove other duplicate rows.

Is there any way other than save the content somewhere else, delete all duplicate rows and insert a new one?

Thanks for helping!

GMB · Accepted Answer · 2020-04-14 14:46:20Z

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as ( select row_number() over(partition by name, age, gender order by (select null)) rn from people ) delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as ( select row_number() over(partition by name order by (select null)) rn from people ) delete from cte where rn > 1

Gordon Linoff · Accepted Answer · 2020-04-14 14:46:29Z

How are you defining "duplicate"? Based on your code example, it appears to be by name.

For the deletion, you can use an updatable CTE with row_number():

with todelete as ( select p.*, row_number() over (partition by name order by (select null)) as seqnum from people p ) delete from todelete where seqnum > 1;

If more columns define the duplicate, then adjust the partition by clause.

Collectives™ on Stack Overflow

How to delete duplicate rows that are exactly the same in SQL Server [duplicate]

2 Answers 2

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Linked

Related