-1

I loaded some data into a SQL Server table from a .CSV file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.

Helow is an example of the situation:

select * from people where name in (select name from people group by name having count(name) > 1) 

When I run this query, I get these results:

enter image description here

The goal is to keep one row and remove other duplicate rows.

Is there any way other than save the content somewhere else, delete all duplicate rows and insert a new one?

Thanks for helping!

0

2 Answers 2

2

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as ( select row_number() over(partition by name, age, gender order by (select null)) rn from people ) delete from cte where rn > 1 

If you want to delete duplicates on name only (as shown in your existing query):

with cte as ( select row_number() over(partition by name order by (select null)) rn from people ) delete from cte where rn > 1 
Sign up to request clarification or add additional context in comments.

Comments

1

How are you defining "duplicate"? Based on your code example, it appears to be by name.

For the deletion, you can use an updatable CTE with row_number():

with todelete as ( select p.*, row_number() over (partition by name order by (select null)) as seqnum from people p ) delete from todelete where seqnum > 1; 

If more columns define the duplicate, then adjust the partition by clause.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.