Delete duplicate records from a SQL table without a primary key

Question

I have the table below, with the following records in it:

    create table employee (
        EmpId number,
        EmpName varchar2(10),
        EmpSSN varchar2(11)
    );
    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (2, 'Joe', '555-56-5555');
    insert into employee values (3, 'Fred', '555-57-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');
    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');

I don't have a primary key on this table, but the records above are already in it. I want to remove the duplicate records that have the same values in the EmpId and EmpSSN fields.

Example: EmpId 5.

How can I frame a query to delete those duplicate records?

Can you ADD a primary key? What database system are you using? Oracle? Please specify it in your question!
– marc_s, Commented Jun 12, 2009 at 7:19
What if it has the same EmpID and EmpSSn, but different names? — cjk
– cjk, Commented Jun 12, 2009 at 7:24
Hmmm... neither "number" nor "varchar2" are valid SQL Server 2005 data types.... smells like Oracle to me. — marc_s
– marc_s, Commented Jun 12, 2009 at 10:52

Flea · Accepted Answer · 2014-06-16 18:03:40Z

88

It is very simple. I tried in SQL Server 2008

    DELETE SUB
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY EmpId, EmpName, EmpSSN ORDER BY EmpId) cnt
        FROM Employee
    ) SUB
    WHERE SUB.cnt > 1

edited Jun 16, 2014 at 18:03

Flea


answered Sep 12, 2011 at 12:22

Anjib Rajkhowa


2 Comments

Bryce Wagner Over a year ago

This works well when you have a lot of columns to group by, and it neatly deals with the NULL != NULL when comparing two columns. You don't have to list each column twice like some of the other answers ("a.col = b.col" type thing), and even more importantly, you don't have to check "((a.col = b.col) OR (a.col IS NULL AND b.col IS NULL))" on NULL columns.

StuckOverflow Over a year ago

This answer actually resolves the problem, without structural changes. Works perfectly.

abatishchev · Accepted Answer · 2010-07-16 07:47:20Z

60

1. Add a primary key (code below).
2. Run the correct delete (code below).
3. Consider WHY you wouldn't want to keep that primary key.

Assuming MSSQL or compatible:

    ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;
    WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
    BEGIN
        DELETE FROM Employee
        WHERE EmployeeID IN (
            SELECT MIN(EmployeeID) AS [DeleteID]
            FROM Employee
            GROUP BY EmpID, EmpSSN
            HAVING COUNT(*) > 1
        )
    END

edited Jul 16, 2010 at 7:47

abatishchev


answered Jun 12, 2009 at 7:23

cjk


7 Comments

marc_s Over a year ago

+1: to quote some SQL god: "if it doesn't have a primary key, it's not a table"

gbn Over a year ago

+1 A primary key identifies a row. No PK = no sense. @marc_s: a clustered index differentiates a table from a heap. No PK simply means no data integrity

marc_s Over a year ago

@gbn: even a heap is considered a table :-) This quote was more along the lines: unless you specify a primary key, a table really doesn't have much usefulness (except in edge cases like bulk import / temporary tables etc.)

HLGEM Over a year ago

Even in those edge cases I almost always add a primary key, just so I can delete duplicated records if need be.

Stu Pegg Over a year ago

Looks like the duplicate removal is being done so that EmpID can become the primary key. The other data seems dependent on it.


Paul Morgan · Accepted Answer · 2010-07-20 01:33:31Z

Use the row number to differentiate between duplicate records. Keep the first row number for an EmpID/EmpSSN and delete the rest:

    DELETE FROM Employee a
    WHERE ROW_NUMBER() <> (
        SELECT MIN( ROW_NUMBER() )
        FROM Employee b
        WHERE a.EmpID = b.EmpID
          AND a.EmpSSN = b.EmpSSN
    )

+1 A good solution to avoid having to make structural changes
Will it work for Oracle? I had this issue stackoverflow.com/questions/34948301/…

John Conde · Accepted Answer · 2011-12-06 16:44:34Z

    WITH duplicates AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EmpID, EmpSSN ORDER BY EmpID, EmpSSN) AS Duplicate
        FROM Employee
    )
    DELETE FROM duplicates
    WHERE Duplicate > 1;

This deletes through the CTE and removes all the duplicates from the underlying table!

askmish · Accepted Answer · 2012-10-20 01:38:50Z

select distinct * into newtablename from oldtablename

Now, the newtablename will have no duplicate records.

Then simply rename the new table (newtablename) by pressing F2 in Object Explorer in SQL Server.

JohnLBevan · Accepted Answer · 2017-04-07 11:39:31Z

Code

    DELETE DUP
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Clientid ORDER BY Clientid) AS Val
        FROM ClientMaster
    ) DUP
    WHERE DUP.Val > 1

Explanation

Use an inner query to construct a view over the table which includes a field based on Row_Number(), partitioned by those columns you wish to be unique.

Delete from the results of this inner query, selecting anything which does not have a row number of 1; i.e. the duplicates; not the original.

The order by clause of the row_number window function is needed for a valid syntax; you can put any column name here. If you wish to change which of the results is treated as a duplicate (e.g. keep the earliest or most recent, etc), then the column(s) used here do matter; i.e. you want to specify the order such that the record you wish to keep will come first in the result.
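The numbering described above can be tried on the question's sample data. This is only a minimal sketch, and it assumes SQLite driven from Python's sqlite3 module rather than SQL Server. SQLite cannot delete through a derived table, so the sketch deletes by SQLite's built-in rowid instead, and window functions need SQLite 3.25 or later. The names `SAMPLE_ROWS` and `dedupe_with_row_number` are made up for the illustration.

```python
import sqlite3

# The twelve rows from the question (six distinct employees).
SAMPLE_ROWS = [
    (1, "Jack", "555-55-5555"), (2, "Joe", "555-56-5555"),
    (3, "Fred", "555-57-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (1, "Jack", "555-55-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
]


def dedupe_with_row_number(rows):
    """Load rows into an in-memory table and delete every row numbered above 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    # Number the rows inside each (EmpId, EmpSSN) partition; ordering by rowid
    # means the earliest inserted copy gets rn = 1 and is the one that survives.
    conn.execute(
        """
        DELETE FROM employee
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY EmpId, EmpSSN
                           ORDER BY rowid
                       ) AS rn
                FROM employee
            )
            WHERE rn > 1
        )
        """
    )
    remaining = conn.execute(
        "SELECT EmpId, EmpName, EmpSSN FROM employee ORDER BY EmpId"
    ).fetchall()
    conn.close()
    return remaining


remaining_rows = dedupe_with_row_number(SAMPLE_ROWS)
```

Changing the ORDER BY inside the window is what changes which copy is kept; the PARTITION BY list decides what counts as a duplicate.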

Welcome to Stack Overflow! Code only answers are not very useful on their own. It would help if you could add some detail explaining how/why it answers the question.
I was surprised to learn you can delete rows from an alias (or a view), and when you do this, the corresponding row(s) will be deleted from the underlying table! I read more about "updatable views" here - "You can modify the data of an underlying base table through a view, as long as the following conditions are true..."

Daren Thomas · Accepted Answer · 2009-06-12 07:16:34Z

7

You could create a temporary table #tempemployee containing a select distinct of your employee table. Then delete from employee. Then insert into employee select from #tempemployee.

Like Josh said, even if you know the duplicates, deleting them will be impossible since you cannot actually refer to a specific record if it is an exact duplicate of another record.
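The three steps can be sketched roughly as follows. This assumes SQLite through Python's sqlite3 module rather than SQL Server, so the temporary table is created with CREATE TEMP TABLE and has no `#` prefix; the function name `dedupe_via_temp_table` and the shortened sample data are made up for the illustration.

```python
import sqlite3


def dedupe_via_temp_table(conn):
    """Rebuild employee from a DISTINCT copy held in a temporary table."""
    # Step 1: snapshot one copy of every distinct row.
    conn.execute("CREATE TEMP TABLE tempemployee AS SELECT DISTINCT * FROM employee")
    # Steps 2 and 3: `with conn` commits the delete and the re-insert together,
    # or rolls both back if either statement raises.
    with conn:
        conn.execute("DELETE FROM employee")
        conn.execute("INSERT INTO employee SELECT * FROM tempemployee")
    conn.execute("DROP TABLE tempemployee")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
        (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
        (5, "Cathy", "555-59-5555"), (2, "Joe", "555-56-5555"),
    ],
)
conn.commit()

dedupe_via_temp_table(conn)
remaining = conn.execute("SELECT * FROM employee ORDER BY EmpId").fetchall()
```

Because SELECT DISTINCT compares whole rows, this only collapses rows that are identical in every column.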

answered Jun 12, 2009 at 7:16

Daren Thomas


3 Comments

Josh Over a year ago

Only trick there is if the names are different but the ID/SSN match. You'd have to somehow pick one because distinct wouldn't help there.

Bill Karwin Over a year ago

+1 this is the most straightforward and portable solution. OP does not state what brand of database he uses.

Bill Karwin Over a year ago

@Josh: from the OP's sample, it looks like that's not an issue. The duplicate rows are identical in all columns.

Joe · Accepted Answer · 2010-06-02 21:30:41Z

If you don't want to create a new primary key you can use the TOP command in SQL Server:

    declare @ID int
    while EXISTS (select count(*) from Employee group by EmpId having count(*) > 1)
    begin
        select top 1 @ID = EmpId from Employee group by EmpId having count(*) > 1
        DELETE TOP(1) FROM Employee WHERE EmpId = @ID
    end

Abhishek Jaiswal · Accepted Answer · 2016-09-19 10:20:10Z

It's easy; use the query below:

    WITH Dups AS (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY (SELECT 0)) AS rn
        FROM mytable
    )
    DELETE FROM Dups
    WHERE rn > 1

Sudhar P · Accepted Answer · 2018-11-28 03:24:46Z

1

    delete sub
    from (
        select ROW_NUMBER() over (partition by empid order by empid) cnt
        from employee
    ) sub
    where sub.cnt > 1

answered Nov 28, 2018 at 3:24

Sudhar P


1 Comment

Simon.S.A. Over a year ago

Welcome to Stack Overflow. This is an old question with a well-established answer. If you believe your answer adds something significant and new, please expand it with more explanation.

Josh · Accepted Answer · 2009-06-12 07:18:02Z

I'm not an SQL expert so bear with me. I'm sure you'll get a better answer soon enough. Here's how you can find the duplicate records.

    select t1.empid, t1.empssn, count(*)
    from employee as t1
    inner join employee as t2
        on (t1.empid = t2.empid and t1.empssn = t2.empssn)
    group by t1.empid, t1.empssn
    having count(*) > 1

Deleting them will be more tricky because there is nothing in the data that you could use in a delete statement to differentiate the duplicates. I suspect the answer will involve row_number() or adding an identity column.
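For the finding step, a plain GROUP BY with HAVING is enough. It also reports the real number of copies, whereas the self-join above counts each group of n copies as n times n. A minimal sketch, assuming SQLite through Python's sqlite3 module; the function name `find_duplicates` and the shortened sample data are made up for the illustration.

```python
import sqlite3


def find_duplicates(conn):
    """Return (EmpId, EmpSSN, copies) for every pair stored more than once."""
    return conn.execute(
        """
        SELECT EmpId, EmpSSN, COUNT(*) AS copies
        FROM employee
        GROUP BY EmpId, EmpSSN
        HAVING COUNT(*) > 1
        ORDER BY EmpId
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "Jack", "555-55-5555"), (2, "Joe", "555-56-5555"),
        (5, "Cathy", "555-59-5555"), (1, "Jack", "555-55-5555"),
        (5, "Cathy", "555-59-5555"), (5, "Cathy", "555-59-5555"),
    ],
)
conn.commit()

duplicates = find_duplicates(conn)
```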

Anil · Accepted Answer · 2012-07-06 11:35:19Z

create unique clustered index Employee_idx on Employee ( EmpId,EmpSSN ) with ignore_dup_key

You can drop the index if you don't need it.

Praveen Nambiar · Accepted Answer · 2013-04-14 06:25:05Z

no ID, no rowcount() or no temp table needed....

    WHILE (
        SELECT COUNT(*)
        FROM TBLEMP
        WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)
    ) > 1
        DELETE TOP(1)
        FROM TBLEMP
        WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)

Jens Kloster · Accepted Answer · 2013-06-18 13:55:14Z

Suppose the table has two columns, ID and Name, and the same name is repeated under different IDs. In that case you can use this query:

    DELETE FROM dbo.tbl1
    WHERE id NOT IN (
        SELECT MIN(Id) AS namecount
        FROM tbl1
        GROUP BY Name
    )

jonbarlo · Accepted Answer · 2014-07-19 04:08:45Z

Having a database table without a primary key is really, and I would say extremely, BAD PRACTICE. So first add one (ALTER TABLE).

Run this until you don't see any more duplicated records (that is the purpose of HAVING COUNT)

    DELETE FROM [TABLE_NAME]
    WHERE [Id] IN (
        SELECT MAX([Id])
        FROM [TABLE_NAME]
        GROUP BY [TARGET_COLUMN]
        HAVING COUNT(*) > 1
    )
    -- check whether any duplicates remain
    SELECT MAX([Id]) AS MaxId, [TARGET_COLUMN], COUNT(*) AS dupeCount
    FROM [TABLE_NAME]
    GROUP BY [TARGET_COLUMN]
    HAVING COUNT(*) > 1

MAX([Id]) deletes the latest record in each group (one added after the first was created). If you want the opposite, that is, to delete the earlier records and keep the last one inserted, use MIN([Id]) instead.
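If every duplicate group should collapse in a single statement instead of by re-running the delete, one option is to keep the MIN (or MAX) Id of each group and delete everything else. A minimal sketch, assuming SQLite through Python's sqlite3 module rather than SQL Server; the `people` table, its `Name` column and the function names are hypothetical, with `Id` playing the role of the added key.

```python
import sqlite3


def delete_duplicates(conn, keep="first"):
    """Delete all but one row per Name: keep the lowest Id ("first") or the highest ("last")."""
    # Only these two fixed strings can reach the SQL text, so the f-string is safe.
    pick = {"first": "MIN", "last": "MAX"}[keep]
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"DELETE FROM people WHERE Id NOT IN "
            f"(SELECT {pick}(Id) FROM people GROUP BY Name)"
        )


def make_people():
    """Build a small in-memory table with an auto-assigned integer key (Ids 1 to 6)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany(
        "INSERT INTO people (Name) VALUES (?)",
        [("Cathy",), ("Lisa",), ("Cathy",), ("Joe",), ("Cathy",), ("Lisa",)],
    )
    conn.commit()
    return conn


def surviving_ids(conn):
    return [row[0] for row in conn.execute("SELECT Id FROM people ORDER BY Id")]


first_conn = make_people()
delete_duplicates(first_conn, keep="first")

last_conn = make_people()
delete_duplicates(last_conn, keep="last")
```

With `keep="first"` the earliest row of each name survives; with `keep="last"` the most recently inserted one does.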

user3713573 · Accepted Answer · 2022-03-08 09:26:13Z

Let's think out of the box.

I don't delete from the table; I make a new table first, for safety. I personally prefer to do a

INSERT INTO new_table SELECT DISTINCT * FROM orig_table;

Now new_table should contain the data I expect. I can check new_table to make sure.

Then I have 2 options to replace the orig_table

A. delete orig_table; rename new_table to orig_table

B. truncate orig_table; insert data from new_table to orig_table; delete new_table (Recommended: in case you have some trigger/something else linked to the original orig_table)

Good idea, but kind of a duplicate of stackoverflow.com/a/11119012/1260022

Freelancer · Accepted Answer · 2013-06-03 09:38:35Z

select t1.* from employee t1, employee t2 where t1.empid=t2.empid and t1.empname = t2.empname and t1.salary = t2.salary group by t1.empid, t1.empname,t1.salary having count(*) > 1

Santosh kumar · Accepted Answer · 2020-10-03 16:27:55Z

    -- keep one row (the lowest rowid) per emp_id/emp_name pair and delete the rest
    delete from employee
    where rowid not in (
        select min(rowid)
        from employee
        group by emp_id, emp_name
    )

The_Fox · Accepted Answer · 2011-11-07 12:53:21Z

    DELETE FROM test
    USING test, test AS vtable
    WHERE test.id > vtable.id
      AND test.common_column = vtable.common_column

Using this we can remove duplicate records

Anil · Accepted Answer · 2012-08-01 09:27:35Z

-3

    ALTER IGNORE TABLE test ADD UNIQUE INDEX `test` (`b`);

Here `b` is the column that must be unique and `test` is the index name. This is MySQL syntax, and ALTER IGNORE was removed in MySQL 5.7.4.

edited Aug 1, 2012 at 9:27

Anil


answered Nov 9, 2010 at 10:18

jayaram.pagoti


1 Comment

Martin Smith Over a year ago

Not remotely valid SQL Server syntax.
