2539

It's easy to find duplicates with one field:

SELECT email, COUNT(email) FROM users GROUP BY email HAVING COUNT(email) > 1 

So if we have a table

ID  NAME  EMAIL
1   John  [email protected]
2   Sam   [email protected]
3   Tom   [email protected]
4   Bob   [email protected]
5   Tom   [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

2
  • 40
    I don't think it would let you select name in your first sample since it's not in an aggregate function. "What is the count of matching email addresses and their name" is some tricky logic... Commented Jan 4, 2013 at 18:09
  • 5
    Found that this doesn't work with MSSQL server because of the name field in the SELECT. Commented Nov 8, 2018 at 9:06

37 Answers

3818
SELECT name, email, COUNT(*) FROM users GROUP BY name, email HAVING COUNT(*) > 1 

Simply group on both of the columns.
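As a quick way to try the two-column grouping, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the example.com addresses are assumptions made for illustration; the table mirrors the shape of the one in the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),  # placeholder addresses
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "shared@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "shared@example.com"),
    ],
)

# Group on both columns; only (name, email) pairs that occur more than once remain.
duplicate_pairs = conn.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_pairs)  # [('Tom', 'shared@example.com', 2)]
```

Only the Tom pair is reported, because John and Sam share the email but not the name.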

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support for this is not consistent across database engines.


12 Comments

@webXL WHERE filters individual rows; HAVING filters groups.
@gbn Is it possible to include the Id in the results? Then it would be easier to delete those duplicates afterwards.
@user797717: you'd need to select MIN(ID) and then delete the rows whose ID is not in that list of MIN(ID) values
What about cases where any of the columns have null values?
Thanks so much for this, and yes, it does work in Oracle, though I needed the rows where the combination is unique, so I used = 1 rather than > 1
447

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT name, email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name, email
HAVING COUNT(*) > 1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the dups use this:

SELECT y.id, y.name, y.email
FROM @YourTable y
INNER JOIN (
    SELECT name, email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name, email
    HAVING COUNT(*) > 1
) dt ON y.name = dt.name AND y.email = dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)
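The join back to the grouped subquery can be tried outside SQL Server as well. The sketch below reruns it with Python's sqlite3 module; the ordinary table your_table replaces the T-SQL table variable, which SQLite does not have, so that name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# your_table is a stand-in for the @YourTable table variable (assumption).
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# The derived table dt lists the duplicated (name, email) pairs;
# joining it back to the base table recovers the id of every matching row.
rows_with_ids = conn.execute(
    """
    SELECT y.id, y.name, y.email
    FROM your_table AS y
    INNER JOIN (
        SELECT name, email
        FROM your_table
        GROUP BY name, email
        HAVING COUNT(*) > 1
    ) AS dt
        ON y.name = dt.name AND y.email = dt.email
    ORDER BY y.id
    """
).fetchall()

for row in rows_with_ids:
    print(row)
```

The ids returned are 1, 2, 5 and 6, matching the output listed in the answer.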

To delete the duplicates try:

DELETE d
FROM @YourTable d
INNER JOIN (
    SELECT y.id, y.name, y.email,
           ROW_NUMBER() OVER (PARTITION BY y.name, y.email ORDER BY y.name, y.email, y.id) AS RowRank
    FROM @YourTable y
    INNER JOIN (
        SELECT name, email, COUNT(*) AS CountOf
        FROM @YourTable
        GROUP BY name, email
        HAVING COUNT(*) > 1
    ) dt ON y.name = dt.name AND y.email = dt.email
) dt2 ON d.id = dt2.id
WHERE dt2.RowRank != 1

SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

1 Comment

Table names are case sensitive. array(3) { [0]=> string(5) "42000" [1]=> int(1064) [2]=> string(226) "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(PARTITION BY y.employee_id, y.leave_type_id ) AS RowRank ' at line 1" }
164
SELECT name, email FROM users GROUP BY name, email HAVING ( COUNT(*) > 1 ) 

1 Comment

Useful for finding duplicate name-email pairs in db users table.
113

If you want to delete the duplicates, here's a much simpler way to do it than finding even/odd rows in a triple sub-select:

SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so to delete:

DELETE FROM users
WHERE id IN (
    SELECT u.id /*, u.name, u.email */
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Much easier to read and understand, IMHO.

Note: The only issue is that you have to execute the request until no rows are deleted, since you delete only 1 of each duplicate each time.

7 Comments

Nice and easy to read; I'd like to find a way that deleted multiple duplicate rows in one go though.
This doesn't work for me as I get You can't specify target table 'users' for update in FROM clause
@Whitecat seems like a simple MySQL problem: stackoverflow.com/questions/4429319/…
Fails for me. I get: "DBD::CSV::st execute failed: Use of uninitialized value $_[1] in hash element at /Users/hornenj/perl5/perlbrew/perls/perl-5.26.0/lib/site_perl/5.26.0/SQL/Eval.pm line 43"
I think that where clause should be " u.name = u2.name AND u.email = u2.email AND (u.id > u2.id OR u2.id > u.id)" isn't it?
64

In contrast to other answers, this lets you view the whole records, including all other columns if there are any. In the PARTITION BY part of the row_number function, list the columns that should be unique (that is, the columns that define a duplicate).

SELECT *
FROM (
    SELECT a.*
         , Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
    FROM Customers AS a
) AS b
WHERE r > 1;

When you want to select ALL duplicated records with ALL fields you can write it like

CREATE TABLE test (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
  , c1 integer
  , c2 text
  , d date DEFAULT now()
  , v text
);

INSERT INTO test (c1, c2, v) VALUES
    (1, 'a', 'Select'),
    (1, 'a', 'ALL'),
    (1, 'a', 'multiple'),
    (1, 'a', 'records'),
    (2, 'b', 'in columns'),
    (2, 'b', 'c1 and c2'),
    (3, 'c', '.');

SELECT * FROM test ORDER BY 1;

SELECT * FROM test
WHERE (c1, c2) IN (
    SELECT c1, c2 FROM test GROUP BY 1,2 HAVING count(*) > 1
)
ORDER BY 1;

Tested in PostgreSQL.
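The row-value form `(c1, c2) IN (...)` is not specific to PostgreSQL. As a hedged sketch, the same query runs under Python's sqlite3 module, assuming the bundled SQLite is 3.15 or newer (the release that added row values). The `d` column is left out here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY gives auto-assigned ids, standing in for the IDENTITY column.
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, c1 INTEGER, c2 TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO test (c1, c2, v) VALUES (?, ?, ?)",
    [
        (1, "a", "Select"),
        (1, "a", "ALL"),
        (1, "a", "multiple"),
        (1, "a", "records"),
        (2, "b", "in columns"),
        (2, "b", "c1 and c2"),
        (3, "c", "."),
    ],
)

# Every row whose (c1, c2) pair occurs more than once, with all of its columns.
all_duplicates = conn.execute(
    """
    SELECT id, c1, c2, v
    FROM test
    WHERE (c1, c2) IN (
        SELECT c1, c2 FROM test GROUP BY c1, c2 HAVING COUNT(*) > 1
    )
    ORDER BY id
    """
).fetchall()

print([row[0] for row in all_duplicates])  # [1, 2, 3, 4, 5, 6]
```

Row 7, the only (3, 'c') row, is excluded; all members of the two duplicated groups are returned.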

Comments

44
SELECT name, email FROM users WHERE email in (SELECT email FROM users GROUP BY email HAVING COUNT(*)>1) 

1 Comment

Doesn't work. The inner SELECT works perfect. If I copy the output of the inner SELECT, and place it into the outer one's IN( ) block, it gives me the expected result. But when I run the thing as a whole, it gives me the whole set and I don't have an idea why.
38
SELECT email, GROUP_CONCAT(id) FROM users GROUP BY email HAVING COUNT(email) > 1; 

1 Comment

Keep in mind that GROUP_CONCAT will stop after some predetermined length, so you might not get all the ids.
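For experimenting with the id-list idea, here is a small sketch with Python's sqlite3 module. SQLite's group_concat is used as a stand-in for MySQL's GROUP_CONCAT; the MySQL length limit mentioned above (controlled by the group_concat_max_len setting) is not reproduced here, and the example.com addresses are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "a@example.com"),
        (2, "Sam", "a@example.com"),
        (3, "Bob", "b@example.com"),
    ],
)

# One output row per duplicated email, with the matching ids packed into a string.
# SQLite does not guarantee the order of the concatenated ids, so they are sorted here.
id_lists = {
    email: sorted(int(part) for part in ids.split(","))
    for email, ids in conn.execute(
        """
        SELECT email, GROUP_CONCAT(id)
        FROM users
        GROUP BY email
        HAVING COUNT(email) > 1
        """
    )
}

print(id_lists)  # {'a@example.com': [1, 2]}
```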
29

This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.

Select duplicates:

SELECT * FROM <table>
WHERE id NOT IN (
    SELECT MIN(id) FROM <table> GROUP BY <column1>, <column2>
);

Delete duplicates:

DELETE FROM <table>
WHERE id NOT IN (
    SELECT MIN(id) FROM <table> GROUP BY <column1>, <column2>
);

Be aware that on tables with a large number of records this can cause performance problems.
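A minimal sketch of the keep-the-lowest-id delete, run with Python's sqlite3 module. The users table, its columns, and the placeholder addresses are assumptions filling in the `<table>` and `<column>` placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "shared@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "shared@example.com"),
    ],
)

# Keep the lowest id of every (name, email) group and delete the rest.
cursor = conn.execute(
    """
    DELETE FROM users
    WHERE id NOT IN (
        SELECT MIN(id) FROM users GROUP BY name, email
    )
    """
)
deleted_count = cursor.rowcount

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(deleted_count, remaining_ids)  # 1 [1, 2, 3, 4]
```

Running the same subquery in a SELECT first, as the answer does, is a cheap way to check what would be removed.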

2 Comments

Error in delete query - You can't specify target table 'cities' for update in FROM clause
There is neither table 'cities' nor update clause. What do you mean? Where is an error in the delete query?
23
WITH CTE AS (
    SELECT Id, Name, Age, Comments,
           RN = ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY ccn)
    FROM ccnmaster
)
SELECT * FROM CTE
WHERE RN > 1  -- rows numbered above 1 are the duplicates

Comments

20

Pick the solution which best fits.

Create table NewTable (id int, name varchar(10), email varchar(50))

INSERT NewTable VALUES (1,'John','[email protected]')
INSERT NewTable VALUES (2,'Sam','[email protected]')
INSERT NewTable VALUES (3,'Tom','[email protected]')
INSERT NewTable VALUES (4,'Bob','[email protected]')
INSERT NewTable VALUES (5,'Tom','[email protected]')


1. Using the GROUP BY clause:

SELECT name, email, COUNT(*) AS Occurence FROM NewTable GROUP BY name, email HAVING COUNT(*) > 1 


  • The GROUP BY clause groups the rows into groups by values in both name and email columns.
  • Then, the COUNT() function returns the number of occurrences of each group (name,email).
  • Then, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.

2. Using a CTE:

To return the entire row for each duplicate row, join the result of the above query with the NewTable table using a common table expression (CTE):

WITH cte AS (
    SELECT name, email, COUNT(*) occurrences
    FROM NewTable
    GROUP BY name, email
    HAVING COUNT(*) > 1
)
SELECT t1.Id, t1.name, t1.email
FROM NewTable t1
INNER JOIN cte ON cte.name = t1.name AND cte.email = t1.email
ORDER BY t1.name, t1.email;


3. Using function ROW_NUMBER()

WITH cte AS (
    SELECT name, email,
           ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY name, email) rownum
    FROM NewTable t1
)
SELECT * FROM cte WHERE rownum > 1;


  • ROW_NUMBER() distributes rows of the NewTable table into partitions by values in the name and email columns. The duplicate rows will have repeated values in the name and email columns, but different row numbers.
  • The outer query keeps only rows numbered above 1, which removes the first row in each group.

1 Comment

19

In case you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');
commit;

select * from my_users
where rowid not in (
    select min(rowid) from my_users group by name, email
);

Comments

15
select name, email , case when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes' else 'No' end "duplicated ?" from users 

3 Comments

Code only answers are frowned upon on Stack Overflow, could you explain why this answers the question?
@RichBenner: I couldn't find an answer that returns every row and shows at a glance which rows are duplicates and which are not, without using GROUP BY. If we want to combine this query with another query, GROUP BY is not a good option.
Adding Id to the SELECT list and filtering on the duplicated flag lets you delete the duplicated ids and keep one of each.
13

SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;

1 Comment

This doesn't quite add anything to the top answer, and technically doesn't even really differ from the code OP's posted in the question.
12

This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).

This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.

with cte (StudentId, Fname, LName, DOB, RowCnt) as (
    SELECT StudentId, FirstName, LastName, DateOfBirth as DOB,
           SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
    FROM tblStudent
)
SELECT * from CTE
where RowCnt > 1
ORDER BY DOB, LName

Comments

12
create table my_table(id int, name varchar(100), email varchar(100));

insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');

Select COUNT(1) As Total_Rows from my_table

Select Count(1) As Distinct_Rows from (
    Select Distinct * from my_table
) abc

Comments

11
 select emp.ename, emp.empno, dept.loc from emp inner join dept on dept.deptno=emp.deptno inner join (select ename, count(*) from emp group by ename, deptno having count(*) > 1) t on emp.ename=t.ename order by emp.ename 

Comments

10
with MyCTE as ( select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees] ) select * from MyCTE where Duplicate>1 

Comments

10

A duplicated value is repeated two or more times. Just count the distinct values, without grouping.

select COUNT(distinct col_01) from Table_01 

1 Comment

How would this work for the question as asked? This does not give rows that duplicate information in multiple columns (e.g. "email" and "name") in different rows.
7
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where u.email=u1.email); 

Comments

7
 select * from Users a where exists (select * from Users b where (a.name = b.name or a.email = b.email) and a.ID != b.id) 

If you are looking for duplicates that differ by some prefix or a systematic change, such as a new email domain, you can apply replace() to these columns before comparing them.
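To make the replace() idea concrete, here is a hedged sketch using Python's sqlite3 module. The old.example and new.example domains, the sample names, and the choice to normalise only the email column are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Ann", "ann@old.example"),   # same person, old domain
        (2, "Anna", "ann@new.example"),  # same person, new domain
        (3, "Bob", "bob@new.example"),
    ],
)

# Rewrite the old domain to the new one on both sides before comparing,
# so addresses that differ only by domain count as duplicates.
fuzzy_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.id
        FROM users AS a
        WHERE EXISTS (
            SELECT 1
            FROM users AS b
            WHERE (a.name = b.name
                   OR REPLACE(a.email, '@old.example', '@new.example')
                    = REPLACE(b.email, '@old.example', '@new.example'))
              AND a.id != b.id
        )
        ORDER BY a.id
        """
    )
]

print(fuzzy_ids)  # [1, 2]
```

Without the REPLACE calls neither row 1 nor row 2 would match, since both the names and the raw addresses differ.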

Comments

6
SELECT name, email,COUNT(email) FROM users WHERE email IN ( SELECT email FROM users GROUP BY email HAVING COUNT(email) > 1) 

2 Comments

You can't use COUNT without GROUP BY, unless it refers to the whole table.
I used COUNT without GROUP BY here; including COUNT was a typing mistake on my part.
6

The most important thing here is to have the fastest query. The ids of the duplicates should also be identified. A self join is a good option, but it is faster to first find the (username, email) pairs that have duplicates and then join with the original table to find the ids of the duplicated rows. Finally, order by any column except id so that duplicated rows appear next to each other.

SELECT u.* FROM users AS u JOIN (SELECT username, email FROM users GROUP BY username, email HAVING COUNT(*)>1) AS w ON u.username=w.username AND u.email=w.email ORDER BY u.email; 

Comments

3

To check for duplicate records in a table:

select * from users s where rowid < any (select rowid from users k where s.name = k.name and s.email = k.email); 

or

select * from users s where rowid not in (select max(rowid) from users k where s.name = k.name and s.email = k.email); 

To delete the duplicate records in a table:

delete from users s where rowid < any (select rowid from users k where s.name = k.name and s.email = k.email); 

or

delete from users s where rowid not in (select max(rowid) from users k where s.name = k.name and s.email = k.email); 

Comments

3
SELECT * from (SELECT name, email, COUNT(name) OVER (PARTITION BY name, email) cnt FROM users) WHERE cnt > 1; 

Comments

3

To delete records whose names are duplicate

;WITH CTE AS ( SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable ) DELETE FROM CTE WHERE T > 1 

2 Comments

Does it work? How comes I get this error 'relation "cte" does not exist' in Postgres?
CTEs also work in PostgreSQL. Here is the link: postgresqltutorial.com/postgresql-cte You must be missing something else.
2
SELECT NAME, EMAIL, COUNT(*) FROM USERS GROUP BY 1,2 HAVING COUNT(*) > 1 

Comments

1

We can use HAVING here, which works on aggregate functions, as shown below:

create table #TableB (id_account int, data int, [date] date)

insert into #TableB values
    (1, -50, '10/20/2018'),
    (1, 20, '10/09/2018'),
    (2, -900, '10/01/2018'),
    (1, 20, '09/25/2018'),
    (1, -100, '08/01/2018')

SELECT id_account, data, COUNT(*)
FROM #TableB
GROUP BY id_account, data
HAVING COUNT(id_account) > 1

drop table #TableB

Here the two fields id_account and data are grouped and used with COUNT(*), so the query returns every combination of values that occurs more than once in both columns.

If, by mistake, no constraints were added to a SQL Server table and a front-end application inserted rows that are duplicated in all columns, the query below can be used to delete the duplicate rows from the table.

SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable

Here we copy all the distinct rows of the original table into a new table and delete the rows of the original table. We then insert the distinct rows from the new table back into the original table and drop the new table.
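The copy, empty, and refill steps can be sketched with Python's sqlite3 module. This is an adaptation rather than the T-SQL itself: SQLite has no SELECT ... INTO or TRUNCATE TABLE, so CREATE TEMP TABLE ... AS and DELETE are swapped in, and the table names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id_account INTEGER, data INTEGER)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [(1, 20), (1, 20), (2, -900), (1, -50), (1, 20)],
)
conn.commit()

# The with-block commits on success and rolls back on error, so a failure
# after the DELETE does not leave the original table empty.
with conn:
    conn.execute("CREATE TEMP TABLE tem_new_table AS SELECT DISTINCT * FROM original_table")
    conn.execute("DELETE FROM original_table")  # stands in for TRUNCATE TABLE
    conn.execute("INSERT INTO original_table SELECT * FROM tem_new_table")
    conn.execute("DROP TABLE tem_new_table")

deduplicated = conn.execute(
    "SELECT id_account, data FROM original_table ORDER BY id_account, data"
).fetchall()

print(deduplicated)  # [(1, -50), (1, 20), (2, -900)]
```

This only removes rows that are identical in every column; rows that differ in any column, such as an id, are all kept.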

Comments

1

Table structure:

ID  NAME  EMAIL
1   John  [email protected]
2   Sam   [email protected]
3   Tom   [email protected]
4   Bob   [email protected]
5   Tom   [email protected]

Solution 1:

SELECT *, COUNT(*) FROM users t1 INNER JOIN users t2 WHERE t1.id > t2.id AND t1.name = t2.name AND t1.email=t2.email 

Solution 2:

SELECT name, email, COUNT(*) FROM users GROUP BY name, email HAVING COUNT(*) > 1 

Comments

1

You can use count(*) over (partition by ...) to select all rows that are duplicates of each other. This approach gives you access to all rows and all columns (unlike group by, which consolidates duplicate rows and makes ungrouped columns inaccessible).

To select the original rows or delete their duplicates use row_number() over (partition by ... order by ...).

Sample data

create table t (
    id int not null primary key,
    name varchar(100),
    email varchar(100),
    created date
);

insert into t (id, name, email, created) values
    (1, 'Alice', '[email protected]', '2021-01-01'),
    (2, 'Alice', '[email protected]', '2022-01-01'),
    (3, 'Alice', '[email protected]', '2023-01-01'),
    (4, 'Bob', '[email protected]', '2021-01-01'),
    (5, 'Bob', '[email protected]', '2022-01-01'),
    (6, 'John', '[email protected]', '2021-01-01'),
    (7, 'Zack', '[email protected]', '2021-01-01');

Select all rows that are duplicates of each other

with cte as (
    select t.*, count(*) over (partition by name, email) as dup_count
    from t
)
select * from cte where dup_count > 1;

Result

{ Alice, [email protected] } is present three times, all three instances are selected
{ John, [email protected] } is present only once, it is excluded

| id | name  | email             | created    | dup_count |
|----|-------|-------------------|------------|-----------|
| 1  | Alice | [email protected] | 2021-01-01 | 3         |
| 2  | Alice | [email protected] | 2022-01-01 | 3         |
| 3  | Alice | [email protected] | 2023-01-01 | 3         |
| 4  | Bob   | [email protected] | 2021-01-01 | 2         |
| 5  | Bob   | [email protected] | 2022-01-01 | 2         |
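The windowed count can also be reproduced locally. Below is a minimal sketch with Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the release that added window functions); the example.com addresses are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER NOT NULL PRIMARY KEY, name TEXT, email TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "alice@example.com", "2021-01-01"),
        (2, "Alice", "alice@example.com", "2022-01-01"),
        (3, "Alice", "alice@example.com", "2023-01-01"),
        (4, "Bob", "bob@example.com", "2021-01-01"),
        (5, "Bob", "bob@example.com", "2022-01-01"),
        (6, "John", "john@example.com", "2021-01-01"),
        (7, "Zack", "zack@example.com", "2021-01-01"),
    ],
)

# COUNT(*) OVER attaches the group size to every row without collapsing the group,
# so all columns of every duplicated row stay available.
duplicated_rows = conn.execute(
    """
    WITH cte AS (
        SELECT t.*, COUNT(*) OVER (PARTITION BY name, email) AS dup_count
        FROM t
    )
    SELECT id, name, dup_count
    FROM cte
    WHERE dup_count > 1
    ORDER BY id
    """
).fetchall()

for row in duplicated_rows:
    print(row)
```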

Select (or delete) the duplicates

The CTE selects all but the oldest row in each set of duplicates
Some RDBMS support delete from CTEs
Or you may use delete from t where id in (...) approach

with cte as (
    select t.*, row_number() over (partition by name, email order by created) as rn
    from t
)
delete from cte where rn > 1;

Result after deletion

| id | name  | email             | created    |
|----|-------|-------------------|------------|
| 1  | Alice | [email protected] | 2021-01-01 |
| 4  | Bob   | [email protected] | 2021-01-01 |
| 6  | John  | [email protected] | 2021-01-01 |
| 7  | Zack  | [email protected] | 2021-01-01 |
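For engines that cannot delete through a CTE, the `delete from t where id in (...)` variant mentioned above might look like the following sketch. It uses Python's sqlite3 module and assumes SQLite 3.25 or newer for ROW_NUMBER(); the example.com addresses are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER NOT NULL PRIMARY KEY, name TEXT, email TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "alice@example.com", "2021-01-01"),
        (2, "Alice", "alice@example.com", "2022-01-01"),
        (3, "Alice", "alice@example.com", "2023-01-01"),
        (4, "Bob", "bob@example.com", "2021-01-01"),
        (5, "Bob", "bob@example.com", "2022-01-01"),
        (6, "John", "john@example.com", "2021-01-01"),
        (7, "Zack", "zack@example.com", "2021-01-01"),
    ],
)

# Number the rows of each (name, email) group from oldest to newest,
# then delete every row except the oldest (rn = 1).
conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY created) AS rn
            FROM t
        ) AS ranked
        WHERE rn > 1
    )
    """
)

surviving_ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(surviving_ids)  # [1, 4, 6, 7]
```

Changing the ORDER BY inside the window (for example to `created DESC`) would keep the newest row of each group instead.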

DB<>Fiddle - SQL Server
DB<>Fiddle - MySQL
DB<>Fiddle - Oracle

Comments

1

I always use CTE & ROW_NUMBER to delete the duplicate records like this:

With Temp AS (
    SELECT ID, NAM, EMAIL,
           ROW_NUMBER() OVER (PARTITION BY NAM, EMAIL ORDER BY ID) AS RowNo
    FROM Users
)
-- To check the duplicate values; switch to DELETE only after verifying
SELECT * FROM Temp WHERE RowNo > 1

Comments
