1

I have a database table with the columns: email, name, id

There are duplicate records with the same email and name (ids may or may not be different)

I want to remove all duplicates and not keep any of them.

Example

email | name | id ------------------------------------------------- [email protected] | a | 2 --- remove [email protected] | a | 4 --- remove [email protected] | b | 3 [email protected] | c | 5 

What sql query can I use in phpmyadmin to do this?

2
  • possible duplicate query stackoverflow.com/questions/18932/… Commented Jan 29, 2015 at 11:50
  • Which DBMS are you using? Postgres? Oracle? Commented Jan 29, 2015 at 12:01

3 Answers 3

2

You could use EXISTS:

DELETE FROM TableName t1 WHERE EXISTS ( SELECT 1 FROM TableName t2 WHERE t1.id <> t2.id AND COALESCE(t1.email,'') = COALESCE(t2.email,'') AND COALESCE(t1.name,'') = COALESCE(t2.name,'') ) 

I've used COALESCE to also delete duplicates if the emails or names are null.

Sign up to request clarification or add additional context in comments.

6 Comments

Hi, thank you. Trying to digest your answer now. Need to understand the meaning of <> and coalesce first.
@user3509426: <> simply means "not equal"
@user3509426; <> just means different(like != in C# or other programming languages). COALESCE is a function which replaces NULL with something else, in this case an empty string. Otherwise you can't compare an email that is NULL with comparison oprators like = or <> because NULL means "undefined". Another way without COALESCE is to use IS NULL: AND (t1.email IS NULL AND t2.email IS NULL) OR (t1.email = t2.email) ....
Hi, I have ran your code. There is a syntax error though. I have been trying to play around with the syntax but I cannot fix it.
@user3509426: what query have you used and what was the syntax error exactly?
|
0

In MySQL, you should do this with a join:

delete t from example t join (select email, name from example group by email, name having count(*) > 1 ) todelete on t.email = todelete.email and t.name = todelete.name; 

Unfortunately, MySQL does not support simple subqueries on the table being modified in an update or delete statement.

EDIT:

Deleting a lot of records can be a performance bottleneck. To get around this, create a temporary table with the records you want, truncate the original table, and re-insert them.

So, try this:

create temporary table tempt as select t.* from example t join (select email, name from example group by email, name having count(*) = 1 ) tokeep on t.email = tokeep.email and t.name = tokeep.name; truncate table example; insert into example select * from tempt; 

Try the select query first to be sure it has reasonable performance and does what you want.

4 Comments

Hi, I'm not sure if it's because my table is huge (180k records), but I've ran your code and got "Error in Processing Request Error code: 504 Error text: Gateway Time-out" Maybe I should duplicate the table with lesser records to test?
Unfortunately, the whole server seems to not be responding now. I doubt running 1 sql query can affect the whole server? And I have already closed all browser tabs.
Plesk would not load thus I had to SSH in to kill the sqld process. (1) I then tested using a table with 28 records. 14 are duplicates of the other 14. I ran your code and all records were wiped out, which is great. (2) I tested again by throwing in some unique records, and again your code worked fine since those unqiue ones were intact. (3) Now I just need to figure out how to run your code for my 180k records table. Would you have any advice for it?
Perfect, thank you - I have tested quite a number of times and each took less than 3mins to execute on my 180k records. I have found that there are values that hinder finding duplicates - example there is a block of space in front or behind a name/email. and these blocks are in different sizes. They are not the spacebar (that you hit on your keyboard) spaces. Would you know what keywords I have to use to google for solution? I found link which doesnt help me.
-1

DELETE n1 FROM tablename n1, tablename n2 WHERE n1.email = n2.email

1 Comment

not good. it will delete all the records having not null email.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.