Delete all the duplicates except one

Question

We have a table business_users with the columns user_id and business_id, and it contains duplicates. How can I write a query that will delete all duplicates except for one?

Click through the related questions. I found a bunch of ideas to try a few weeks ago when I ran into this problem, and I mixed and matched a few to get the desired results.
– MetalFrog, Commented Sep 18, 2012 at 18:39
Do you have any primary key, or other unique constraint, on this table? Or are user_id and business_id the only columns, such that entire rows are duplicated?
– ruakh, Commented Sep 18, 2012 at 18:43
Looks like a duplicate of stackoverflow.com/questions/672702/…
– cptScarlet, Commented Sep 18, 2012 at 18:44

Community · Accepted Answer · 2017-05-23 12:10:53Z

Completely identical rows

If you want to get rid of completely identical rows, as I understood your question at first, then you can select the unique rows into a separate table and recreate the table data from that.

    CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
    DELETE FROM business_users;
    INSERT INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;

Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
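Before emptying the table, it may help to check whether anything references it. The following is a minimal sketch, assuming MySQL with InnoDB and assuming the referencing tables live in the same schema as business_users. It lists the foreign keys that point at the table together with their ON DELETE rule:

```sql
-- List foreign keys that reference business_users and show what
-- happens to the referencing rows on delete (e.g. CASCADE, RESTRICT).
-- Assumption: the referencing tables are in the current schema.
SELECT TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME = 'business_users';
```

An empty result suggests that no cascaded deletions can be triggered from this schema. Any row with a DELETE_RULE of CASCADE is one to look at before running the DELETE.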

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.

    CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
    DELETE FROM business_users;
    ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
    INSERT IGNORE INTO business_users SELECT * FROM tmp;
    DROP TABLE tmp;

The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:

    DELETE FROM business_users
    WHERE id NOT IN (
        SELECT MIN(id) FROM business_users GROUP BY user_id, business_id
    );

A similar idea was previously suggested by this answer.

If the above query fails because you are not allowed to read from and delete from the same table in a single statement, you can again use a temporary table:

    CREATE TEMPORARY TABLE tmp
        SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
    DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
    DROP TABLE tmp;

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
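To confirm that the cleanup worked before adding the constraint, a grouped count is one way to check. This is a sketch only, and the column alias copies is an illustrative name:

```sql
-- Show every (user_id, business_id) pair that still occurs more than once.
SELECT user_id, business_id, COUNT(*) AS copies
FROM business_users
GROUP BY user_id, business_id
HAVING COUNT(*) > 1;
```

If this returns no rows, each pair occurs at most once and the ALTER TABLE statement should not fail with a duplicate entry error.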

I like the last one, but I get: You can't specify target table 'business_users' for update in FROM clause
Just out of curiosity, for the one-shot removal, why does the first example have SELECT MIN(id) FROM and the second one have SELECT MIN(id) id FROM (the second has two 'id's)?
@Pete: The MIN(id) id in the second is an abbreviation of MIN(id) AS id: it specifies the name of the column, so that the column in the resulting table isn't literally named MIN(id) which would be quite confusing and hard to type. In the first query, the name of the column does not matter, since the subquery is just used as a set.

Community · Accepted Answer · 2017-05-23 10:32:29Z

Since you have a primary key, you can use that to pick which rows to keep:

    delete from business_users
    where id not in (
        select id from (
            select min(id) as id           -- Make a list of the primary keys to keep
            from business_users
            group by user_id, business_id  -- Group by your duplicated row definition
        ) as a                             -- Derived table to force an implicit temp table
    );

In this way, you won't need to create/drop temp tables and such (except the implicit one).

You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.
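As a rough sketch of what that could look like in MySQL: the key name uq_user_business and the inserted values are made up for illustration, and the INSERT assumes that id is filled in automatically (for example by AUTO_INCREMENT).

```sql
-- Add the constraint once the existing duplicates are gone.
-- This statement fails if duplicate pairs are still present.
ALTER TABLE business_users
    ADD UNIQUE KEY uq_user_business (user_id, business_id);

-- From now on a repeated pair is rejected. With INSERT IGNORE the
-- duplicate row is skipped with a warning instead of raising an error.
INSERT IGNORE INTO business_users (user_id, business_id) VALUES (1, 10);
INSERT IGNORE INTO business_users (user_id, business_id) VALUES (1, 10);
```

A plain INSERT of the second row would instead fail with a duplicate entry error, which may be preferable if the application should notice the conflict.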

edited May 23, 2017 at 10:32

CommunityBot

answered Sep 18, 2012 at 19:04

Tim Lehner

4 Comments

Matt Elhotiby Over a year ago

Looks great, but I get this: You can't specify target table 'business_users' for update in FROM clause

Tim Lehner Over a year ago

@Trace, sorry... I've updated the query so that the subquery works in MySQL in this scenario.

MvG Over a year ago

Note: I've read the same suggestion about using a subquery, but it failed in my own test setup. Seems to be due to the fact that I created business_users as a temporary table as well, for testing. In that case, the error is phrased Can't reopen table: 'business_users' which amounts to pretty much the same problem (at least in my eyes), but cannot be avoided by introducing yet another subquery.

Tim Lehner Over a year ago

Interesting. Here is my test sqlfiddle. Could you possibly give us a better definition of your existing schema that's throwing the error? Perhaps you will need to put the primary keys you wish to keep into a temp table.
