3

We have a table business_users with a user_id and business_id and we have duplicates. How can I write a query that will delete all duplicates except for one?

4
  • 1
    Click through the related questions. I found a bunch of ideas to try a few weeks ago when I was looking for this problem. I mixed and matched a few to get the desired results. Commented Sep 18, 2012 at 18:39
  • 2
    Do you have any primary key, or other unique constraint, on this table? Or are user_id and business_id the only columns, such that entire rows are duplicated? Commented Sep 18, 2012 at 18:43
  • Looks like a duplicate of stackoverflow.com/questions/672702/… Commented Sep 18, 2012 at 18:44
  • yes there is a id as the primary key Commented Sep 18, 2012 at 18:52

2 Answers 2

10

Completely identical rows

If you want to avoid completely identical rows, as I understood your question at first, then you can select unique rows to a separate table and recreate the table data from that.

CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users; DELETE FROM business_users; INSERT INTO business_users SELECT * FROM tmp; DROP TABLE tmp; 

Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.

CREATE TEMPORARY TABLE tmp SELECT * FROM business_users; DELETE FROM business_users; ALTER TABLE business_users ADD UNIQUE (user_id, business_id); INSERT IGNORE INTO business_users SELECT * FROM tmp; DROP TABLE tmp; 

The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:

DELETE FROM business_users WHERE id NOT IN (SELECT MIN(id) FROM business_users GROUP BY user_id, business_id); 

A similar idea was previously suggested by this answer.

If the above request fails, because you are not allowed to read and delete from a table in the same step, you can again use a temporary table:

CREATE TEMPORARY TABLE tmp SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id; DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp); DROP TABLE tmp; 

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.

Sign up to request clarification or add additional context in comments.

4 Comments

I like the last one but i get You can't specify target table 'business_users' for update in FROM clause
@Trace, I added a version which should avoid that problem.
Just out of curiousity, for the one-shot removal, why does the first example have SELECT MIN(id) FROM and the second one have SELECT MIN(id) id FROM (the second has two 'id's)?
@Pete: The MIN(id) id in the second is an abbreviation of MIN(id) AS id: it specifies the name of the column, so that the column in the resulting table isn't literally named MIN(id) which would be quite confusing and hard to type. In the first query, the name of the column does not matter, since the subquery is just used as a set.
3

Since you have a primary key, you can use that to pick which rows to keep:

delete from business_users where id not in ( select id from ( select min(id) as id -- Make a list of the primary keys to keep from business_users group by user_id, business_id -- Group by your duplicated row definition ) as a -- Derived table to force an implicit temp table ); 

In this way, you won't need to create/drop temp tables and such (except the implicit one).

You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.

4 Comments

looks great but i get this You can't specify target table 'business_users' for update in FROM clause
@Trace, sorry...I've updated to make a subquery work in mysql in this scenario.
Note: I've read the same suggestion about using a subquery, but it failed in my own test setup. Seems to be due to the fact that I created business_users as a temporary table as well, for testing. In that case, the error is phrased Can't reopen table: 'business_users' which amounts to pretty much the same problem (at least in my eyes), but cannot be avoided by introducing yet another subquery.
Interesting. Here is my test sqlfiddle. Could you possibly give us a better definition of your existing schema that's throwing the error? Perhaps you will need to put the primary keys you wish to keep into a temp table.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.