SQL to remove duplicated rows

Question

I've written a SQL statement that keeps only one instance (the one with the minimum id) of each duplicated product_code. The problem is that the statement is very inefficient and takes far too long to run, so I'm hoping there is a more efficient way to write it.

The dataset is structured as:

id  product_code  cat_desc  product_desc
1   2352345       423       COCA COLA
2   8967896       457       FANTA
3   6456466       435       SPARKLING WATER
4   3562314       457       STILL WATER

The statement is:

DELETE FROM raw_products_inter
WHERE id IN (
    SELECT id
    FROM raw_products_inter outer_table
    WHERE product_code IN (
        SELECT product_code
        FROM raw_products_inter
        GROUP BY 1
        HAVING COUNT(id) > 1
    )
    AND id NOT IN (
        SELECT MIN(id)
        FROM raw_products_inter inner_table
        WHERE inner_table.product_code = outer_table.product_code
    )
)

Jordi Llull · Accepted Answer · 2015-03-08 19:07:50Z


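You should be able to boost the performance using an EXISTS condition. It deletes every row for which another row with the same product_code and a smaller id exists, so only the row with the minimum id per product_code is kept: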

DELETE FROM raw_products_inter P
WHERE EXISTS (
    SELECT *
    FROM raw_products_inter OP
    WHERE OP.product_code = P.product_code
      AND OP.id < P.id
)
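To see what the EXISTS delete above does on a small table, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under several assumptions: the column types are guessed, the rows with ids 5 to 7 are hypothetical duplicates added for the demo (the question's sample has none), and the outer table is referenced by its full name instead of the alias P so that the statement does not depend on alias support in DELETE. The question does not say which database is in use, so behavior and timing there may differ.

```python
import sqlite3

# In-memory database; column types are assumptions, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_products_inter ("
    "id INTEGER PRIMARY KEY, product_code INTEGER, "
    "cat_desc INTEGER, product_desc TEXT)"
)

rows = [
    # Sample rows from the question.
    (1, 2352345, 423, "COCA COLA"),
    (2, 8967896, 457, "FANTA"),
    (3, 6456466, 435, "SPARKLING WATER"),
    (4, 3562314, 457, "STILL WATER"),
    # Hypothetical duplicates, added only for this demonstration.
    (5, 2352345, 423, "COCA COLA"),
    (6, 8967896, 457, "FANTA"),
    (7, 2352345, 423, "COCA COLA"),
]
conn.executemany("INSERT INTO raw_products_inter VALUES (?, ?, ?, ?)", rows)

# Same logic as the answer: delete a row when another row with the same
# product_code and a smaller id exists. The inner table is aliased as op;
# the outer table is referenced by its full name.
conn.execute(
    """
    DELETE FROM raw_products_inter
    WHERE EXISTS (
        SELECT 1
        FROM raw_products_inter AS op
        WHERE op.product_code = raw_products_inter.product_code
          AND op.id < raw_products_inter.id
    )
    """
)
conn.commit()

# Ids that survive the delete, in ascending order.
remaining_ids = [
    r[0] for r in conn.execute("SELECT id FROM raw_products_inter ORDER BY id")
]
print(remaining_ids)
```

The row with the smallest id for each product_code never has a matching row with a smaller id, so it is never deleted, while every later duplicate is.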


1 Comment

Sam Gilbert Over a year ago

What an elegant solution, thanks. I had to kill my original query after an hour of running, and this ran in 5 seconds :)
