I've written a SQL statement that keeps only one instance (the row with the minimum id) wherever a product_code is duplicated. The problem is that the statement is very inefficient and takes ages to run, so I'm hoping there is a more efficient way to write it.

The dataset is structured as:

id  product_code  cat_desc  product_desc
1   2352345       423       COCA COLA
2   8967896       457       FANTA
3   6456466       435       SPARKLING WATER
4   3562314       457       STILL WATER

The statement is:

DELETE FROM raw_products_inter
WHERE id IN (
    SELECT id
    FROM raw_products_inter outer_table
    WHERE product_code IN (
        SELECT product_code
        FROM raw_products_inter
        GROUP BY 1
        HAVING COUNT(id) > 1
    )
    AND id NOT IN (
        SELECT MIN(id)
        FROM raw_products_inter inner_table
        WHERE inner_table.product_code = outer_table.product_code
    )
)

1 Answer

You should be able to boost performance with an EXISTS condition, which deletes every row for which another row with the same product_code and a lower id exists:

DELETE FROM raw_products_inter P
WHERE EXISTS (
    SELECT *
    FROM raw_products_inter OP
    WHERE OP.product_code = P.product_code
      AND OP.id < P.id
)
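If you want to check the behaviour before running this against the real table, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows are made up for illustration (the duplicates are not from the question), and the target table is referenced by its full name rather than an alias, which is just a conservative choice for SQLite. Other databases may differ in alias syntax and performance, so treat this as a correctness check only.

```python
import sqlite3

# Hypothetical sample rows (not from the question): product_code 2352345
# appears three times and 8967896 appears twice.
rows = [
    (1, 2352345, 423, "COCA COLA"),
    (2, 8967896, 457, "FANTA"),
    (3, 6456466, 435, "SPARKLING WATER"),
    (4, 2352345, 423, "COCA COLA"),
    (5, 8967896, 457, "FANTA"),
    (6, 2352345, 423, "COCA COLA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_products_inter ("
    "id INTEGER PRIMARY KEY, product_code INTEGER, "
    "cat_desc INTEGER, product_desc TEXT)"
)
conn.executemany("INSERT INTO raw_products_inter VALUES (?, ?, ?, ?)", rows)

# Delete every row that has a sibling with the same product_code and a
# smaller id, so only the minimum id per product_code survives.
deleted = conn.execute(
    """
    DELETE FROM raw_products_inter
    WHERE EXISTS (
        SELECT 1
        FROM raw_products_inter AS op
        WHERE op.product_code = raw_products_inter.product_code
          AND op.id < raw_products_inter.id
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT id, product_code FROM raw_products_inter ORDER BY id"
).fetchall()
print(deleted, remaining)
```

The row with the lowest id for each product_code never matches the EXISTS condition, so it is always kept regardless of the order in which rows are processed.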

1 Comment

What an elegant solution, thanks. I had to kill my original query after an hour of running, and this ran in 5 seconds :)