I've written a sql statement to only keep one instance (minimum id) where there are duplicated product_codes. The issue is that the statement is very inefficient and takes absolutely ages to run, so I'm hoping there is a more efficient way to write it
The dataset is structured as:
id product_code cat_desc product_desc 1 2352345 423 COCA COLA 2 8967896 457 FANTA 3 6456466 435 SPARKLING WATER 4 3562314 457 STILL WATER The statement is:
DELETE FROM raw_products_inter WHERE id IN (SELECT id FROM raw_products_inter outer_table WHERE product_code IN (SELECT product_code FROM raw_products_inter GROUP BY 1 HAVING COUNT(id) > 1) AND id NOT IN (SELECT MIN(id) FROM raw_products_inter inner_table WHERE inner_table.product_code = outer_table.product_code))