
I have a table in which I would like to identify duplicate records based on two columns (id and role). I use a third column (unit) to select the subset of records to analyze and to delete from. Here is the table with a few rows of example data:

id | role | unit
----------------
946| 1001 | 1
946| 1002 | 1
946| 1003 | 1
946| 1001 | 2
946| 1002 | 2
900| 1001 | 3
900| 1002 | 3
900| 1001 | 3

Analyzing units 1 and 2 should identify two rows to delete: 946/1001 and 946/1002. It doesn't matter whether I delete the rows labeled unit 1 or unit 2. In a subsequent step I will update all records labeled unit=2 to unit=1.

I have a SELECT statement that identifies the rows to delete:

    SELECT *
    FROM (SELECT unit, id, role,
                 ROW_NUMBER() OVER (
                     PARTITION BY id, role
                     ORDER BY id, role
                 ) row_num
          FROM thetable
          WHERE unit IN (1,2)
         ) as x
    WHERE row_num > 1;

This query will give this result:

id | role | unit
----------------
946| 1001 | 2
946| 1002 | 2

Now I would like to combine this with DELETE to delete the identified records. I have come pretty close (I believe) with this statement:

    DELETE FROM thetable tp1
    WHERE EXISTS (SELECT unit, id, role,
                         ROW_NUMBER() OVER (
                             PARTITION BY id, role
                             ORDER BY id, role
                         ) as row_num
                  FROM thetable tp2
                  WHERE unit IN (1,2)
                    AND tp1.unit=tp2.unit
                    AND tp1.role=tp2.role
                    AND tp1.id=tp2.id
                    AND row_num >1
                 )

However, row_num is not recognized as a column. How should I modify this statement to delete the two identified records?


2 Answers


It is very simple with EXISTS:

    DELETE FROM thetable t
    WHERE t.unit IN (1,2)
      AND EXISTS (
        SELECT 1
        FROM thetable
        WHERE (id, role) = (t.id, t.role)
          AND unit < t.unit
      )

See the demo.
Results:

> id | role | unit
> --: | ---: | ---:
> 946 | 1001 | 1
> 946 | 1002 | 1
> 946 | 1003 | 1
> 900 | 1001 | 3
> 900 | 1002 | 3
> 900 | 1001 | 3
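For anyone who wants to reproduce this without the demo link, here is a minimal sketch. The column types are an assumption (the question does not state them), and the `BEGIN`/`ROLLBACK` wrapper and the `RETURNING` clause are additions of mine so the delete can be inspected without keeping its effect; both assume PostgreSQL.

```sql
-- Assumed schema: the question only names the columns, not their types.
CREATE TABLE thetable (id integer, role integer, unit integer);

-- Sample rows copied from the question.
INSERT INTO thetable (id, role, unit) VALUES
    (946, 1001, 1),
    (946, 1002, 1),
    (946, 1003, 1),
    (946, 1001, 2),
    (946, 1002, 2),
    (900, 1001, 3),
    (900, 1002, 3),
    (900, 1001, 3);

BEGIN;

-- Same DELETE as above; RETURNING lists the rows that were removed.
DELETE FROM thetable t
WHERE t.unit IN (1, 2)
  AND EXISTS (
        SELECT 1
        FROM thetable
        WHERE (id, role) = (t.id, t.role)
          AND unit < t.unit
      )
RETURNING t.id, t.role, t.unit;

-- Undo the delete; replace with COMMIT once the returned rows look right.
ROLLBACK;
```

On the sample data the statement should return the two unit 2 rows, 946/1001 and 946/1002, which matches what the question asks for.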

1 Comment

Very nice proposal. And it works! Thanks for setting up the demo case. That's really helpful.

You could phrase this as:

    delete from thetable t
    where t.unit > (
        select min(t1.unit)
        from thetable t1
        where t1.id = t.id and t1.role = t.role
    )

This seems like a simple way to solve the task. It basically reads as: delete each row for which another row exists with the same id and role and a smaller unit.
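Note that the statement above looks at every unit in the table. If the delete should stay within units 1 and 2, as the question describes, one possible variation is to add that filter on both the outer and the inner side. The sketch below is my assumption about how that restriction would be written, shown first as a `SELECT` so the affected rows can be checked before anything is removed.

```sql
-- Preview: rows in units 1 and 2 that have a same (id, role) row
-- with a smaller unit, also within units 1 and 2.
select t.*
from thetable t
where t.unit in (1, 2)
  and t.unit > (
        select min(t1.unit)
        from thetable t1
        where t1.id = t.id
          and t1.role = t.role
          and t1.unit in (1, 2)
      );

-- Once the preview looks right, the same predicate drives the delete.
delete from thetable t
where t.unit in (1, 2)
  and t.unit > (
        select min(t1.unit)
        from thetable t1
        where t1.id = t.id
          and t1.role = t.role
          and t1.unit in (1, 2)
      );
```

On the sample data from the question, both versions pick the same two rows, since the unit 3 rows have no counterpart with a smaller unit.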

As for the query you wanted to write, using row_number(), I think that would be:

    delete from thetable t
    using (
        select t2.*,
               row_number() over(partition by id, role order by unit) rn
        from thetable t2
    ) t1
    where t1.id = t.id
      and t1.role = t.role
      and t1.unit = t.unit
      and t1.rn > 1
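One caveat with joining back on id, role and unit: if two rows are identical in all three columns (like the two 900/1001/3 rows in the sample data), a numbered row with rn > 1 matches both of them, so both would be deleted. A possible workaround, assuming PostgreSQL, is to join on the system column `ctid`, which identifies a physical row. The `unit in (1, 2)` filter is taken from the question; treat this as a sketch rather than a definitive solution.

```sql
-- Number the rows per (id, role) within units 1 and 2, then delete
-- exactly the physical rows whose number is greater than 1.
delete from thetable t
using (
    select ctid,
           row_number() over (partition by id, role order by unit) as rn
    from thetable
    where unit in (1, 2)
) d
where t.ctid = d.ctid
  and d.rn > 1;
```

Because `ctid` can change when a row is updated or the table is rewritten, it is only suitable for a one-off statement like this, not as a long-term row identifier.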
