I'm attempting to write some logic with Oracle SQL, but am having a hard time getting it right. First, I need my script to identify duplicated items. Then determine, of the duplicated items, which is the most recent. The database I'm working with suffers from a high number of manual data inserts outside of the application. This causes items to appear out of sequence when using the ID number (I am using the start date and the ID number as a way to measure sequence, as the table has no other method to do so).
If I need to determine the most current role for Employee 12311, how would I do so?
Here is what I have so far:
Table
ID | EMPLOYEE | ROLE | STARTDATE | -----|----------|------------|-----------------------| 3432 | 12311 | Supervisor | 2016-07-12T00:00:00Z | 3421 | 12311 | Analyst | 2016-07-12T00:00:00Z | 4321 | 12311 | Help Desk | 2014-05-12T00:00:00Z | 5432 | 23432 | Manager | 2012-11-02T00:00:00Z | 3452 | 23432 | Associate | 2011-04-23T00:00:00Z | 7652 | 54332 | Analyst | 2015-10-15T00:00:00Z | 5691 | 54332 | Assistant | 2013-10-15T00:00:00Z | Code
SELECT "ID", "EMPLOYEE", "ROLE", "STARTDATE", (CASE WHEN RANK() OVER (PARTITION BY "EMPLOYEE" ORDER BY "STARTDATE" DESC) = 1 THEN 1 ELSE 0 END) "MAX DATE", (CASE WHEN RANK() OVER (PARTITION BY "EMPLOYEE" ORDER BY "ID" DESC) = 1 THEN 1 ELSE 0 END) "MAX ID" FROM ( SELECT DISTINCT EMPLOYEE "E.EMPLOYEE", E.ID "ID", LR.DESCRIPTION "ROLE", ROLE_START_DATE "STARTDATE" FROM EMPLOYEES E JOIN ROLES R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID JOIN LU_ROLES LR ON R.ROLE_ID = LR.ROLE_ID WHERE ROLE_START_DATE <= DATE '2017-12-03') ORDER BY 2 Results
ID | EMPLOYEE | ROLE | STARTDATE | MAX Date | Max ID -----|----------|------------|----------------------------------|-------- 3432 | 12311 | Supervisor | 2016-07-12T00:00:00Z | 1 | 0 3421 | 12311 | Analyst | 2016-07-12T00:00:00Z | 1 | 0 4321 | 12311 | Help Desk | 2014-05-12T00:00:00Z | 0 | 1 5432 | 23432 | Manager | 2012-11-02T00:00:00Z | 1 | 1 3452 | 23432 | Associate | 2011-04-23T00:00:00Z | 0 | 0 7652 | 54332 | Analyst | 2015-10-15T00:00:00Z | 1 | 1 5691 | 54332 | Assistant | 2013-10-15T00:00:00Z | 0 | 0 Rather than looking at all of the records for each employee and determining the most recent record, I'd like for the script to only use the duplicated STARTDATES.
Basically, if the most recent STARTDATE is duplicated, identify which ID is the highest.
So it should look like this:
ID | EMPLOYEE | ROLE | STARTDATE | MAX Date | Max ID -----|----------|------------|----------------------------------|-------- 3432 | 12311 | Supervisor | 2016-07-12T00:00:00Z | 1 | 1 3421 | 12311 | Analyst | 2016-07-12T00:00:00Z | 1 | 0 4321 | 12311 | Help Desk | 2014-05-12T00:00:00Z | 0 | 0 5432 | 23432 | Manager | 2012-11-02T00:00:00Z | 1 | 1 3452 | 23432 | Associate | 2011-04-23T00:00:00Z | 0 | 0 7652 | 54332 | Analyst | 2015-10-15T00:00:00Z | 1 | 1 5691 | 54332 | Assistant | 2013-10-15T00:00:00Z | 0 | 0 I am completely open to better ways of doing this. Any help you could provide would be much appreciated!
EDIT WITH SOLUTION:
Thanks to @Littlefoot for this. I was able to modify my script to include the following:
SELECT "ID", "EMPLOYEE", "ROLE", "STARTDATE", ROW_NUMBER() OVER (PARTITION BY "EMPLOYEE" ORDER BY "STARTDATE" DESC, "ID" DESC) RN FROM ( SELECT DISTINCT EMPLOYEE "E.EMPLOYEE", E.ID "ID", LR.DESCRIPTION "ROLE", ROLE_START_DATE "STARTDATE" FROM EMPLOYEES E JOIN ROLES R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID JOIN LU_ROLES LR ON R.ROLE_ID = LR.ROLE_ID WHERE ROLE_START_DATE <= DATE '2017-12-03') ORDER BY 2 RESULTS:
ID | EMPLOYEE | ROLE | STARTDATE | RN | -----|----------|------------|----------------------------------| 3432 | 12311 | Supervisor | 2016-07-12T00:00:00Z | 1 | 3421 | 12311 | Analyst | 2016-07-12T00:00:00Z | 2 | 4321 | 12311 | Help Desk | 2014-05-12T00:00:00Z | 3 | 5432 | 23432 | Manager | 2012-11-02T00:00:00Z | 1 | 3452 | 23432 | Associate | 2011-04-23T00:00:00Z | 2 | 7652 | 54332 | Analyst | 2015-10-15T00:00:00Z | 1 | 5691 | 54332 | Assistant | 2013-10-15T00:00:00Z | 2 | I then filtered my results by RN=1