How to get a date interval with condition

Question

How to get a continuous date interval from rows fulfilling specific condition?

I have a table of employees states with 2 types of user_position. The interval is continuous if the next higher date_position per user_id has the same user_id, the next day value and user_position didn't change. The user cannot have different user positions in one day.

Have a feeling it requires several cases, window functions and tsrange, but can't quite get the right result.

I would be really grateful if you could help me.

Fiddle:

http://sqlfiddle.com/#!17/ba641/1/0

The result should look like this:

user_id	user_position	position_start	position_end
1	1	01.01.2019	02.01.2019
1	2	03.01.2019	04.01.2019
1	1	05.01.2019	06.01.2019
2	1	01.01.2019	03.01.2019
2	2	04.01.2019	05.01.2019
2	2	08.01.2019	08.01.2019
2	2	10.01.2019	10.01.2019

Create/insert query for the source data:

CREATE TABLE IF NOT EXISTS users_position ( id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY, user_id integer, user_position integer, date_position date); INSERT INTO users_position (user_id, user_position, date_position) VALUES (1, 1, '2019-01-01'), (1, 1, '2019-01-02'), (1, 2, '2019-01-03'), (1, 2, '2019-01-04'), (1, 1, '2019-01-05'), (1, 1, '2019-01-06'), (2, 1, '2019-01-01'), (2, 1, '2019-01-02'), (2, 1, '2019-01-03'), (2, 2, '2019-01-04'), (2, 2, '2019-01-05'), (2, 2, '2019-01-08'), (2, 2, '2019-01-10');

Please provide a test case with CREATE TABLE and INSERT script. Consider adding a fiddle. — Erwin Brandstetter
– Erwin Brandstetter, Commented Jul 30, 2022 at 23:05
Please define "next row". Is that the next higher id or the next higher date_position per user_id? (Generated IDENTITY values are not guaranteed to be in sequence under concurrent write load.) — Erwin Brandstetter
– Erwin Brandstetter, Commented Jul 31, 2022 at 12:29

Erwin Brandstetter · Accepted Answer · 2022-07-31 12:53:18Z

SELECT user_id, user_position , min(date_position) AS position_start , max(date_position) AS position_end FROM ( SELECT user_id, user_position,date_position , count(*) FILTER (WHERE (date_position = last_date + 1 AND user_position = last_pos) IS NOT TRUE) OVER (PARTITION BY user_id ORDER BY date_position) AS interval FROM ( SELECT user_id, user_position, date_position , lag(date_position) OVER w AS last_date , lag(user_position) OVER w AS last_pos FROM users_position WINDOW w AS (PARTITION BY user_id ORDER BY date_position) ) sub1 ) sub2 GROUP BY user_id, user_position, interval ORDER BY user_id, interval;

db<>fiddle here

Basically, this forms intervals by counting the number of disruptions in continuity. Whenever the "next" row per user_id is not what's expected, a new interval starts.

The WINDOW clause allows to specify a window frame once and use it repeatedly; no effect on performance.

last_date + 1 works while last_date is type date. See:

Is there a way to do date arithmetic on values of type DATE without result being of type TIMESTAMP?

Aggregate columns with additional (distinct) filters

Collectives™ on Stack Overflow

How to get a date interval with condition

1 Answer 1

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Linked

Related