Query to find unique time periods from list of time ranges

Question

I have a table which looks like below:

CREATE TABLE time_records (
   id          uuid       NOT NULL,
   employee_id uuid       NOT NULL,
   starttime   timestampt NOT NULL,
   endtime     timestampt NOT NULL
)

There will be overlap in times between the records for the same employee_id:

id	employee_id	starttime	endtime
1	1	'2023-09-01 07:00:00'	'2023-09-01 09:15:00'
2	1	'2023-09-01 07:00:00'	'2023-09-01 15:00:00'
3	1	'2023-09-01 07:00:00'	'2023-09-01 15:00:00'
4	1	'2023-09-01 14:00:00'	'2023-09-01 15:00:00'
5	1	'2023-09-01 23:45:00'	'2023-09-01 23:59:00'
6	1	'2023-09-01 23:45:00'	'2023-09-01 23:59:00'
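To reproduce the sample, the rows above could be loaded roughly as follows. This is only a sketch: it assumes integer ids (as shown in the sample rows, although the table definition says uuid) and assumes plain timestamp for the column type.

```sql
-- Assumptions: int ids as in the sample rows, timestamp instead of "timestampt".
CREATE TABLE time_records (
   id          int       NOT NULL
 , employee_id int       NOT NULL
 , starttime   timestamp NOT NULL
 , endtime     timestamp NOT NULL
);

INSERT INTO time_records (id, employee_id, starttime, endtime) VALUES
  (1, 1, '2023-09-01 07:00:00', '2023-09-01 09:15:00')
, (2, 1, '2023-09-01 07:00:00', '2023-09-01 15:00:00')
, (3, 1, '2023-09-01 07:00:00', '2023-09-01 15:00:00')
, (4, 1, '2023-09-01 14:00:00', '2023-09-01 15:00:00')
, (5, 1, '2023-09-01 23:45:00', '2023-09-01 23:59:00')
, (6, 1, '2023-09-01 23:45:00', '2023-09-01 23:59:00');
```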

What I'm trying to do is merge the overlapping records into distinct time ranges per employee, along with the ids of the records in each range:

employee_id	starttime	endtime	ids
1	'2023-09-01 07:00:00'	'2023-09-01 15:00:00'	[1,2,3,4]
1	'2023-09-01 23:45:00'	'2023-09-01 23:59:00'	[5,6]

I can get this to work if there's only one set of overlapping time within a day using max/min for the start and end times, but I can't seem to make it work when there are multiple sets of overlapping time in a day:

select timea.employee_id,
       min(timea.starttime) starttime,
       max(timea.endtime) endtime,
       array_agg(timea.id) ids
from time_records timea
inner join time_records timea2
   on timea.employee_id = timea2.employee_id
  and tsrange(timea2.starttime, timea2.endtime, '[]') && tsrange(timea.starttime, timea.endtime, '[]')
  and timea.id != timea2.id
group by timea.employee_id;

With results:

employee_id	starttime	endtime	ids
1	'2023-09-01 07:00:00'	'2023-09-01 23:59:00'	[1,2,3,4,5,6]

Why the outdated Postgres 9.6? Consider postgresql.org/support/versioning. Also: timestamp or timestamptz. There is no middle ground ("timestampt"). Typically, it should be timestamptz.
– Erwin Brandstetter, Commented Sep 8, 2023 at 21:19

Erwin Brandstetter · Accepted Answer · 2023-09-09 00:08:00Z

make it work when there are multiple sets of overlapping time in a day

Plain aggregation with min() and max() cannot solve this. Which rows eventually form a group only becomes evident after merging ranges.
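To illustrate how groups only emerge after merging, here is a minimal sketch of a window-function approach (often called "gaps and islands"). It is not one of the solutions from this answer. It assumes the time_records table from the question, and the names new_island and island are made up for illustration. A row starts a new group when its starttime is later than every endtime seen so far for the same employee, and a running count of those flags numbers the groups. It uses no range_agg(), so it should also run on older versions such as 9.6.

```sql
-- Sketch only: window-function alternative, not the answer's method.
SELECT employee_id
     , min(starttime)            AS starttime
     , max(endtime)              AS endtime
     , array_agg(id ORDER BY id) AS ids
FROM  (
   SELECT *
          -- running count of "new island" flags = group number per employee
        , count(new_island OR NULL) OVER (PARTITION BY employee_id
                                          ORDER BY starttime, endtime, id) AS island
   FROM  (
      SELECT *
             -- true if this row starts after all earlier rows have ended;
             -- NULL for the first row per employee (empty frame)
           , starttime > max(endtime) OVER (PARTITION BY employee_id
                                            ORDER BY starttime, endtime, id
                                            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_island
      FROM   time_records
      ) sub1
   ) sub2
GROUP  BY employee_id, island
ORDER  BY employee_id, starttime;
```

Ranges that merely touch (starttime equal to an earlier endtime) end up in the same group, which matches the inclusive '[]' bounds used elsewhere in this answer.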

The aggregate function range_agg() makes the task a whole lot simpler. It was added with Postgres 14. Just computing merged ranges is very simple now:

SELECT unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged_range
FROM   time_records;
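To see what range_agg() produces before unnest() splits it up, a small self-contained sketch with literal ranges may help. The values are the distinct ranges from the question, and the aliases v, r and merged_multirange are only illustrative.

```sql
-- Requires Postgres 14 or later (range_agg and multirange types).
SELECT range_agg(r) AS merged_multirange   -- a single tsmultirange value
FROM  (
   VALUES
     (tsrange('2023-09-01 07:00', '2023-09-01 09:15', '[]'))
   , (tsrange('2023-09-01 07:00', '2023-09-01 15:00', '[]'))
   , (tsrange('2023-09-01 14:00', '2023-09-01 15:00', '[]'))
   , (tsrange('2023-09-01 23:45', '2023-09-01 23:59', '[]'))
   ) v(r);
```

This should return one multirange holding two ranges (07:00 to 15:00 and 23:45 to 23:59). unnest() then expands that multirange into one row per merged range.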

To also get an array of involved IDs, we need to do more. One way is to join back to the underlying table and then aggregate once more (groups are identified by the merged ranges now):

SELECT employee_id, lower(merged) AS starttime, upper(merged) AS endtime
     , array_agg(t.id) AS ids
FROM  (
   SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
   FROM   time_records
   GROUP  BY employee_id
   ) r
JOIN   time_records t USING (employee_id)
WHERE  r.merged @> t.starttime
GROUP  BY r.employee_id, r.merged
ORDER  BY r.employee_id, r.merged;

Another way with a LATERAL subquery:

SELECT r.employee_id, lower(r.merged) AS starttime, upper(r.merged) AS endtime, i.ids
FROM  (
   SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
   FROM   time_records
   GROUP  BY employee_id
   ) r
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT t.id
      FROM   time_records t
      WHERE  t.employee_id = r.employee_id
      AND    t.starttime <@ r.merged
      ORDER  BY t.id
      )
   ) i (ids)
ORDER  BY r.employee_id, r.merged;

Related: Remove duplicates from array_agg, where elements are also arrays

Not sure if either query is also faster than my custom function below, because that only iterates over the whole table once.

Postgres 9.6

While stuck on your outdated version, create a custom set-returning function (once):

CREATE OR REPLACE FUNCTION public.f_merge_ranges()
  RETURNS TABLE (
    employee_id int
  , starttime   timestamp
  , endtime     timestamp
  , ids         int[]
  )
  LANGUAGE plpgsql AS
$func$
DECLARE
   r record;  -- current row
BEGIN
   FOR r IN
      SELECT t.id, t.employee_id, t.starttime, t.endtime
      FROM   time_records t
      ORDER  BY t.employee_id, t.starttime, t.endtime DESC, t.id  -- better take longer range first
   LOOP
      IF r.employee_id = employee_id THEN  -- works for null in first iteration
         IF r.starttime > endtime THEN
            RETURN NEXT;
            starttime := r.starttime;
            endtime   := r.endtime;
            ids       := ARRAY[r.id];
         ELSE
            ids := ids || r.id;
            IF r.endtime > endtime THEN
               endtime := r.endtime;
            END IF;
         END IF;
      ELSE
         IF employee_id IS NOT NULL THEN  -- catch first iteration
            RETURN NEXT;
         END IF;
         employee_id := r.employee_id;
         starttime   := r.starttime;
         endtime     := r.endtime;
         ids         := ARRAY[r.id];
      END IF;
   END LOOP;
   -- return last row (if any)
   IF FOUND THEN
      RETURN NEXT;
   END IF;
END
$func$;

Call:

SELECT * FROM public.f_merge_ranges();

Unlike the queries above, the array in ids is unsorted. You need to do more if you need that.
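If sorted arrays are needed, one possible way (a sketch, not part of the function above) is to sort each array in the calling query. It assumes public.f_merge_ranges() has been created as shown, and the alias f is only illustrative.

```sql
-- Sketch: sort each ids array in the calling query; the function itself is unchanged.
SELECT f.employee_id, f.starttime, f.endtime
     , ARRAY(SELECT unnest(f.ids) ORDER BY 1) AS ids
FROM   public.f_merge_ranges() f;
```

This keeps the single pass over the table inside the function and pays for the sort only on the already merged rows.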

Related: Calculate Difference in Overlapping Time in PostgreSQL / SSRS

This is great, thank you! Unfortunately, I'm a long way off from getting Postgres updated to take advantage of that much cleaner-looking solution.

Ajax1234 · Accepted Answer · 2023-09-08 22:11:30Z

Using a cte to produce the maximum endtime for each starttime, the largest overlapping intervals can then be found and the original time_records table joined back onto it with an aggregation:

with cte as (
   select t.employee_id, t.starttime, max(t.endtime) m
   from time_records t
   group by t.employee_id, t.starttime
)
select c.employee_id, c.starttime, c.m, array_agg(t.id)
from cte c
join time_records t
  on t.employee_id = c.employee_id  -- only join records of the same employee
 and c.starttime <= t.starttime
 and t.endtime <= c.m
where not exists (
   select 1
   from cte t1
   where t1.employee_id = c.employee_id
     and t1.starttime < c.starttime
     and c.m <= t1.m
)
group by c.employee_id, c.starttime, c.m

