2

I have a table which looks like below:

CREATE TABLE time_records ( id uuid NOT NULL, employee_id uuid NOT NULL, starttime timestampt NOT NULL, endtime timestampt NOT NULL ) 

There will be overlap in times between the records for the same employee_id:

id employee_id starttime endtime
1 1 '2023-09-01 07:00:00' '2023-09-01 09:15:00'
2 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
3 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
4 1 '2023-09-01 14:00:00' '2023-09-01 15:00:00'
5 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'
6 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'

What I'm trying to do is get the time ranges within all of these times:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 15:00:00' [1,2,3,4]
1 '2023-09-01 23:45:00' '2023-09-01 23:29:00' [5,6]

I can get this to work if there's only one set of overlapping time within a day using max/min for the start and end times, but I can't seem to make it work when there are multiple sets of overlapping time in a day:

select timea.employee_id, min(timea.starttime) starttime, max(timea.endtime) endtime, array_agg(timea.id) ids from time_records timea inner join time_records timea2 on timea.employee_id = timea2.employee_id and tsrange(timea2.starttime, timea2.endtime, '[]') && tsrange(timea.starttime, timea.endtime, '[]') and timea.id != timea2.id group by timea.employee_id; 

With results:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 23:59:00' [1,2,3,4,5,6]
1
  • 2
    Why the outdated Postgres 9.6? Consider postgresql.org/support/versioning Also: timestamp or timestamptz. There is no middle ground ("timestampt"). Typically, it should be timestamptz. Commented Sep 8, 2023 at 21:19

2 Answers 2

2

make it work when there are multiple sets of overlapping time in a day

Plain aggregation with min() and max() cannot solve this. Which rows eventually form a group only becomes evident after merging ranges.

The aggregate function range_agg() makes the task a whole lot simpler. It was added with Postgres 14. Just computing merged ranges is very simple now:

SELECT unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged_range FROM time_records; 

To also get an array of involved IDs, we need to do more. One way is to join back to the underlying table and then aggregate once more (groups are identified by the merged ranges now):

SELECT employee_id, lower(merged) AS starttime, upper(merged) AS endtime , array_agg(t.id) AS ids FROM ( SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged FROM time_records GROUP BY employee_id ) r JOIN time_records t USING (employee_id) WHERE r.merged @> t.starttime GROUP BY r.employee_id, r.merged ORDER BY r.employee_id, r.merged; 

Another way with a LATERAL subquery:

SELECT r.employee_id, lower(r.merged) AS starttime, upper(r.merged) AS endtime, i.ids FROM ( SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged FROM time_records GROUP BY employee_id ) r CROSS JOIN LATERAL ( SELECT ARRAY ( SELECT t.id FROM time_records t WHERE t.employee_id = r.employee_id AND t.starttime <@ r.merged ORDER BY t.id ) ) i (ids) ORDER BY r.employee_id, r.merged; 

fiddle

Related:

Not sure if either query is also faster than my custom function below, because that only iterates over the whole table once.

Postgres 9.6

While stuck on your outdated version, create a custom set-returning function (once):

CREATE OR REPLACE FUNCTION public.f_merge_ranges() RETURNS TABLE ( employee_id int , starttime timestamp , endtime timestamp , ids int[] ) LANGUAGE plpgsql AS $func$ DECLARE r record; -- current row BEGIN FOR r IN SELECT t.id, t.employee_id, t.starttime, t.endtime FROM time_records t ORDER BY t.employee_id, t.starttime, t.endtime DESC, t.id -- better take longer range first LOOP IF r.employee_id = employee_id THEN -- works for null in first iteration IF r.starttime > endtime THEN RETURN NEXT; starttime := r.starttime; endtime := r.endtime; ids := ARRAY[r.id]; ELSE ids := ids || r.id; IF r.endtime > endtime THEN endtime := r.endtime; END IF; END IF; ELSE IF employee_id IS NOT NULL THEN -- catch first iteration RETURN NEXT; END IF; employee_id := r.employee_id; starttime := r.starttime; endtime := r.endtime; ids := ARRAY[r.id]; END IF; END LOOP; -- return last row (if any) IF FOUND THEN RETURN NEXT; END IF; END $func$; 

Call:

SELECT * FROM public.f_merge_ranges(); 

fiddle

Unlike the queries above, the array in ids is unsorted. You need to do more if you need that.

Related:

Sign up to request clarification or add additional context in comments.

1 Comment

This is great thank you! I'm unfortunately a long way off from getting postgres updated to take advantage of that much cleaner looking solution.
0

Using a cte to produce the maximum endtime for each starttime, the largest overlapping intervals can then be found and the original time_records table joined back onto it with an aggregation:

with cte as ( select t.employee_id, t.starttime, max(t.endtime) m from time_records t group by t.employee_id, t.starttime ) select c.employee_id, c.starttime, c.m, array_agg(t.id) from cte c join time_records t on c.starttime <= t.starttime and t.endtime <= c.m where not exists (select 1 from cte t1 where t1.employee_id = c.employee_id and t1.starttime < c.starttime and c.m <= t1.m) group by c.employee_id, c.starttime, c.m 

See fiddle

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.