Need to split datetime ranges by each day

Question

I have a table that needs to be split on the basis of datetime

Input Table

ID| Start | End -------------------------------------------- A | 2019-03-04 23:18:04| 2019-03-04 23:21:25 -------------------------------------------- A | 2019-03-04 23:45:05| 2019-03-05 00:15:14 --------------------------------------------

Required Output

ID| Start | End -------------------------------------------- A | 2019-03-04 23:18:04| 2019-03-04 23:21:25 -------------------------------------------- A | 2019-03-04 23:45:05| 2019-03-04 23:59:59 -------------------------------------------- A | 2019-03-05 00:00:00| 2019-03-05 00:15:14 --------------------------------------------

Thanks!!

Is it possible that the range is above more than 2 dates? maybe 2019-03-05 - 2019-03-08? — S-Man
– S-Man, Commented Jul 22, 2019 at 9:22
What if the interval crosses more than one day? e.g. (start, end) = (2019-01-01 00:00:00, 2019-01-05 00:00:00)? — user330315
– user330315, Commented Jul 22, 2019 at 9:24
I checked the table doesn't seem to have a range over two days. You make a good point, it should deal with such occurrences if they do appear in the future. — ExelIQ
– ExelIQ, Commented Jul 22, 2019 at 9:46

S-Man · Accepted Answer · 2019-07-22 09:46:56Z

demo:db<>fiddle

This works even when range crosses more than one day

WITH cte AS ( SELECT id, start_time, end_time, gs, lag(gs) over (PARTITION BY id ORDER BY gs) -- 2 FROM a LEFT JOIN LATERAL generate_series(start_time::date + 1, end_time::date, interval '1 day') gs --1 ON TRUE ) SELECT -- 3 id, COALESCE(lag, start_time) AS start_time, gs - interval '1 second' FROM cte WHERE gs IS NOT NULL UNION SELECT DISTINCT ON (id) -- 4 id, CASE WHEN start_time::date = end_time::date THEN start_time ELSE end_time::date END, -- 5 end_time FROM cte

CTE: the generate_series function generates one row per day new day. So, there is no value if there is no date change
CTE: the lag() window function allows to move the current date value into the next row (the current end is the next start)
With this data set you can calculate the new start and end values. If there is no gs value: There is no date change. This ignored at this point. For all cases with date changes: If there is no lag value, it is the beginning (so it cannot got a previous value). In this case, the normal start_time is taken, otherwise it is a new day which takes the date break time. The end_time is taken with the last second of the day (interval - '1 second')
The second part: Because of the date breaks there is always one additional record which need to be unioned. The last record is from the beginning of the end_time (so cast to date). The CASE clause combines this step with the case of no date change which has been ignored so far. So if start_time and end_time are at the same date, here the original start_time is taken.

Im using workbench/j(build 124) and connected to a Postgres DB through AWS.
@JASONMENDES As all answers solve the problem that you mentioned in the question (which not included the Redshift problem at the beginning), it would be great if you upvote all of them! Upvotes honor the time and work the repliers invested into your question. If one one of them completely solves your problem, don't forget to accept this as well!
@JASONMENDES: you should have mentioned from the start that you are using Amazon Redshift, not Postgres. Although they share some ancient roots they are quite different

mkRabbani · Accepted Answer · 2019-07-22 11:01:50Z

Try this below code. This will only work if the start and end date fall in two consecutive day. Not if the start and end date difference is more than 1 day.

MSSQL:

SELECT ID,[Start],[End] FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) = 0 UNION ALL SELECT ID,[Start], CAST(CAST(CAST([Start] AS DATE) AS VARCHAR(MAX)) +' 23:59:59' AS DATETIME) FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) > 0 UNION ALL SELECT ID,CAST(CAST([End] AS DATE) AS DATETIME),[End] FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) > 0 ORDER BY 1,2,3

PostgreSQL:

SELECT ID, TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'), TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) = 0 UNION ALL SELECT ID, TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'), TO_TIMESTAMP(CONCAT(CAST(CAST (startDate AS DATE) AS VARCHAR) , ' 23:59:59') , 'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) > 0 UNION ALL SELECT ID, TO_TIMESTAMP(CAST(CAST (endDate AS DATE) AS VARCHAR) , 'YYYY-MM-DD HH24:MI:SS') , TO_TIMESTAMP(endDate,'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) > 0;

PostgreSQL Demo Here

All those casts and to_timestamp() calls seem rather useless if those two columns are defined as timestamp (and who in their right mind would store timestamps in a varchar column). Also DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) = 0 can be simplified to enddate::date - starddate::date) = 0 or simply enddate::date = startdate::date

Gordon Linoff · Accepted Answer · 2019-07-22 11:19:17Z

Unfortunately, Redshift doesn't have a convenient way to generate a series of numbers. If you table is big enough, you can use it to generate numbers. "Big enough" means that the number of rows is greater than the longest span. Perhaps another table would work, if not this one.

Once you have that, you can use this logic:

with n as ( select row_number() over () - 1 as n from t ) select t.id, greatest(t.s, date_trunc('day', t.s) + n.n * interval '1 day') as s, least(t.e, date_trunc('day', t.s) + (n.n + 1) * interval '1 day' - interval '1 second') as e from t join n on t.e >= date_trunc('day', t.s) + n.n * interval '1 day';

Here is a db<>fiddle. It uses an old version of Postgres, but not quite old enough for Redshift.

Could you please explain the idea? The query seems incorrect to me, produces wrong result for expanding range with id=C in dbfiddle.uk/… (example from my answer). This is why I ended with recursive CTE, which is also standard-compliant solution.
@TomášZáluský . . . There are not enough rows in the table to generate enough numbers for that example. That is what the first paragraph is explaining. A recursive CTE is nice, but Redshift does not support that functionality.

Tomáš Záluský · Accepted Answer · 2019-07-22 13:26:52Z

Simulate loop for interval generation using recursive CTE, i.e. take range from start to midnight in seed row, take another day in subsequent rows etc.

with recursive input as ( select 'A' as id, timestamp '2019-03-04 23:18:04' as s, timestamp '2019-03-04 23:21:25' as e union select 'A' as id, timestamp '2019-03-04 23:45:05' as s, timestamp '2019-03-05 00:15:14' as e union select 'B' as id, timestamp '2019-03-06 23:45:05' as s, timestamp '2019-03-08 00:15:14' as e union select 'C' as id, timestamp '2019-03-10 23:45:05' as s, timestamp '2019-03-15 00:15:14' as e ), generate_id as ( select row_number() over () as unique_id, * from input ), rec (unique_id, id, s, e) as ( select unique_id, id, s, least(e, s::date::timestamp + interval '1 day') from generate_id seed union select remaining.unique_id, remaining.id, previous.e, least(remaining.e, previous.e::date::timestamp + interval '1 day') from rec as previous join generate_id remaining on previous.unique_id = remaining.unique_id and previous.e < remaining.e ) select id, s, e from rec order by id,s,e

Note:

your id column appears not to be unique, so I added custom unique_id column. If id was unique, CTE generate_id was unnecessary. Uniqueness is unavoidable for recursive query to work.
close-open range is better for representation of such data, rather than close-close range. So end time in my query returns 00:00:00, not 23:59:59. If it's not suitable for you, modify query as an exercise.

UPDATE: query works on Postgres. OP originally tagged question postgres, then changed tag to redshift.

Collectives™ on Stack Overflow

Need to split datetime ranges by each day

Input Table

4 Answers 4

9 Comments

4 Comments

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Input Table

4 Answers 4

9 Comments

4 Comments

2 Comments

Comments

Linked

Related