2

I have a table that needs to be split on the basis of datetime

Input Table

ID| Start | End -------------------------------------------- A | 2019-03-04 23:18:04| 2019-03-04 23:21:25 -------------------------------------------- A | 2019-03-04 23:45:05| 2019-03-05 00:15:14 -------------------------------------------- 

Required Output

ID| Start | End -------------------------------------------- A | 2019-03-04 23:18:04| 2019-03-04 23:21:25 -------------------------------------------- A | 2019-03-04 23:45:05| 2019-03-04 23:59:59 -------------------------------------------- A | 2019-03-05 00:00:00| 2019-03-05 00:15:14 -------------------------------------------- 

Thanks!!

5
  • What about the time 23:59:59.001 to 23:59:59.999? Commented Jul 22, 2019 at 8:54
  • The data format only has a resolution up to the second. Commented Jul 22, 2019 at 8:58
  • Is it possible that the range is above more than 2 dates? maybe 2019-03-05 - 2019-03-08? Commented Jul 22, 2019 at 9:22
  • What if the interval crosses more than one day? e.g. (start, end) = (2019-01-01 00:00:00, 2019-01-05 00:00:00)? Commented Jul 22, 2019 at 9:24
  • I checked the table doesn't seem to have a range over two days. You make a good point, it should deal with such occurrences if they do appear in the future. Commented Jul 22, 2019 at 9:46

4 Answers 4

1

demo:db<>fiddle

This works even when range crosses more than one day

WITH cte AS ( SELECT id, start_time, end_time, gs, lag(gs) over (PARTITION BY id ORDER BY gs) -- 2 FROM a LEFT JOIN LATERAL generate_series(start_time::date + 1, end_time::date, interval '1 day') gs --1 ON TRUE ) SELECT -- 3 id, COALESCE(lag, start_time) AS start_time, gs - interval '1 second' FROM cte WHERE gs IS NOT NULL UNION SELECT DISTINCT ON (id) -- 4 id, CASE WHEN start_time::date = end_time::date THEN start_time ELSE end_time::date END, -- 5 end_time FROM cte 
  1. CTE: the generate_series function generates one row per day new day. So, there is no value if there is no date change
  2. CTE: the lag() window function allows to move the current date value into the next row (the current end is the next start)
  3. With this data set you can calculate the new start and end values. If there is no gs value: There is no date change. This ignored at this point. For all cases with date changes: If there is no lag value, it is the beginning (so it cannot got a previous value). In this case, the normal start_time is taken, otherwise it is a new day which takes the date break time. The end_time is taken with the last second of the day (interval - '1 second')
  4. The second part: Because of the date breaks there is always one additional record which need to be unioned. The last record is from the beginning of the end_time (so cast to date). The CASE clause combines this step with the case of no date change which has been ignored so far. So if start_time and end_time are at the same date, here the original start_time is taken.
Sign up to request clarification or add additional context in comments.

9 Comments

Hi, CTE doesnt seem to work on my IDE, which is Postgres.
What version are you using?
Im using workbench/j(build 124) and connected to a Postgres DB through AWS.
@JASONMENDES As all answers solve the problem that you mentioned in the question (which not included the Redshift problem at the beginning), it would be great if you upvote all of them! Upvotes honor the time and work the repliers invested into your question. If one one of them completely solves your problem, don't forget to accept this as well!
@JASONMENDES: you should have mentioned from the start that you are using Amazon Redshift, not Postgres. Although they share some ancient roots they are quite different
|
1

Try this below code. This will only work if the start and end date fall in two consecutive day. Not if the start and end date difference is more than 1 day.

MSSQL:

SELECT ID,[Start],[End] FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) = 0 UNION ALL SELECT ID,[Start], CAST(CAST(CAST([Start] AS DATE) AS VARCHAR(MAX)) +' 23:59:59' AS DATETIME) FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) > 0 UNION ALL SELECT ID,CAST(CAST([End] AS DATE) AS DATETIME),[End] FROM Input_Table A WHERE DATEDIFF(DD,[Start],[End]) > 0 ORDER BY 1,2,3 

PostgreSQL:

SELECT ID, TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'), TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) = 0 UNION ALL SELECT ID, TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'), TO_TIMESTAMP(CONCAT(CAST(CAST (startDate AS DATE) AS VARCHAR) , ' 23:59:59') , 'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) > 0 UNION ALL SELECT ID, TO_TIMESTAMP(CAST(CAST (endDate AS DATE) AS VARCHAR) , 'YYYY-MM-DD HH24:MI:SS') , TO_TIMESTAMP(endDate,'YYYY-MM-DD HH24:MI:SS') FROM mytemp A WHERE DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) > 0; 

PostgreSQL Demo Here

4 Comments

All refer to Input_Table A, is there something im missing?
No, its all input from Input_Table A.
Postgres version added to the answer.
All those casts and to_timestamp() calls seem rather useless if those two columns are defined as timestamp (and who in their right mind would store timestamps in a varchar column). Also DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) = 0 can be simplified to enddate::date - starddate::date) = 0 or simply enddate::date = startdate::date
0

Unfortunately, Redshift doesn't have a convenient way to generate a series of numbers. If you table is big enough, you can use it to generate numbers. "Big enough" means that the number of rows is greater than the longest span. Perhaps another table would work, if not this one.

Once you have that, you can use this logic:

with n as ( select row_number() over () - 1 as n from t ) select t.id, greatest(t.s, date_trunc('day', t.s) + n.n * interval '1 day') as s, least(t.e, date_trunc('day', t.s) + (n.n + 1) * interval '1 day' - interval '1 second') as e from t join n on t.e >= date_trunc('day', t.s) + n.n * interval '1 day'; 

Here is a db<>fiddle. It uses an old version of Postgres, but not quite old enough for Redshift.

2 Comments

Could you please explain the idea? The query seems incorrect to me, produces wrong result for expanding range with id=C in dbfiddle.uk/… (example from my answer). This is why I ended with recursive CTE, which is also standard-compliant solution.
@TomášZáluský . . . There are not enough rows in the table to generate enough numbers for that example. That is what the first paragraph is explaining. A recursive CTE is nice, but Redshift does not support that functionality.
0

Simulate loop for interval generation using recursive CTE, i.e. take range from start to midnight in seed row, take another day in subsequent rows etc.

with recursive input as ( select 'A' as id, timestamp '2019-03-04 23:18:04' as s, timestamp '2019-03-04 23:21:25' as e union select 'A' as id, timestamp '2019-03-04 23:45:05' as s, timestamp '2019-03-05 00:15:14' as e union select 'B' as id, timestamp '2019-03-06 23:45:05' as s, timestamp '2019-03-08 00:15:14' as e union select 'C' as id, timestamp '2019-03-10 23:45:05' as s, timestamp '2019-03-15 00:15:14' as e ), generate_id as ( select row_number() over () as unique_id, * from input ), rec (unique_id, id, s, e) as ( select unique_id, id, s, least(e, s::date::timestamp + interval '1 day') from generate_id seed union select remaining.unique_id, remaining.id, previous.e, least(remaining.e, previous.e::date::timestamp + interval '1 day') from rec as previous join generate_id remaining on previous.unique_id = remaining.unique_id and previous.e < remaining.e ) select id, s, e from rec order by id,s,e 

Note:

  • your id column appears not to be unique, so I added custom unique_id column. If id was unique, CTE generate_id was unnecessary. Uniqueness is unavoidable for recursive query to work.
  • close-open range is better for representation of such data, rather than close-close range. So end time in my query returns 00:00:00, not 23:59:59. If it's not suitable for you, modify query as an exercise.

UPDATE: query works on Postgres. OP originally tagged question postgres, then changed tag to redshift.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.