SQL query to find gaps within a column of dates

Question

I have a table with status and date for every day and I'm trying to find out when the statuses change and if there's gaps within each status change / how many days were of a certain status.

+------+----------------+--------+ | user | date | status | +----- +----------------+--------+ | 1 | 12/01/2017 | open | | 1 | 12/02/2017 | open | | 1 | 12/03/2017 | open | | 1 | 12/04/2017 | closed | | 1 | 12/05/2017 | closed | | 1 | 12/06/2017 | open | | 1 | 12/07/2017 | open | +------+----------------+--------+

Expected output:

+------+------------+----------------+-------- ----+------------+ | user | status | days_in_status | min | max | +----- +------------+----------------+-------------+------------+ | 1 | open | 3 | 12/01/2017 | 12/03/2017 | | 1 | closed | 2 | 12/04/2017 | 12/05/2017 | | 1 | open | 2 | 12/06/2017 | 12/07/2017 | +------+------ -----+----------------+-------------+-- ---------+

Gordon Linoff · Accepted Answer · 2021-05-12 17:33:52Z

7

This is a type of gaps-and-islands problem. In this case, subtracting a sequential number from each day is probably the simplest solution for identifying the "islands":

select user, status, count(*) as num_days, min(date), max(date) from (select t.*, row_number() over (partition by user, status order by date) as seqnum from t ) t group by user, status, date - seqnum * interval '1 day'

answered May 12, 2021 at 17:33

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

ptlx24 Over a year ago

This worked! Can you explain what the GROUP BY date - seqnum * interval '1 day' is doing? `

eshirvana Over a year ago

great answer , avoiding an extra window function! @ptlx24 it''s basically producing a date fro each batch of userid, status , and since second batch of userid =1 and status = open have a gap in the "date" due to "closed" status in middle, that statement produce different date than first batch of "open" status , so its acting like a separator that you can group by

ptlx24 Over a year ago

wow thank you so much! this is helpful! :)

Gordon Linoff Over a year ago

@ptlx24 . . . It is subtracting seqnum days from a date. That is a constant value when adjacent rows have an incrementing number of days.

Eric Leschinski · Accepted Answer · 2022-07-07 14:00:31Z

SQL query to find the width of date gaps within dates

Fully contained example borrowing from Gordon's answer above that shows island widths as well as island gaps, copy and paste into postgresql and run it.

drop table if exists foobar; CREATE TABLE foobar( tick text, date_val date ); insert into foobar values('XYZ', '2021-01-03'); --island 1 has width 2 insert into foobar values('XYZ', '2021-01-04'); --island 1 insert into foobar values('XYZ', '2021-05-09'); --island 2 has width 3 insert into foobar values('XYZ', '2021-05-10'); --island 2 insert into foobar values('XYZ', '2021-05-11'); --island 2 insert into foobar values('XYZ', '2021-07-07'); --island 3 has width 4 insert into foobar values('XYZ', '2021-07-08'); --island 3 insert into foobar values('XYZ', '2021-07-09'); --island 3 insert into foobar values('XYZ', '2021-07-10'); --island 3 insert into foobar values('XYZ', '2022-10-10'); --island 4 has width 1 select tick, island_width, min_val, max_val, min_val - lag(max_val) over (order by max_val) as gap_width from ( select tick, count(*) as island_width, min(date_val) min_val, max(date_val) max_val from ( select t.*, row_number() over ( partition by tick order by date_val ) as seqnum from foobar t where tick = 'XYZ' ) t group by tick, date_val - seqnum * interval '1 day' ) t2 order by max_val desc

Prints:

┌──────┬──────────────┬────────────┬────────────┬───────────┐ │ tick │ island_width │ min_val │ max_val │ gap_width │ ├──────┼──────────────┼────────────┼────────────┼───────────┤ │ XYZ │ 1 │ 2022-10-10 │ 2022-10-10 │ 457 │ │ XYZ │ 4 │ 2021-07-07 │ 2021-07-10 │ 57 │ │ XYZ │ 3 │ 2021-05-09 │ 2021-05-11 │ 125 │ │ XYZ │ 2 │ 2021-01-03 │ 2021-01-04 │ ¤ │ └──────┴──────────────┴────────────┴────────────┴───────────┘

Details: In this data there are 4 islands. There are 3 gaps between the 4 islands. The last island has a gap. The first island has a null gap.

The SQL finds the days between every min_date and max_date by subtracting a day and decrementing rownumber until all dates are accounted for. partition somehow groups the islands by date and a 2 day moving window using over and lag with default lookbehind 1. Black Box Demon magic aside it goes fast enough and all unit tests pass.

With help from walkthrough here: https://bertwagner.com/posts/gaps-and-islands

Same as above but make gaps count hours rather than days:

drop table if exists foobar; CREATE TABLE foobar( tick text, date_val timestamp); insert into foobar values('XYZ', '2021-01-03 12:00:00'); insert into foobar values('XYZ', '2021-01-03 11:00:00'); insert into foobar values('XYZ', '2021-01-03 10:00:00'); insert into foobar values('XYZ', '2021-01-03 08:00:00'); --2 hr gap insert into foobar values('XYZ', '2021-01-03 07:00:00'); insert into foobar values('XYZ', '2021-01-03 06:00:00'); insert into foobar values('XYZ', '2021-01-03 03:00:00'); --3 hr gap select tick, island_width, min_val, max_val, min_val - lag(max_val) over (order by max_val) as gap_width from ( select tick, count(*) as island_width, min(date_val) min_val, max(date_val) max_val from ( select t.*, row_number() over ( partition by tick order by date_val ) as seqnum from foobar t where tick = 'XYZ' ) t group by tick, date_val - seqnum * interval '1 hour' ) t2 order by max_val desc

Prints:

┌──────┬──────────────┬─────────────────────┬─────────────────────┬───────────┐ │ tick │ island_width │ min_val │ max_val │ gap_width │ ├──────┼──────────────┼─────────────────────┼─────────────────────┼───────────┤ │ XYZ │ 3 │ 2021-01-03 10:00:00 │ 2021-01-03 12:00:00 │ 02:00:00 │ │ XYZ │ 3 │ 2021-01-03 06:00:00 │ 2021-01-03 08:00:00 │ 03:00:00 │ │ XYZ │ 1 │ 2021-01-03 03:00:00 │ 2021-01-03 03:00:00 │ ¤ │ └──────┴──────────────┴─────────────────────┴─────────────────────┴───────────┘

Erwin Brandstetter · Accepted Answer · 2021-05-12 18:05:23Z

In Postgres you can subtract integer from date directly:

SELECT "user", status , count(*) AS days_in_status, min(date) AS min, max(date) AS max FROM ( SELECT "user", status, date , date - row_number() OVER (PARTITION BY "user", status ORDER BY date)::int AS grp_date FROM tbl ) t GROUP BY "user", status, grp_date ORDER BY "user", min;

db<>fiddle here

See:

Aside: user is a reserved word, you would avoid it as actual column name.

Collectives™ on Stack Overflow

SQL query to find gaps within a column of dates

3 Answers 3

4 Comments

SQL query to find the width of date gaps within dates

Same as above but make gaps count hours rather than days:

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

SQL query to find the width of date gaps within dates

Same as above but make gaps count hours rather than days:

Comments

Comments

Linked

Related