3

I have a table with status and date for every day and I'm trying to find out when the statuses change and if there's gaps within each status change / how many days were of a certain status.

+------+----------------+--------+ | user | date | status | +----- +----------------+--------+ | 1 | 12/01/2017 | open | | 1 | 12/02/2017 | open | | 1 | 12/03/2017 | open | | 1 | 12/04/2017 | closed | | 1 | 12/05/2017 | closed | | 1 | 12/06/2017 | open | | 1 | 12/07/2017 | open | +------+----------------+--------+ 

Expected output:

+------+------------+----------------+-------- ----+------------+ | user | status | days_in_status | min | max | +----- +------------+----------------+-------------+------------+ | 1 | open | 3 | 12/01/2017 | 12/03/2017 | | 1 | closed | 2 | 12/04/2017 | 12/05/2017 | | 1 | open | 2 | 12/06/2017 | 12/07/2017 | +------+------ -----+----------------+-------------+-- ---------+ 

3 Answers 3

7

This is a type of gaps-and-islands problem. In this case, subtracting a sequential number from each day is probably the simplest solution for identifying the "islands":

select user, status, count(*) as num_days, min(date), max(date) from (select t.*, row_number() over (partition by user, status order by date) as seqnum from t ) t group by user, status, date - seqnum * interval '1 day' 
Sign up to request clarification or add additional context in comments.

4 Comments

This worked! Can you explain what the GROUP BY date - seqnum * interval '1 day' is doing? `
great answer , avoiding an extra window function! @ptlx24 it''s basically producing a date fro each batch of userid, status , and since second batch of userid =1 and status = open have a gap in the "date" due to "closed" status in middle, that statement produce different date than first batch of "open" status , so its acting like a separator that you can group by
wow thank you so much! this is helpful! :)
@ptlx24 . . . It is subtracting seqnum days from a date. That is a constant value when adjacent rows have an incrementing number of days.
1

SQL query to find the width of date gaps within dates

Fully contained example borrowing from Gordon's answer above that shows island widths as well as island gaps, copy and paste into postgresql and run it.

drop table if exists foobar; CREATE TABLE foobar( tick text, date_val date ); insert into foobar values('XYZ', '2021-01-03'); --island 1 has width 2 insert into foobar values('XYZ', '2021-01-04'); --island 1 insert into foobar values('XYZ', '2021-05-09'); --island 2 has width 3 insert into foobar values('XYZ', '2021-05-10'); --island 2 insert into foobar values('XYZ', '2021-05-11'); --island 2 insert into foobar values('XYZ', '2021-07-07'); --island 3 has width 4 insert into foobar values('XYZ', '2021-07-08'); --island 3 insert into foobar values('XYZ', '2021-07-09'); --island 3 insert into foobar values('XYZ', '2021-07-10'); --island 3 insert into foobar values('XYZ', '2022-10-10'); --island 4 has width 1 select tick, island_width, min_val, max_val, min_val - lag(max_val) over (order by max_val) as gap_width from ( select tick, count(*) as island_width, min(date_val) min_val, max(date_val) max_val from ( select t.*, row_number() over ( partition by tick order by date_val ) as seqnum from foobar t where tick = 'XYZ' ) t group by tick, date_val - seqnum * interval '1 day' ) t2 order by max_val desc 

Prints:

┌──────┬──────────────┬────────────┬────────────┬───────────┐ │ tick │ island_width │ min_val │ max_val │ gap_width │ ├──────┼──────────────┼────────────┼────────────┼───────────┤ │ XYZ │ 1 │ 2022-10-10 │ 2022-10-10 │ 457 │ │ XYZ │ 4 │ 2021-07-07 │ 2021-07-10 │ 57 │ │ XYZ │ 3 │ 2021-05-09 │ 2021-05-11 │ 125 │ │ XYZ │ 2 │ 2021-01-03 │ 2021-01-04 │ ¤ │ └──────┴──────────────┴────────────┴────────────┴───────────┘ 

Details: In this data there are 4 islands. There are 3 gaps between the 4 islands. The last island has a gap. The first island has a null gap.

The SQL finds the days between every min_date and max_date by subtracting a day and decrementing rownumber until all dates are accounted for. partition somehow groups the islands by date and a 2 day moving window using over and lag with default lookbehind 1. Black Box Demon magic aside it goes fast enough and all unit tests pass.

enter image description here

With help from walkthrough here: https://bertwagner.com/posts/gaps-and-islands

Same as above but make gaps count hours rather than days:

drop table if exists foobar; CREATE TABLE foobar( tick text, date_val timestamp); insert into foobar values('XYZ', '2021-01-03 12:00:00'); insert into foobar values('XYZ', '2021-01-03 11:00:00'); insert into foobar values('XYZ', '2021-01-03 10:00:00'); insert into foobar values('XYZ', '2021-01-03 08:00:00'); --2 hr gap insert into foobar values('XYZ', '2021-01-03 07:00:00'); insert into foobar values('XYZ', '2021-01-03 06:00:00'); insert into foobar values('XYZ', '2021-01-03 03:00:00'); --3 hr gap select tick, island_width, min_val, max_val, min_val - lag(max_val) over (order by max_val) as gap_width from ( select tick, count(*) as island_width, min(date_val) min_val, max(date_val) max_val from ( select t.*, row_number() over ( partition by tick order by date_val ) as seqnum from foobar t where tick = 'XYZ' ) t group by tick, date_val - seqnum * interval '1 hour' ) t2 order by max_val desc 

Prints:

┌──────┬──────────────┬─────────────────────┬─────────────────────┬───────────┐ │ tick │ island_width │ min_val │ max_val │ gap_width │ ├──────┼──────────────┼─────────────────────┼─────────────────────┼───────────┤ │ XYZ │ 3 │ 2021-01-03 10:00:00 │ 2021-01-03 12:00:00 │ 02:00:00 │ │ XYZ │ 3 │ 2021-01-03 06:00:00 │ 2021-01-03 08:00:00 │ 03:00:00 │ │ XYZ │ 1 │ 2021-01-03 03:00:00 │ 2021-01-03 03:00:00 │ ¤ │ └──────┴──────────────┴─────────────────────┴─────────────────────┴───────────┘ 

Comments

0

In Postgres you can subtract integer from date directly:

SELECT "user", status , count(*) AS days_in_status, min(date) AS min, max(date) AS max FROM ( SELECT "user", status, date , date - row_number() OVER (PARTITION BY "user", status ORDER BY date)::int AS grp_date FROM tbl ) t GROUP BY "user", status, grp_date ORDER BY "user", min; 

db<>fiddle here

See:

Aside: user is a reserved word, you would avoid it as actual column name.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.