
My table structure and some sample data (test table), as well as my desired output (results) are available here.

When I do

SELECT DISTINCT colony, MIN(date_check), MAX(date_check)
FROM test
WHERE protection = 'Y'
GROUP BY colony;

I only detect 1 event by colony.

Should I use a function, or is there another way?

PS: if you can think of a better title to help other users, don't hesitate to edit it. I struggled with it.


2 Answers


To solve your issue, I did the following (a fiddle for all code below is available here):

A couple of points:

  • This solution makes use of the LAG() window function.

  • Window functions are very powerful and will repay any effort spent learning them many times over.

  • And finally, PostgreSQL 9.5 is no longer supported - you might be better off moving to a supported version.

First step:

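We flag each row where protection differs from the previous check for the same colony. LAG() returns NULL for the first row of each colony, so the comparison is NULL there and the CASE falls through to 0.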

SELECT
  colony,
  protection,
  date_check,
  CASE
    WHEN (LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection)
    THEN 1
    ELSE 0
  END AS change
FROM test;

Result (snipped for brevity):

colony  protection  date_check  change
2       N           2019-10-26  0
2       N           2019-10-27  0
2       Y           2019-11-01  1
2       Y           2019-11-03  0
7       Y           2019-10-12  0
7       Y           2019-10-13  0
...     ...
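To see the flags on something small, here is a minimal sketch that runs the same CASE/LAG() expression from Python against an in-memory SQLite database. SQLite is only a stand-in for PostgreSQL here (it needs version 3.25 or later for window functions), and the rows are made up for illustration; they are not the asker's table.

```python
import sqlite3

# Hypothetical sample rows (colony, protection, date_check), not the real data.
ROWS = [
    (2, "N", "2019-10-26"),
    (2, "N", "2019-10-27"),
    (2, "Y", "2019-11-01"),
    (2, "Y", "2019-11-03"),
    (7, "Y", "2019-10-12"),
    (7, "Y", "2019-10-13"),
    (7, "N", "2019-10-20"),
    (7, "Y", "2019-10-23"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (colony INTEGER, protection TEXT, date_check TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)

# Flag a row with 1 when protection differs from the previous check of the
# same colony. The first row per colony compares against NULL and gets 0.
flags = conn.execute("""
    SELECT colony, protection, date_check,
           CASE
             WHEN LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection
             THEN 1 ELSE 0
           END AS change
    FROM test
    ORDER BY colony, date_check
""").fetchall()

for row in flags:
    print(row)
```

Note that both directions are flagged: 'N' to 'Y' and 'Y' to 'N'.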

Step 2:

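We take a running sum of the change flags over the rows where protection = 'Y'. Every row in the same unbroken run of 'Y' checks gets the same sc value, so (colony, sc) identifies one protected period.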

SELECT
  colony,
  date_check,
  SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
FROM
(
  SELECT
    colony,
    protection,
    date_check,
    CASE
      WHEN (LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection)
      THEN 1
      ELSE 0
    END AS change
  FROM test
) AS tab
WHERE protection = 'Y'  -- test what happens when we comment out this line...
ORDER BY colony, date_check;

Result:

colony  date_check  sc
2       2019-11-01  1
2       2019-11-03  1
7       2019-10-12  0
7       2019-10-13  0
7       2019-10-14  0
7       2019-10-15  0
7       2019-10-16  0
7       2019-10-17  0
7       2019-10-23  1
7       2019-10-24  1
7       2019-10-25  1
7       2019-10-26  1
7       2019-11-01  2
7       2019-11-04  2
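Where the WHERE clause sits matters, because a WHERE is applied before the window functions of the same query level. The sketch below compares filtering in the outer query (as in Step 2) with filtering in the inner query. It again uses SQLite from Python as a stand-in for PostgreSQL, with made-up rows that are not the asker's table; the CHANGE_EXPR helper string is just a convenience for this example.

```python
import sqlite3

# Hypothetical sample rows (colony, protection, date_check), not the real data.
ROWS = [
    (2, "N", "2019-10-26"),
    (2, "N", "2019-10-27"),
    (2, "Y", "2019-11-01"),
    (2, "Y", "2019-11-03"),
    (7, "Y", "2019-10-12"),
    (7, "Y", "2019-10-13"),
    (7, "N", "2019-10-20"),
    (7, "Y", "2019-10-23"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (colony INTEGER, protection TEXT, date_check TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)

# The change flag from Step 1, kept in a string so both queries share it.
CHANGE_EXPR = """
    CASE
      WHEN LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection
      THEN 1 ELSE 0
    END
"""

# Filter in the OUTER query: LAG() has already seen the 'N' rows, so each
# new run of 'Y' rows starts with change = 1 and gets its own sc value.
outer_filter = conn.execute(f"""
    SELECT colony, date_check,
           SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
    FROM (SELECT colony, protection, date_check, {CHANGE_EXPR} AS change FROM test) AS tab
    WHERE protection = 'Y'
    ORDER BY colony, date_check
""").fetchall()

# Filter in the INNER query: the 'N' rows are removed before LAG() runs,
# so every row compares 'Y' with 'Y' (or NULL) and no change is detected.
inner_filter = conn.execute(f"""
    SELECT colony, date_check, {CHANGE_EXPR} AS change
    FROM test
    WHERE protection = 'Y'
    ORDER BY colony, date_check
""").fetchall()

print(outer_filter)
print(inner_filter)
```

With the filter inside, all flags are 0 and the periods of a colony can no longer be told apart, which is why the subquery is needed.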

Step 3:

Finally, we take the MIN() and MAX() of date_check for each (colony, sc) group, i.e. for each run that starts where protection changes from 'N' to 'Y', keeping only the rows where protection = 'Y'.

SELECT
  colony,
  MIN(date_check) AS "Date in",
  MAX(date_check) AS "Date out",
  sc  -- not strictly necessary here in the SELECT - illustrative!
FROM
(
  SELECT
    colony,
    date_check,
    SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
  FROM
  (
    SELECT
      colony,
      protection,
      date_check,
      CASE
        WHEN (LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection)
        THEN 1
        ELSE 0
      END AS change
    FROM test
  ) AS tab1
  WHERE protection = 'Y'
  ORDER BY colony, date_check
) AS tab2
GROUP BY colony, sc  -- sc not necessary in the SELECT but
ORDER BY colony, sc; -- it is required in the GROUP BY - test!!!

Result:

colony  Date in     Date out    sc
2       2019-11-01  2019-11-03  1
7       2019-10-12  2019-10-17  0
7       2019-10-23  2019-10-26  1
7       2019-11-01  2019-11-04  2

Q.E.D.
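For anyone who wants to try the three steps end to end without a PostgreSQL instance, here is a minimal sketch that runs the Step 3 query from Python against an in-memory SQLite database (version 3.25 or later). SQLite is an assumption made for convenience, the column aliases date_in and date_out replace the quoted ones, and the rows are invented for illustration rather than taken from the asker's table.

```python
import sqlite3

# Hypothetical sample rows (colony, protection, date_check), not the real data.
ROWS = [
    (2, "N", "2019-10-26"),
    (2, "N", "2019-10-27"),
    (2, "Y", "2019-11-01"),
    (2, "Y", "2019-11-03"),
    (7, "Y", "2019-10-12"),
    (7, "Y", "2019-10-13"),
    (7, "N", "2019-10-20"),
    (7, "Y", "2019-10-23"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (colony INTEGER, protection TEXT, date_check TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)

periods = conn.execute("""
    SELECT colony,
           MIN(date_check) AS date_in,
           MAX(date_check) AS date_out
    FROM (
        -- Step 2: running sum of the flags over the 'Y' rows only
        SELECT colony, date_check,
               SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
        FROM (
            -- Step 1: flag rows whose protection differs from the previous check
            SELECT colony, protection, date_check,
                   CASE
                     WHEN LAG(protection, 1) OVER (PARTITION BY colony ORDER BY date_check) != protection
                     THEN 1 ELSE 0
                   END AS change
            FROM test
        ) AS tab1
        WHERE protection = 'Y'
    ) AS tab2
    -- Step 3: one output row per (colony, sc) group
    GROUP BY colony, sc
    ORDER BY colony, sc
""").fetchall()

for colony, date_in, date_out in periods:
    print(colony, date_in, date_out)
```

In this sample, colony 7 has a protected period made of a single check (2019-10-23); it comes out with the same date as both start and end.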

The window function solution appears to be the most performant (see here). Run the fiddle several times and vary the order of the queries. I've tried to warm up the cache before running EXPLAIN (ANALYZE...), but test with your own table(s) and hardware.

  • Thank you for your very detailed answer. I learnt today. Thanks for editing the title also. Actually, I noticed later that my goal wasn't exactly what I asked in my post, but from your answer I got my expected results. Commented Apr 1, 2021 at 16:23

Without CTEs and window functions:

SELECT colony, MIN(date_in) date_in, date_out
FROM (
  SELECT t1.colony, t1.date_check date_in, MAX(t2.date_check) date_out
  FROM test t1
  JOIN test t2
    ON t1.colony = t2.colony
   AND t1.date_check < t2.date_check
   AND t1.protection = 'Y'
   AND t2.protection = 'Y'
   AND NOT EXISTS (
         SELECT NULL
         FROM test t3
         WHERE t1.colony = t3.colony
           AND t1.date_check < t3.date_check
           AND t3.date_check < t2.date_check
           AND t3.protection != 'Y'
       )
  GROUP BY t1.colony, t1.date_check
) subquery
GROUP BY colony, date_out
ORDER BY colony, date_in;

With CTE and window functions:

WITH cte1 AS (
  SELECT *,
         CASE
           WHEN protection = 'N'
             OR LAG(protection) OVER (PARTITION BY colony ORDER BY date_check) = 'Y'
           THEN 0
           ELSE 1
         END AS row_in,
         CASE
           WHEN protection = 'Y'
             OR LEAD(protection) OVER (PARTITION BY colony ORDER BY date_check) = 'N'
           THEN 0
           ELSE 1
         END AS row_out
  FROM test
),
cte2 AS (
  SELECT *,
         SUM(row_in) OVER (PARTITION BY colony ORDER BY date_check) group_no
  FROM cte1
)
SELECT colony, MIN(date_check) date_in, MAX(date_check) date_out
FROM cte2
WHERE protection = 'Y'
GROUP BY colony, group_no
ORDER BY colony, date_in;

https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=bf769987d7211b73e89c0564bc6902a5
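One difference between the two queries may be worth checking against your own data: the self-join version requires t1.date_check < t2.date_check, so a protected period consisting of a single check has no partner row and, as far as I can tell, drops out of the result. The sketch below compares both queries on made-up rows containing such a period. It runs them from Python against in-memory SQLite (3.25 or later) as a stand-in for PostgreSQL, and leaves out the row_out column because the final SELECT does not use it.

```python
import sqlite3

# Hypothetical sample rows; colony 7 ends with a one-check 'Y' period.
ROWS = [
    (2, "N", "2019-10-26"),
    (2, "N", "2019-10-27"),
    (2, "Y", "2019-11-01"),
    (2, "Y", "2019-11-03"),
    (7, "Y", "2019-10-12"),
    (7, "Y", "2019-10-13"),
    (7, "N", "2019-10-20"),
    (7, "Y", "2019-10-23"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (colony INTEGER, protection TEXT, date_check TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)

# Self-join version: pairs each 'Y' check with a strictly later 'Y' check
# of the same colony that has no non-'Y' check in between.
self_join = conn.execute("""
    SELECT colony, MIN(date_in) date_in, date_out
    FROM (
        SELECT t1.colony, t1.date_check date_in, MAX(t2.date_check) date_out
        FROM test t1
        JOIN test t2
          ON t1.colony = t2.colony
         AND t1.date_check < t2.date_check
         AND t1.protection = 'Y'
         AND t2.protection = 'Y'
         AND NOT EXISTS (
               SELECT NULL FROM test t3
               WHERE t1.colony = t3.colony
                 AND t1.date_check < t3.date_check
                 AND t3.date_check < t2.date_check
                 AND t3.protection != 'Y')
        GROUP BY t1.colony, t1.date_check
    ) subquery
    GROUP BY colony, date_out
    ORDER BY colony, date_in
""").fetchall()

# CTE version: row_in marks the first 'Y' row of each run, and the running
# sum of row_in numbers the runs within a colony.
windowed = conn.execute("""
    WITH cte1 AS (
        SELECT *,
               CASE
                 WHEN protection = 'N'
                   OR LAG(protection) OVER (PARTITION BY colony ORDER BY date_check) = 'Y'
                 THEN 0 ELSE 1
               END AS row_in
        FROM test
    ),
    cte2 AS (
        SELECT *, SUM(row_in) OVER (PARTITION BY colony ORDER BY date_check) group_no
        FROM cte1
    )
    SELECT colony, MIN(date_check) date_in, MAX(date_check) date_out
    FROM cte2
    WHERE protection = 'Y'
    GROUP BY colony, group_no
    ORDER BY colony, date_in
""").fetchall()

print("self-join:", self_join)
print("windowed: ", windowed)
```

On these rows the window-function query reports the single-check period on 2019-10-23 and the self-join query does not. Whether that matters depends on whether one-check periods should count as events.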
