2

I have a data representing tagged continuous-spans in a single table with a <tag, start & stop>.

Example below.

I'm trying to combine multiple rows into a single row where the condition is that they create a "continuous span". In the query below - I would like the functionality that LEFT_MOST_CONTINUOUS returns the minimum v_start of a continuous span (same for RIGHT_MOST_CONTINUOUS for maximum v_stop). Note that there might be more than a single continuous span (that should have different v_start and v_stop values).

Input:

WITH data AS ( SELECT * FROM (VALUES ('a', 2, 3), ('a', 3, 5), ('a', 5, 7), ('a', 8, 10), ('a', 10, 12), ('a', 12, 14), ('b', 7, 8), ('b', 8, 10), ('b', 12, 15), ('c', 10, 11) ) AS T(tag, v_start, v_stop) ORDER BY tag, v_start, v_stop ) SELECT tag, LEFT_MOST_CONTINUOUS(v_start) OVER (PARTITION BY tag), RIGHT_MOST_CONTINUOUS(v_stop) OVER (PARTITION BY tag) FROM data ORDER BY 1, 2, 3 

Where I expect to get the following output:

"a" 2 7 "a" 8 14 "b" 7 10 "b" 12 15 "c" 10 11 

Since I want to merge the first 3 tuples (for tag "a") which are consecutive into a single value representing the entire span; same for the next 3 tuples (again for "a"). Then for "b" we can merge the next 2, but leave out the 3rd (which has it's v_start != the other's v_stop). And "c" there is nothing to merge with.

Help appreciated,

Tal

3 Answers 3

2

You can use a gaps-and-islands approach by marking the first record of each group when either there is no previous record for the tag or the v_start is greater than v_stop of the previous record:

select tag, v_start, v_stop, coalesce(lag(v_stop) over w < v_start, true) as is_end_grp from data window w as (partition by tag order by v_start) 

Use a windowed sum() of the boolean is_end_grp cast to int (1 if true, 0 if false) to number the groups:

 select tag, sum(is_end_grp::int) over (partition by tag order by v_start) as grp_num, v_start, v_stop from mark_gaps 

Aggregation over (tag, grp_num) will produce your desired result:

select tag, min(v_start) as v_start, max(v_stop) as v_stop from numbered_groups group by tag, grp_num order by tag, v_start 

Working DB<>Fiddle

Sign up to request clarification or add additional context in comments.

3 Comments

Summing over is_end_grp had me stumped, great solution
Thanks! really elegant & explained solution - love the use of grp_num
excellent documentation link for the sum() over () window function, very useful for understanding the above answer.
1

Using the numbered_groups logic from @Mike Organek answer. I just started from a different place

WITH data AS ( SELECT * , case when lead(v_start) over(partition by tag order by v_start) = v_stop then 0 else 1 end stopcheck , case when lag(v_stop) over(partition by tag order by v_stop) = v_start then 0 else 1 end startcheck FROM (VALUES ('a' , 2 , 3), ('a', 3, 5), ('a', 5, 7), ('a', 8, 10), ('a', 10, 12), ('a', 12, 14), ('b', 7, 8), ('b', 8, 10), ('b', 12, 15), ('c', 10, 11) ) AS T(tag, v_start, v_stop) ORDER BY tag, v_start, v_stop ) ,cnt as ( select * , sum(startcheck) over (partition by tag order by v_start) grpn from data) select c1.tag, c1.v_start, c2.v_stop from cnt c1 inner join cnt c2 on c1.tag = c2.tag and c1.grpn = c2.grpn where c1.startcheck = 1 and c2.stopcheck = 1 

This logic is all based on the assumption that your data always starts where the last row left off, there is no overlap etc.

Create a startcheck and stopcheck by comparing the prior row and next row relatively. From here use another window function sum() over to order the start records (so we don't match start of second batch to stop of first batch)

Join the table to itself matching like tag and groups. Filtering start and stop records

Comments

0

You can use following query

WITH data AS ( SELECT * FROM (VALUES ('a', 2, 3), ('a', 3, 5), ('a', 5, 7), ('a', 8, 10), ('a', 10, 12), ('a', 12, 14), ('b', 7, 8), ('b', 8, 10), ('b', 12, 15), ('c', 10, 11) ) AS T(tag, v_start, v_stop) ORDER BY tag, v_start, v_stop ), cte1 as( select *, case when lag(v_stop)over(partition by tag order by(select null)) = v_start then 0 else 1 end as grp from data ), cte2 as( select *, sum(grp) over (partition by tag order by v_start) as rnk from cte1 ) select tag,min(v_start)v_start,max(v_stop)v_stop from cte2 group by tag,rnk order by tag 

Demo in db<>fiddle

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.