MySQL - How to select 'DISTINCT' overlapping periods (dates or number ranges)

Question

Put succinctly, if a query tells me A overlaps B then I don't need it to also tell me that B also overlaps A as they overlap each other.

So I am trying to use a self join in sql to select just 'DISTINCT' overlaps.

To illustrate, here is a simple SQL fiddle that I wrote to show inclusive overlap selection (http://sqlfiddle.com/#!9/7af84f/1)

In detail...

Assume I have a table of name (char), d1 (int), d2 (int) , the schema of which is below. Here d1 and d2 represent the start and end of some interval that might overlap with another interval in the same table,.

CREATE TABLE test ( letter char , d1 int , d2 int ) ;

Given this table I fill it with some values

INSERT INTO test (letter,d1,d2) VALUES ('A', 2, 10), -- overlaps C and D ('B', 12, 20), -- overlaps E ('C', 5, 10), -- overlaps A and D ('D', 1, 8), -- overlaps A and C ('E', 13, 15), -- overlaps B ('F', 25, 30); -- doesn't overlap anything

and run the following query that uses a self join to correctly find the rows where d1 and d2 in one row has an inclusive overlap with d1 and d2 in other rows.

-- selects all records that overlap in the range d1 - d2 inclusive -- (excluding the implicit overlap between a record and itself) -- The results are sorted by letter followed by d1 SELECT basetable.letter as test_letter, basetable.d1, basetable.d2, overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2 FROM test as basetable, test as overlaptable WHERE -- there is an inclusive overlap basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1 AND -- the row being checked is not itsself basetable.letter <> overlaptable.letter AND basetable.d1 <> overlaptable.d1 AND basetable.d2 <> overlaptable.d2 ORDER BY basetable.letter, basetable.d1

That correctly gives me the following, showing all 6 versions of overlaps eg left hand column indicates that A overlaps C and another row shows that C overlaps A (note the sqlfiddle doesn't seem to understand field aliases so my column headers are different)

test_letter d1 d2 overlap_letter overlap_d1 overlap_d2 A 2 10 D 1 8 B 12 20 E 13 15 C 5 10 D 1 8 D 1 8 A 2 10 D 1 8 C 5 10 E 13 15 B 12 20

My question is this:

How can I alter the sql to just get four rows of 'DISTINCT' or 'one way' overlaps?

ie this result...

test_letter d1 d2 overlap_letter overlap_d1 overlap_d2 A 2 10 D 1 8 A 2 10 C 5 10 B 12 20 E 13 15 C 5 10 D 1 8

eg:
a result that just shows records for A, B and C in the left hand column according to the following reasoning

A(2,10) overlaps with D(1,8) and C(5,10) and {SHOW THESE TWO ROWS}
B(12,20) overlaps with E(13,15) {SHOW THIS ROW}
C(5,10) overlaps with D(1,8) {SHOW THIS ROW but don't show the A(1,10) overlap as row 2 already shows that A and C overlap}
D(1,8) {DON'T SHOW anything new as we already know about A(1,10) and C(5,10)}
E(13,15) {DON'T SHOW anything new as we already know about B(12,20) }
F(25,30) {DON'T SHOW anything as there are no overlaps}

You have a table called test, and then you fill a table call testnames. I'm already confused. — Strawberry
– Strawberry, Commented Oct 10, 2016 at 10:10
But it seems that you're only interested in situations where one letter is less than another one (as opposed to 'not equal to') — Strawberry
– Strawberry, Commented Oct 10, 2016 at 10:13
Change basetable.letter <> overlaptable.letter to basetable.letter < overlaptable.letter. It will also speed up your query by up to 50 percent. (Which is, now that I can see it, exactly what @Strawberry wrote using words). — Solarflare
– Solarflare, Commented Oct 10, 2016 at 10:16
You need a criteria how to choose between (B overlaps E) and (E overlaps B). Besides basetable.letter {< | > }overlaptable.letter other possible options are basetable.d1 > overlaptable.d1' , basetable.d2 < overlaptable.d2` — Serg
– Serg, Commented Oct 10, 2016 at 11:12
Yes, simply changing <> to < missed the case of A overlapping C (by overlap I include the case of one being entirely inside the other) — user3209752
– user3209752, Commented Oct 10, 2016 at 14:46

Gordon Linoff · Accepted Answer · 2016-10-10 23:41:16Z

2

You can just change to an inequality. And, you should also use JOIN:

SELECT basetable.letter as test_letter, basetable.d1, basetable.d2, overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2 FROM test basetable JOIN test overlaptable ON basetable.d1 <= overlaptable.d2 AND basetable.d2 >= overlaptable.d1 WHERE basetable.letter < overlaptable.letter -- This is the change ORDER BY basetable.letter, basetable.d1;

edited Oct 10, 2016 at 23:41

answered Oct 10, 2016 at 11:11

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

user3209752 Over a year ago

Ths misses the case where A overlaps C as it uses the same inequality as suggested in the comments to my original post

Gordon Linoff Over a year ago

@user3209752 . . . I removed the two of the where conditions. The query looks right to me for finding overlaps.

user3209752 Over a year ago

That did it. It produced exactly the table I was looking for, even down to the same order of letters. On balance I think I'll make this the correct answer because, although Serg's answer does indeed work to pick up distinct overlaps, the result table doesn't exactly match that requested in my post. Also, by using a join, and a single where clause, this answer is simpler for other people to understand.

Serg · Accepted Answer · 2016-10-10 12:09:57Z

This can be as simple as already suggested PK ordering. Alternatively you may wish to introduce lexicographic order of a sort.

CREATE TABLE test ( letter char , d1 int , d2 int ) ; INSERT INTO test (letter,d1,d2) VALUES ('A', 2, 10), -- overlaps C and D ('B', 12, 20), -- overlaps E ('C', 5, 10), -- overlaps A and D ('D', 1, 8), -- overlaps A and C ('E', 13, 15), -- overlaps B ('F', 25, 30), -- doesn't overlap anything ('G', 50, 60), -- a set of equal intervals ('H', 50, 60), ('I', 50, 60) SELECT basetable.letter as test_letter, basetable.d1, basetable.d2, overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2 FROM test as basetable, test as overlaptable WHERE -- there is an inclusive overlap basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1 AND -- require lexicographic order: basetable starts later / finishes earlier / its letter is less then overlaptable basetable.d1 > overlaptable.d1 OR (basetable.d1 = overlaptable.d1 AND (basetable.d2 < overlaptable.d2 OR (basetable.d2 = overlaptable.d2 AND basetable.letter < overlaptable.letter))) ORDER BY overlaptable.d1, basetable.d2, basetable.letter

Thanks, this appears to give me the correct 4 rows including the A overlapping C. I'm guessing you are treating 'letter' as the primary key - which was my intention in the small example I gave but I didn't make it explicit for brevity. In the real application I am retrieving people's forenames, surnames and dates that overlap but each person does have a numerical ID as their PK held in the equivalent of my table 'test', so I think your solution will work if I substitute member_ID for letter in the WHERE clause.
Yes, my guess was 'letter' is the primary key because it's used in your query to exclude the same row. If member_ID is the PK really, you can safely substitute member_ID for letter.
Thanks for the time you took to develop that 'where' clause Serg, I have given it an upvote. However after consideration I have marked Gordon's answer as correct as it produces exactly the table I was looking for and by using a join is able to use simpler logic.

Collectives™ on Stack Overflow

MySQL - How to select 'DISTINCT' overlapping periods (dates or number ranges)

2 Answers 2

3 Comments

3 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

3 Comments

Related