I'm trying to get all users that haven't been sent a message in a given day. I have two tables, one with the messages that have a reporterid indexed column, and a reporter table.
Let me walk you through it:
This inner query returns reporters that have received messages in a given day. It takes tenths of a second on my local dev machine:
select m.reporterid from nubamessage m where m.messagetypeenum =7 and m.createdOn>='2016-06-18 00:00:00' and m.createdOn<='2016-06-18 09:30:00'; There is no difference in the execution time of the above query if I do select(distinct) or not, and it returns the same rows.
But if I used that in an outer join query, it basically never completes:
select * from reporter r left outer join ( SELECT m.reporterid from nubamessage m where m.messagetypeenum =7 and m.createdOn>='2016-06-18 00:00:00' and m.createdOn<='2016-06-18 09:30:00' ) as mm ON mm.reporterid=r.id where r.enabled=1 and m.reporterid is null; However, if I change the inner query to distinct as the below, it takes a fraction of a second:
select * from reporter r left outer join ( SELECT distinct(m.reporterid) -- the only difference from nubamessage m where m.messagetypeenum =7 and m.createdOn>='2016-06-18 00:00:00' and m.createdOn<='2016-06-18 09:30:00' ) as mm ON mm.reporterid=r.id where r.enabled=1 and m.reporterid is null; The execution plan for the non-completing query:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE r NULL ALL NULL NULL NULL NULL 11538 50.00 "Using where" 1 SIMPLE m NULL ALL NULL NULL NULL NULL 968388 100.00 "Using where; Using join buffer (Block Nested Loop)" The execution plan for the query where I just add "distinct", it adds some auto-key:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 PRIMARY r NULL ALL NULL NULL NULL NULL 11538 50.00 "Using where" 1 PRIMARY <derived2> NULL ref <auto_key0> <auto_key0> 5 r.id 10 100.00 "Using index" 2 DERIVED m NULL ALL NULL NULL NULL NULL 968388 1.11 "Using where; Using temporary" If someone could find it in their heart to explain to me why 1. the first query never completes/is so slow and 2. why the distinct makes it run fast, even though the inner query returns the exact same rows in both cases, I'd be most grateful.
Thanks
Edit: Notice that I write "all users that haven't been sent a message. This is why I have the left join - later in the query I have "where m.reporterid is null", in order to only get out the reporters that don't already have a message in the messagetable. I have edited the questions above to reflect that.
I could have used a "not in" instead but from what I've read and in my tests it was a bit slower than left join.