Poor performance when at least one of the left joins should match a record

Question

I've got bookings (bookings) in my database. A booking can have 0 to n flight services (flight_services) and 0 to n hotel services (hotel_services). A user on my website can filter the bookings by setting where conditions on each of these tables.

When SELECTing, only bookings that have at least one flight service or at least one hotel service should be returned. Furthermore, there's a flight_pivot_table and a hotel_pivot_table, and only services that have a fixed id (here 82) in those pivot tables should be considered.

The query:

    select `bookings`.*
    from `bookings`
    left join (
        select `flight_services`.*
        from `flight_services`
        inner join `flight_pivot_table`
            on `flight_services`.`id` = `flight_pivot_table`.`flight_service_id`
        where `flight_pivot_table`.`some_id` = 82
    ) as `flight_services`
        on `bookings`.`id` = `flight_services`.`booking_id`
    left join (
        select `hotel_services`.*
        from `hotel_services`
        inner join `hotel_pivot_table`
            on `hotel_services`.`id` = `hotel_pivot_table`.`hotel_service_id`
        where `hotel_pivot_table`.`some_id` = 82
    ) as `hotel_services`
        on `bookings`.`id` = `hotel_services`.`booking_id`
    where (
        flight_services.id is not null
        or hotel_services.id is not null
    )
    group by `bookings`.`id`

Unfortunately, this is extremely slow, although all indexes are used. With the data I've got, this query takes about 300 ms to execute. Here's the output of EXPLAIN:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	bookings	index	PRIMARY	PRIMARY	4		35173	100
1	SIMPLE	flight_services	ref	PRIMARY,flight_services_uq	flight_services_uq	4	my_db.bookings.id	2	100	Using index
1	SIMPLE	flight_pivot_table	eq_ref	PRIMARY	PRIMARY	8	const,my_db.flight_services.id	1	100	Using index
1	SIMPLE	hotel_services	ref	PRIMARY,hotel_services_uq	hotel_services_uq	4	my_db.bookings.id	1	100	Using where; Using index
1	SIMPLE	hotel_pivot_table	eq_ref	PRIMARY	PRIMARY	8	const,my_db.hotel_services.id	1	100	Using index

Any one of the following changes reduces the time to about 3 ms, but obviously breaks the functionality:

Remove the INNER JOIN from one or both of the sub queries.
Remove the WHERE conditions.
Replace the or with and in the WHERE conditions.

Notes:

As it should be possible to add where conditions for flight_services and hotel_services, the aliases given for the left join sub-queries match the table names.
I use GROUP BY because every booking should be returned only once, of course.

How can I accelerate this?

Perhaps you can try where exists instead of the joins? Also, you shouldn't select flight_services.* when all you need is booking_id.
– mustaccio, Commented Jan 4, 2023 at 17:37
Thanks, but using WHERE EXISTS would take away the possibility to set WHERE conditions on those tables. The user should be able to filter the bookings by flight and hotel services.
– quick-brown-fox, Commented Jan 5, 2023 at 9:04

Rick James · Accepted Answer · 2023-01-05 19:53:11Z

Whenever possible, it is usually better to use JOINs rather than "derived tables". (There are cases where the opposite is faster.)

That is, instead of

SELECT ... FROM ( SELECT ... a JOIN b ... ) JOIN ...

do

SELECT ... FROM a JOIN b ... JOIN ...

That being said, I see that you may have the "explode-implode" case:

    SELECT ...
        FROM x
        JOIN z           -- explode
        GROUP BY x.id    -- implode

The JOIN flight_services seems to be 1:many -- hence "explode". But you are not using any of the data from that table??? Why JOIN to it? Why fetch all the columns, only to throw them away?

The pattern

LEFT JOIN z ON blah ... WHERE z.id IS NOT NULL

can be replaced by

WHERE EXISTS ( SELECT 1 FROM z WHERE blah )

if you don't actually use any of the data from z (i.e., flight_services).

This "semi-join" is faster because it stops as soon as a match is found. It skips the "explode" step, which helps remove the need for the GROUP BY.

Ditto for the other LEFT JOIN.

I'm confused by the JOIN inside your derived tables. Anyway, it can also be done in the SELECT inside the EXISTS.

Here is my suggested code

    select b.*
    from `bookings` AS b
    WHERE EXISTS (
            SELECT 1
            from `flight_services` AS fs
            inner join `flight_pivot_table` AS fpt
                ON fs.`id` = fpt.`flight_service_id`
            where fpt.`some_id` = 82
              AND b.`id` = fs.`booking_id`
        )
      AND EXISTS (... hotel_services join ...)
    -- no WHERE

Suggested indexes:

    fs:  INDEX(booking_id, id)
    fpt: INDEX(some_id, flight_service_id)
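Written out as DDL, the suggested indexes might look like the sketch below. The index names are hypothetical, and the two hotel-side statements are an assumption that simply mirrors the flight side. Judging by the EXPLAIN output in the question, the pivot tables' primary keys may already start with some_id, and flight_services_uq may already start with booking_id, so some of these could turn out to be redundant; SHOW CREATE TABLE will tell.

```sql
-- Index names are made up for illustration; use your own naming scheme.
ALTER TABLE flight_services
    ADD INDEX idx_fs_booking_id (booking_id, id);
ALTER TABLE flight_pivot_table
    ADD INDEX idx_fpt_some_id_service (some_id, flight_service_id);

-- Assumed hotel-side equivalents (not spelled out in the answer).
ALTER TABLE hotel_services
    ADD INDEX idx_hs_booking_id (booking_id, id);
ALTER TABLE hotel_pivot_table
    ADD INDEX idx_hpt_some_id_service (some_id, hotel_service_id);
```

Re-running EXPLAIN after adding them shows whether the optimizer actually picks the new indexes.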

Both the reformulation and the composite indexes should help speed things up.

OR not AND

Oops, I misread the original query. OR is usually an Optimization killer. But rewriting the query to use UNION may be the answer.

    ( SELECT ... WHERE ...   -- checking flight_services
    )
    UNION ALL
    ( SELECT ... WHERE ...   -- checking hotel
    )

So you want to show cases where there are flight services but no hotel services? And vice versa?

(Note: UNION ALL should probably be UNION DISTINCT because there could be the same 'booking' from both?)

Anyway, let's work toward another optimization. Since you want just the bookings and nothing about the flight and hotel, let's start by finding the bookings.id values, then reach for bookings.*

This should give you the booking.ids based on flight stuff:

    SELECT fs.`booking_id`
    from `flight_services` AS fs
    inner join `flight_pivot_table` AS fpt
        ON fs.`id` = fpt.`flight_service_id`
    where fpt.`some_id` = 82

Then, to finish

    SELECT b.*
    FROM (
        ( that query )
        UNION DISTINCT
        ( a similar query, but for hotel )
    ) AS u
    JOIN bookings AS b ON u.booking_id = b.id
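With both placeholders filled in, the complete statement could look like the sketch below. The hotel branch is an assumption: it mirrors the flight branch using the table and column names from the question (hotel_services, hotel_pivot_table, hotel_service_id).

```sql
SELECT b.*
FROM (
    -- booking ids that have at least one qualifying flight service
    SELECT fs.booking_id
    FROM flight_services AS fs
    INNER JOIN flight_pivot_table AS fpt
        ON fs.id = fpt.flight_service_id
    WHERE fpt.some_id = 82

    UNION DISTINCT

    -- booking ids that have at least one qualifying hotel service
    -- (assumed to mirror the flight side)
    SELECT hs.booking_id
    FROM hotel_services AS hs
    INNER JOIN hotel_pivot_table AS hpt
        ON hs.id = hpt.hotel_service_id
    WHERE hpt.some_id = 82
) AS u
JOIN bookings AS b
    ON b.id = u.booking_id;
```

Because UNION DISTINCT leaves each booking_id at most once in u, the final join returns each booking once without a GROUP BY.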

Thank you for your extensive answer, Rick! With your suggested code, one can no longer filter on flight_services/hotel_services. I tried it anyway and added your suggested indexes. Surprisingly, the query still takes 300 ms to complete. The only difference is that I connected the two EXISTS clauses with OR instead of AND (WHERE EXISTS(...) OR EXISTS(...)), because that is how it is supposed to work. (And again, changing it to AND reduces the time to about 5 ms, but breaks functionality.) Why is OR so slow here?
– quick-brown-fox, Commented Jan 5, 2023 at 9:00
@quick-brown-fox - Ouch. OR is deadly for performance. (I may have missed "OR" when I wrote my answer.) I'll suggest turning OR into UNION; come back in a few minutes.
– Rick James, Commented Jan 5, 2023 at 19:34
Thanks for your update! To answer your question: I'd like to show only bookings that have at least one flight service and/or at least one hotel service (with the 82 in their respective pivot tables). Your UNION query does just that and is incredibly fast (5 ms)! However, filtering on both service tables is still not possible. To solve that, I've added both LEFT JOINs from my initial query. This makes it possible to get the same booking multiple times again, so I re-added the GROUP BY, but that does not seem to impact the execution time. Or do you have a better approach?
– quick-brown-fox, Commented Jan 5, 2023 at 21:46
@quick-brown-fox - I assumed that UNION DISTINCT would avoid the need for the GROUP BY. I'm running out of ideas. I hope you understand LEFT JOIN, UNION, etc and can experiment with other options.
– Rick James, Commented Jan 5, 2023 at 22:36
To make it clear what I mean by filtering: it should be possible to add WHERE conditions on flight_services and hotel_services in the outer query. That works with the LEFT JOINs and GROUP BY from my initial query. Thank you very much for your help!
– quick-brown-fox, Commented Jan 6, 2023 at 9:39
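
The combination described in the comments above (the UNION derived table, plus the two LEFT JOINs and the GROUP BY from the initial query) was not posted. A possible reconstruction, offered only as a sketch, is shown below; the commented-out WHERE line marks where user-supplied filters would go, and its column is a placeholder.

```sql
SELECT b.*
FROM (
    SELECT fs.booking_id
    FROM flight_services AS fs
    INNER JOIN flight_pivot_table AS fpt ON fs.id = fpt.flight_service_id
    WHERE fpt.some_id = 82
    UNION DISTINCT
    SELECT hs.booking_id
    FROM hotel_services AS hs
    INNER JOIN hotel_pivot_table AS hpt ON hs.id = hpt.hotel_service_id
    WHERE hpt.some_id = 82
) AS u
JOIN bookings AS b
    ON b.id = u.booking_id
-- The LEFT JOINs keep the table names as aliases so filters can reference them.
LEFT JOIN (
    SELECT flight_services.*
    FROM flight_services
    INNER JOIN flight_pivot_table
        ON flight_services.id = flight_pivot_table.flight_service_id
    WHERE flight_pivot_table.some_id = 82
) AS flight_services
    ON b.id = flight_services.booking_id
LEFT JOIN (
    SELECT hotel_services.*
    FROM hotel_services
    INNER JOIN hotel_pivot_table
        ON hotel_services.id = hotel_pivot_table.hotel_service_id
    WHERE hotel_pivot_table.some_id = 82
) AS hotel_services
    ON b.id = hotel_services.booking_id
-- WHERE flight_services.<column> = ...   -- placeholder for user filters
GROUP BY b.id;   -- the LEFT JOINs can multiply rows again, so collapse per booking
```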
