Why are bulk multi-column key queries so slow in MySQL?

Question

(For this question, I am using AWS/Aurora MySQL with a reasonably-spec'd RDS instance)

Consider the following schema:

Table T:
col0: the usual autoincrement primary key
col1: varchar
col2: varchar
col3: varchar
col4...N: various data

Consider that there is a unique index on:

<col1, col2, col3>

And a non-unique index on:

<col1, col2>

And consider the following query:

SELECT * FROM T WHERE (col1 = 'val1' AND col2 = 'id1') OR (col1 = 'val2' AND col2 = 'id2') OR ... (col1 = 'valN' AND col2 = 'idN');

I would (perhaps naively) have expected MySQL to figure out that each element of the OR set matches the (non-unique) index, and to perform the query the way it would have if I had said:

WHERE col0 in (v1, v2, ... , vN)

But it doesn't seem to do that: the timing for these two queries is WAY OFF, with the "or of ands" query on the order of 10x slower. Even allowing for the secondary key lookup, and the fact that it's a string column lookup, 10x seems a bit severe. Note that EXPLAIN claims to be using the correct/expected index whether I specify (col1, col2) or (col1, col2, col3).

Please note also that:

SELECT * from T WHERE col1 in (list1) AND col2 in (list2);

Is also slow when there are a lot of different values in list1 and list2. Doing an "and" for the three columns is almost intractably slow.

Perhaps not surprisingly, this query works better than the "or of ands" when list1 is of length 1.

No, there isn't a match against every OR instance; more generally, a recognition that the same index could be used would improve things. Is there a question here? A temp table of col1, col2 and then a join against it might be the highest-performing way.
– danblack, Commented Jan 15, 2019 at 21:45
The question "why" was purposely not restated; that seemed a bit obvious. Am bummed that the "convert to an in method" isn't performed. A temp table with a join has also been tried, with meh performance. I am assuming that creation of the temp table is the slow part here. But the salient issue is that MySQL is NOT doing what I would have expected/hoped would happen.
– Mark Gerolimatos, Commented Jan 15, 2019 at 21:51
@danblack, could you please restate your comment as an answer, but perhaps a bit more specifically? If I understand you correctly, you are essentially saying that "yes, MySQL will NOT convert an 'or of ands' of conforming index-only and clauses into a virtual 'in' query".
– Mark Gerolimatos, Commented Jan 15, 2019 at 21:53
Quite right, I'll have a coffee first. dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html describes the ways MySQL uses indexes. Your second, two-element non-unique index is effectively a duplicate.
– danblack, Commented Jan 15, 2019 at 21:59

Rick James · Accepted Answer · 2019-01-15 22:52:46Z

With "row constructors", you might get an optimization:

WHERE (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)

But in old versions, that would work but lead to a table scan. I can't say specifically about the version you are running.

When you have this pair of indexes:

UNIQUE(col1, col2, col3) -- (or plain INDEX)
INDEX(col1, col2)

there is no need for the latter, since the former can handle any queries that need it.

Perhaps the optimal way to write your query is

WHERE col1 in ('v1', 'v2', ...) AND (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)

With that, it will use any index starting with col1 as a crude filter, then use the other part for the rest of the filtering.
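Since a bulk lookup like this is usually generated by application code, here is a minimal sketch of building the query above with placeholders instead of inlined literals. The helper name `build_pair_lookup` is hypothetical, the table and column names are the ones from the question, and the `%s` placeholder style assumes a Python MySQL driver that uses that convention (for example PyMySQL or mysql-connector-python). It only builds the statement; whether your server version optimizes the row-constructor part is still something to confirm with EXPLAIN.

```python
from typing import List, Sequence, Tuple


def build_pair_lookup(pairs: Sequence[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Build the 'crude filter + row constructor' query for (col1, col2) pairs.

    Hypothetical helper: T, col1 and col2 are taken from the question;
    the %s placeholders are an assumption about the driver in use.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")

    # Distinct col1 values for the crude filter; sorted for stable output.
    col1_values = sorted({c1 for c1, _ in pairs})

    col1_marks = ", ".join(["%s"] * len(col1_values))
    pair_marks = ", ".join(["(%s, %s)"] * len(pairs))

    sql = (
        "SELECT * FROM T "
        f"WHERE col1 IN ({col1_marks}) "
        f"AND (col1, col2) IN ({pair_marks})"
    )

    # Parameters in placeholder order: col1 filter first, then each pair.
    params: List[str] = list(col1_values)
    for c1, c2 in pairs:
        params.extend([c1, c2])
    return sql, params


if __name__ == "__main__":
    sql, params = build_pair_lookup([("v1", "id1"), ("v2", "id2")])
    print(sql)
    print(params)
```

The resulting `sql` and `params` would then be passed together to the driver's `cursor.execute(sql, params)`, so the values are never spliced into the statement text.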

Re "convert to an in method" -- MySQL started out as a clean and mean database; it did most of what anyone needed and did it reasonably well. That was 90% of the development. We are now into the other 90% of the development -- the "long tail". Quite possibly some list somewhere includes "convert to an in method". If so, it is being prioritized along with the thousands of other rare and obscure optimizations. Feel free to file a 'feature request' at bugs.mysql.com; that is the way to add it to the list, or bump it up in priority.

Thank you, that answer was AWESOME. I was also unaware that MySQL handled tuples of columns "in" such a way.
– Mark Gerolimatos, Commented Jan 16, 2019 at 22:18
@MarkGerolimatos - It's been available for a long time, but due to the total lack of optimization (in the past), it is just as well that users have not noticed.
– Rick James, Commented Jan 16, 2019 at 22:20
Well then @rickjames (!!!!!!), it's good that I didn't know. I assume that whatever version AWS/Aurora runs does it right?
– Mark Gerolimatos, Commented Jan 16, 2019 at 22:23
I don't know for sure about Aurora. That variant of MySQL does have some noticeable improvements (Query cache, replication/backup at the block level, etc.); I don't know if they did anything with "row constructor" optimization. I think MySQL 5.7 has made some improvements. I see bug fixes as far back as 5.0, so row constructors existed at least that long ago.
– Rick James, Commented Jan 16, 2019 at 22:40
