1

I have been trying to create an index in MySQL, but keep getting temporary and filesort whenever I run an explain on my query.

A simplified version of my tables looks like:

ordered_products op_id INT UNSIGNED NOT NULL AUTO_INCREMENT op_orderid INT UNSIGNED NOT NULL op_orderdate TIMESTAMP NOT NULL op_productid INT UNSIGNED NOT NULL products p_id INT UNSIGNED NOT NULL AUTO_INCREMENT p_productname VARCHAR(128) NOT NULL p_enabled TINYINT NOT NULL 

The 'ordered_products' table currently has more than 1,000,000 rows and is a record of all products that have been ordered, as well as the orders that they belong to. This table grows rapidly.

The 'products' table currently has around 3,000 rows and contains a list of products that are for sale.

The site displays a list of the top products for a given period (normally the last 3 days) and my query looks like:

SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid FROM ordered_products op LEFT JOIN products p ON op.op_productid=p.p_id WHERE op.op_orderdate>='2014-03-08 00:00:00' AND p.p_enabled=1 GROUP BY op.op_productid ORDER BY ProductCount DESC, p.p_productname ASC 

When I run that query, it normally takes around 800 milliseconds (0.8 seconds) to execute, which is ridiculous. We've remedied this with caching, however whenever the cache expires, we have a slowdown. I need to fix this.

I have tried to index the tables, but no matter what I try, I can't avoid temporary and filesort. The output from EXPLAIN is:

id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE p index PRIMARY,idx_enabled_id_name idx_enabled_id_name 782 \N 1477 Using where; Using index; Using temporary; Using filesort 1 SIMPLE op ref idx_pid_oid_date idx_pid_oid_date 4 test_store.p.p_id 9 Using where; Using index 

If I remove the GROUP BY, the filesort disappears, however I need it to ensure the ProductCount value shows me every product count rather than a total sum of all products.

If I remove the GROUP BY and the ORDER BY ProductCount, both temporary and filesort disappear, but now I am left with a very bad result set.

Can anyone please help me solve this? I have tried a multitude of different indexes, and have tried rewriting the SQL numerous times, but can never succeed.

Any help would be greatly appreciated.

1 Answer 1

1

You can't get rid of the temp table and filesort while you are using ORDER BY on a calculated column ProductCount. There's no index for the calculated column, so it has to do do the sorting at the time of the query.

I tried experimentally to reproduce your results. I can put an index on op_productid and then the optimizer might use it to perform the GROUP BY.

mysql> EXPLAIN SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid FROM ordered_products op FORCE INDEX (op_productid) STRAIGHT_JOIN products p ON op.op_productid=p.p_id WHERE op.op_orderdate>='2014-03-08 00:00:00' AND p.p_enabled=1 GROUP BY op.op_productid ORDER BY null; 

In my case, I had to use STRAIGHT_JOIN and FORCE INDEX to override the optimizer. But that might be due to my test environment, where I have only 1 or 2 rows per table for testing, and it throws off the optimizer's choices. In your real data, it might make a more sensible choice.

Also, don't use LEFT JOIN if you have conditions in the WHERE clause that make the join implicitly an inner join. Learn the types of joins and how they work -- don't always use LEFT JOIN by default.

+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+ | 1 | SIMPLE | op | index | op_productid | op_productid | 4 | NULL | 5 | Using where | | 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where | +----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+ 

Your only alternative is to store a denormalized table, where the counts are persisted. Then if your cache fails, it isn't an expensive query to refresh the cache.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the response. The optimizer does seem to be making a sensible choice for me, so no need to FORCE INDEX. I've never heard of STRAIGHT JOIN, so I'll look into that. As for the LEFT JOIN, that was a mistake on my part. It is actually an INNER JOIN, but I was experimenting with the query to see if it would help and forgot to revert it back. I was contemplating another table that stores counts for the day and it may be what I have to do if it's just not possible.
STRAIGHT_JOIN (note the underscore) is a MySQL thing for overriding the optimizer's ability to access tables in a different order than how you specify them in the query. See dev.mysql.com/doc/refman/5.6/en/index-hints.html
Interesting. Thanks for the tip!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.