
I'm using MySQL InnoDB. One of the most important tables has over 700 million records, with a total of 23 actively used indexes.

I am trying to delete records in batches of 2000 based on the record date (with an ORDER BY on the primary key column). Each date has around 6 million records; I delete date by date with LIMIT 2000. Each batch takes around 25 seconds to complete. Since this is a production database, I want the delete operation to complete faster. Is there a better way to do this?
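(For context, the batched delete described presumably looks something like the following; the table and column names here are placeholders.)

```sql
-- Placeholder names: big_table, record_date, id (the primary key).
DELETE FROM big_table
WHERE record_date = '2023-01-01'
ORDER BY id
LIMIT 2000;
-- Repeated until it affects 0 rows, then move on to the next date.
```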

  • How does deleting date by date work in batches of 2000 if each date has 6 million rows? Commented Jan 18, 2023 at 17:18
  • @BillKarwin With a LIMIT clause. Commented Jan 18, 2023 at 17:37
  • No, use a precomputed range on the PRIMARY KEY. See mysql.rjweb.org/doc.php/deletebig . Commented Jan 19, 2023 at 7:20
  • Additional DB information request, please. RAM size, # cores, any SSD or NVME devices on MySQL Host server? Post TEXT data on justpaste.it and share the links. From your SSH login root, Text results of: A) SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables; B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) STATUS; not SHOW STATUS, just STATUS; G) SHOW ENGINE INNODB STATUS; for server workload tuning analysis to provide DELETE improving suggestions. Commented Jan 19, 2023 at 17:31
  • Post TEXT data on justpaste.it and share the links. Additional very helpful OS information includes - please, htop 1st page, if available, TERMINATE, top -b -n 1 for most active apps, top -b -n 1 -H for details on your mysql threads memory and cpu usage, ulimit -a for list of limits, iostat -xm 5 3 for IOPS by device & core/cpu count, df -h for Used - Free space by device, df -i for inode info by device, free -h for Used - Free Mem: and Swap:, cat /proc/meminfo includes VMallocUused, for server workload tuning analysis to provide suggestions. Commented Jan 19, 2023 at 17:31
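The precomputed-range idea from the comment above can be sketched like this (placeholder names again; each chunk touches only a contiguous slice of the primary key instead of re-scanning for the oldest 2000 matching rows):

```sql
-- Find the PK bounds for the target date once:
SELECT MIN(id), MAX(id) INTO @lo, @hi
FROM big_table
WHERE record_date = '2023-01-01';

-- Then delete in fixed PK ranges (loop in application code):
DELETE FROM big_table
WHERE id >= @lo AND id < @lo + 2000
  AND record_date = '2023-01-01';
-- SET @lo = @lo + 2000; repeat until @lo > @hi.
```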

1 Answer


There are many solutions. See http://mysql.rjweb.org/doc.php/deletebig .

Which takes 25 seconds? A batch of 2000 rows? Or several of those? Are they in a single, big, transaction? Lots of little transactions would be slower, but less invasive. And there would be no ACID problem since re-deleting the same 'date' should be idempotent.

If the table is "locked" at some level for 25 seconds, then I understand your concern. If it is a bunch of sub-second deletes, then does it really matter that it takes a long time?

Furthermore, instead of deleting once a day, you could delete once an hour. This might decrease each task to about 2 seconds instead of 25.

23 indexes is a terribly large number. Please provide SHOW CREATE TABLE. It is all too common that there are "redundant" indexes that could (should) be dropped. The main example: INDEX(a, b) takes care of INDEX(a) (but not of INDEX(b)), so if you have both INDEX(a, b) and INDEX(a), drop the latter.
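As a hypothetical illustration of such a redundant pair (index names are made up):

```sql
-- INDEX(a, b) already serves any query that would use INDEX(a),
-- so the single-column index can be dropped:
ALTER TABLE big_table DROP INDEX idx_a;   -- assumes idx_a is INDEX(a)
-- Keep idx_a_b, i.e. INDEX(a, b).
```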

It may be impractical to make the change now, but PARTITION BY RANGE(TO_DAYS(...)) lets you DROP PARTITION, which is virtually instantaneous. (Adding partitioning to a 700M row table would take a long time.)
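A sketch of what that could look like (placeholder names; note that MySQL requires the partitioning column to be part of every PRIMARY/UNIQUE key, which may be a hurdle with this table's existing indexes):

```sql
ALTER TABLE big_table
PARTITION BY RANGE (TO_DAYS(record_date)) (
    PARTITION p20230101 VALUES LESS THAN (TO_DAYS('2023-01-02')),
    PARTITION p20230102 VALUES LESS THAN (TO_DAYS('2023-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Purging a whole day is then a near-instant metadata operation:
ALTER TABLE big_table DROP PARTITION p20230101;
```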


2 Comments

I have checked all the indexes and there is no redundant or duplicate index. The table has ~50 columns. The total index size is ~500 GB and the table size is almost 1 TB.
Please provide the SHOW anyway. And RAM size and innodb_buffer_pool_size.
