MariaDB InnoDB bulk INSERT slow overtime

Question

Currently, I have a Server A that is holding about 25 billion records (several terabytes size) with the following structure:

CREATE TABLE `table_x` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `a1` char(64) DEFAULT NULL, `b1` int(11) unsigned DEFAULT NULL, `c1` tinyint(1) DEFAULT NULL, `LastUpdate` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), PRIMARY KEY (`id`), UNIQUE KEY `idxb1a1` (`b1`,`a1`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As the data is growing too big, I am trying to migrate these records into Server B with the same schema structure using bulk insert of 10K records (e.g INSERT INTO yourtable VALUES (1,2), (5,5), ...;) in asecending order by the column id.

Initially, the insertion rate was really quick - however, it gradually slowed down and now takes about 10 secs to bulk insert 10K records (i.e 1K/sec). I am guessing its because it needs to update the indexes after every insertion.

I have done the following configuration on Server B before starting the migration :

innodb_flush_log_at_trx_commit=2
SET unique_checks=0;
autocommit=0 and commit every 50K

Server B hardware configuration :

300GB+ ram (240GB used for innodb_buffer_pool_size)
SSDs for data storage

Server B my.cnf :

innodb_buffer_pool_size=240G innodb_buffer_pool_instances=64 innodb_page_cleaners=32 innodb_purge_threads=1 innodb_read_io_threads=64 innodb_write_io_threads=64 innodb_use_native_aio=0 innodb_flush_log_at_trx_commit=2 innodb_doublewrite=0 innodb_autoinc_lock_mode=2 innodb_file_per_table=1 max_connections=10000 skip_name_resolve=1 tmp_table_size=134217728 max_heap_table_size=134217728 back_log=1000 wait_timeout=900 innodb_log_buffer_size=32M innodb_log_file_size=768M

Is there anything else I can do or configure to speed up the insertion?

Update #1 :

The reason why I am trying to migrate the records over to Server B is because I would like to break/shard the data into few servers (to use MariaDB SPIDER engine sharding solution). As such, solutions that involved sending a snapshot of the data or directly copying over the data doesn't seem viable.

You are guessing the indexes are the cause; maybe try removing the idxb1a1 index, and readding it after all the data is loaded? (Which will copy the full table so will need ample extra disk space). While you are migrating seems like a great chance to upgrade to utf8mb4, by the way. — ysth
– ysth, Commented May 20, 2020 at 7:01
@ysth will dropping the index and rebuilding take longer than the actual migration? — GitPirateFlare
– GitPirateFlare, Commented May 20, 2020 at 7:09
Have you considered a file based copy? (E.g. shut the server down&copy the data dir/use a backup/...) — Solarflare
– Solarflare, Commented May 20, 2020 at 7:53
@Solarflare shutting down is not possible because the server is production, and I would like to reduce the records size in the table (which means I can't simply copy and paste). — GitPirateFlare
– GitPirateFlare, Commented May 20, 2020 at 8:43

Gordan Bobić · Accepted Answer · 2020-05-20 09:07:32Z

1

The reason it slows down is likely because your transaction log gets full and the purging isn't keeping up. Increasing innodb_log_file_size (requires shutdown with innodb_fast_shutdown=0 and removing the logs) and innodb_log_files_in_group will postpone the slowdown. Increasing innodb_io_capacity and innidb_io_capacity_max to match what your storage can achieve should help.

Why don't you use xtrabackup to take a point-in-time copy and replication to finish the sync? That will be orders of magnitude faster than INSERT-ing mysqldump style.

answered May 20, 2020 at 9:07

Gordan Bobić

1,91815 silver badges20 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

GitPirateFlare Over a year ago

@Gordon Bobic the reason why I dont use a snapshot feature (like xtrabackup) is because actually Server B will hold a subset of the data, while Server C will hold the remaining of the data. thanks for the suggestion on the configuration - I will try them out.

Gordan Bobić Over a year ago

You may want to look at: 1) partitioning 2) mysqldump's --where option. 3) MariaDB's spider engine

GitPirateFlare Over a year ago

@Gordon Bobic 1) partitioning is not possible due to my schema having 2 unique indexes. 2) thanks, i'll explore and see if mysqldump --where option process is overall faster than the migration 3) yes, actually that is the precise reason why I am breaking these records down into smaller servers. this is so I can utilize MariaDB Spider engine

Rick James Over a year ago

@GitPirateFlare - So you are setting up "sharding"? What determines the split between the machines?

GitPirateFlare Over a year ago

@Rick James because of the db schema, i had to do a workaround for mariadb spider solution to work. the spider will have the unique key constraint idxb1a1 removed so that I can set up partitioning (that points to the data servers) using id (range partitioning) - however, at the data server, they'll retain the unique key constraint

snitch182 · Accepted Answer · 2020-05-20 10:15:23Z

1

in addition to the answer from @Gordon-bobic removing the indices and reapplying at the end speeds things up a lot.

answered May 20, 2020 at 10:15

snitch182

72312 silver badges22 bronze badges

2 Comments

GitPirateFlare Over a year ago

my concern is that the rebuilding of the index will take unimaginably long because of the data size (in terabytes range)

snitch182 Over a year ago

yes, it will, but doing it incremenally will take more time overall - after all you have to write terabytes of data. There is no way around that. A spotifiy database migration i heard of took a year ... and they still lost 0.2 % of the data ... was a talk at codetalks 2019

Severalnines · Accepted Answer · 2022-12-15 07:35:46Z

Agree with @gordan-bobić regarding the use of xtrabackup. If you are applying for a data migration, using physical copy/backup is your best approach if you want speed. Using logical copy such as using mysqldump or a query-based copy can take much time because it has to apply checks based on the configuration set loaded during runtime and done dynamically. However, if bulk insert is your option, then consider adjusting the innodb_autoinc_lock_mode = 2 if this is applicable.

Another thing is that, if you are using INSERT statement, are you loading it one value at a time? Consider using the multiple-value lists as this is proven to be faster. On the other hand, consider also using LOAD DATA instead of the INSERT statement.

Also have your innodb_change_buffering=insert (since you are using unique_checks=0 in MariaDB) and consider increasing the innodb_change_buffer_max_size for example from 30 to 50. It is most likely best to do this when there's not much activity on the target table you are inserting, so that you can monitor such activity on your target database server. Also consider that there's not much disturbance from other applications or daemons running on this target server as well.

Collectives™ on Stack Overflow

MariaDB InnoDB bulk INSERT slow overtime

3 Answers 3

5 Comments

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

2 Comments

Comments

Related