Currently, I have a Server A that is holding about 25 billion records (several terabytes size) with the following structure:
CREATE TABLE `table_x` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `a1` char(64) DEFAULT NULL, `b1` int(11) unsigned DEFAULT NULL, `c1` tinyint(1) DEFAULT NULL, `LastUpdate` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), PRIMARY KEY (`id`), UNIQUE KEY `idxb1a1` (`b1`,`a1`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
As the data is growing too big, I am trying to migrate these records into Server B with the same schema structure using bulk insert of 10K records (e.g INSERT INTO yourtable VALUES (1,2), (5,5), ...;) in asecending order by the column id.
Initially, the insertion rate was really quick - however, it gradually slowed down and now takes about 10 secs to bulk insert 10K records (i.e 1K/sec). I am guessing its because it needs to update the indexes after every insertion.
I have done the following configuration on Server B before starting the migration :
innodb_flush_log_at_trx_commit=2SET unique_checks=0;autocommit=0and commit every 50K
Server B hardware configuration :
- 300GB+ ram (240GB used for
innodb_buffer_pool_size) - SSDs for data storage
Server B my.cnf :
innodb_buffer_pool_size=240G innodb_buffer_pool_instances=64 innodb_page_cleaners=32 innodb_purge_threads=1 innodb_read_io_threads=64 innodb_write_io_threads=64 innodb_use_native_aio=0 innodb_flush_log_at_trx_commit=2 innodb_doublewrite=0 innodb_autoinc_lock_mode=2 innodb_file_per_table=1 max_connections=10000 skip_name_resolve=1 tmp_table_size=134217728 max_heap_table_size=134217728 back_log=1000 wait_timeout=900 innodb_log_buffer_size=32M innodb_log_file_size=768M Is there anything else I can do or configure to speed up the insertion?
Update #1 :
The reason why I am trying to migrate the records over to Server B is because I would like to break/shard the data into few servers (to use MariaDB SPIDER engine sharding solution). As such, solutions that involved sending a snapshot of the data or directly copying over the data doesn't seem viable.