We are in the process of upgrading our MariaDB 10.1 Master-Slave network to MariaDB 10.11. We are conducting them in 2 stages - upgrade all slaves to v10.11 and then on a later day introduce a new master with v10.11.
App is fully compatible with the new version and no issues there. I was setting up new production slaves (v10.11) in the same network, with the same Linode instance type (50 vcpu, 128GB RAM) as production. Database size is approx 500GB.
New v10.11 instances, have the same InnoDB config settings as the other v10.1 slaves. innodb_buffer_pool_size=100G for all instances. When starting the replication the lag just keeps growing in these. I noticed the CPU of these are below 50% with queries / second less than 4k. When there is usually a lag in the older version (v10.1) it's CPU is usually above 120% with queries/second above 15k and it will catch up with master in few minutes.
seconds_behind_master in v10.11 became almost 3-4 days. In one of the new slaves, I increased the slave_parallel_thread=12 and that improved. It's CPU came to about 300% with queries / second averaging to about 20k. But the other v10.11 with slave_parallel_therads=0, is still a few days behind Master. It's not fully utilizing the CPU - just 40-50% CPU. Queries per second is below 4k. MariaDB v10.1 slaves is running with slave_parallel_threads=0 and I have never experienced this long lag with them. Like I said lag usually settle in few minutes.
I have been trying to tweak some settings, restart slaves some time, rebuild the slaves but to none effect.
Some questions / concerns -
- Why is MariaDB 10.1 is performing better in replicating with better efficiency than the new MariaDB 10.11. v10.1 - don't have slave_parallel_threads enabled
- CPU utilization is much lower in the new v10.11 than v10.1 with better queries / second
Has anyone experienced this earlier?
All slaves (both v10.1 and v10.11) have these settings -
wait_timeout = 60 interactive_timeout = 60 max_connections = 1000 innodb_buffer_pool_size = 100G max_allowed_packet = 1G innodb_strict_mode=OFF sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" innodb_data_file_path=ibdata1:500M:autoextend read_rnd_buffer_size = 256M sort_buffer_size = 256M join_buffer_size = 256M tmp_table_size= 1536M max_heap_table_size= 1536M skip_slave_start slave-type-conversions=ALL_NON_LOSSY read_buffer_size = 20M #innodb_additional_mem_pool_size = 256M innodb_log_file_size = 1G innodb_buffer_pool_instances = 40 skip_name_resolve = 1 tmp_table_size= 1536M max_heap_table_size= 1536M server_id=114 #slave-type-conversions=ALL_NON_LOSSY max_connect_errors = 1000000 log_warnings = 2 #### read_buffer_size = 20M #innodb_additional_mem_pool_size = 256M query_cache_type=0 