
We are in the process of upgrading our MariaDB 10.1 Master-Slave network to MariaDB 10.11. We are conducting the upgrade in two stages: first upgrade all slaves to v10.11, then on a later day introduce a new master running v10.11.

The app is fully compatible with the new version, so no issues there. I was setting up new production slaves (v10.11) in the same network, on the same Linode instance type (50 vCPU, 128 GB RAM) as production. Database size is approx 500 GB.

The new v10.11 instances have the same InnoDB config settings as the other v10.1 slaves, with innodb_buffer_pool_size=100G on all instances. When replication starts, the lag on these just keeps growing. I noticed their CPU stays below 50% with fewer than 4k queries/second. When the older v10.1 slaves lag, their CPU is usually above 120% with more than 15k queries/second, and they catch up with the master in a few minutes.

Seconds_Behind_Master on v10.11 grew to almost 3-4 days. On one of the new slaves I increased slave_parallel_threads to 12 and that improved things: its CPU rose to about 300% with queries/second averaging about 20k. But the other v10.11 slave, still at slave_parallel_threads=0, remains a few days behind the master. It is not fully utilizing the CPU (just 40-50%) and queries/second stays below 4k. The MariaDB v10.1 slaves run with slave_parallel_threads=0 and I have never experienced lag this long on them; like I said, lag usually settles in a few minutes.
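
For reference, a minimal sketch of the change described above (12 threads is simply the value that was tried here, not a recommendation, and slave_parallel_threads can only be changed while the slave threads are stopped):

    -- on the lagging v10.11 slave
    STOP SLAVE;
    SET GLOBAL slave_parallel_threads = 12;
    START SLAVE;
    -- then watch Seconds_Behind_Master
    SHOW SLAVE STATUS\G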

I have tried tweaking some settings, restarting the slaves a few times, and rebuilding the slaves, but to no effect.

Some questions / concerns -

  1. Why is MariaDB 10.1 replicating more efficiently than the new MariaDB 10.11, when v10.1 does not even have slave_parallel_threads enabled?
  2. Why is CPU utilization much lower on the new v10.11 than on v10.1, which also achieves better queries/second?

Has anyone experienced this earlier?

All slaves (both v10.1 and v10.11) have these settings -

    wait_timeout = 60
    interactive_timeout = 60
    max_connections = 1000
    innodb_buffer_pool_size = 100G
    max_allowed_packet = 1G
    innodb_strict_mode=OFF
    sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
    innodb_data_file_path=ibdata1:500M:autoextend
    read_rnd_buffer_size = 256M
    sort_buffer_size = 256M
    join_buffer_size = 256M
    tmp_table_size= 1536M
    max_heap_table_size= 1536M
    skip_slave_start
    slave-type-conversions=ALL_NON_LOSSY
    read_buffer_size = 20M
    #innodb_additional_mem_pool_size = 256M
    innodb_log_file_size = 1G
    innodb_buffer_pool_instances = 40
    skip_name_resolve = 1
    tmp_table_size= 1536M
    max_heap_table_size= 1536M
    server_id=114
    #slave-type-conversions=ALL_NON_LOSSY
    max_connect_errors = 1000000
    log_warnings = 2
    ####
    read_buffer_size = 20M
    #innodb_additional_mem_pool_size = 256M
    query_cache_type=0
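
For completeness, one way to compare what each server actually applies at runtime is to run a query like the sketch below on both a v10.1 and a v10.11 slave and diff the output (variables not recognised by a given version simply won't be listed):

    SELECT VARIABLE_NAME, GLOBAL_VALUE
      FROM information_schema.SYSTEM_VARIABLES
     WHERE VARIABLE_NAME IN ('INNODB_BUFFER_POOL_SIZE',
                             'INNODB_BUFFER_POOL_INSTANCES',
                             'INNODB_LOG_FILE_SIZE',
                             'SLAVE_PARALLEL_THREADS');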

Setup diagram

  • Oh, just noticed: I did a count(*) on one of the big tables (213+ million records). v10.1 took approx 22 seconds; v10.11 (the one with slave_parallel_threads) took approx 52 seconds. Both slaves get the same amount of traffic and are in sync with the master. Commented Mar 15 at 4:47
  • I removed the mysql tag from this question. Please be aware that MySQL and MariaDB are no longer compatible, and you should think of them as entirely different products. Commented Mar 15 at 15:28

1 Answer


Disclaimer: I don't have experience with your use-case and the below is based on my reading of the documentation only.

From the documentation about the buffer pool:

Be aware that before MariaDB 10.4.4 the total memory allocated is about 10% more than the specified size as extra space is also reserved for control structures and buffers.

So, it looks like you have to set a higher value for innodb_buffer_pool_size in 10.11 to get the same actual size of buffer pool and similar performance as you have in 10.1. (Or am I misunderstanding what the documentation says?)

(Note that innodb_buffer_pool_size can now be set dynamically without restarting.)
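
For instance, a sketch of bumping it at runtime on a 10.11 slave (110G is purely an illustrative figure to show the mechanism, not a recommendation):

    SET GLOBAL innodb_buffer_pool_size = 110 * 1024 * 1024 * 1024;
    -- persist the same value in my.cnf so it survives a restart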

I also notice you have innodb_buffer_pool_instances=40. This system variable was removed in 10.6, so you should get an error if you try using it in 10.11. (The documentation says "the original reasons for splitting the buffer pool have mostly gone away".)
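
As a quick check, something like the following should come back empty on 10.11 if the variable really was removed, while 10.1 would show the configured 40:

    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';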

Another variable to look at could be the new (since 10.9) innodb_log_file_buffering.

Also, innodb_log_file_size: This is the size of each file in the group. If I understand correctly, the default for innodb_log_files_in_group was 2 in 10.1 whereas the variable no longer exists in 10.11. Maybe you can compensate by doubling the value for innodb_log_file_size? (This is also now a dynamic variable, by the way.)
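
A sketch of the two log-related changes, purely as a starting point (2G is just your current 1G doubled, and whether buffering helps is workload-dependent):

    -- check the newer variable first
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_buffering';
    -- resize the redo log online (dynamic in recent versions, as noted above)
    SET GLOBAL innodb_log_file_size = 2 * 1024 * 1024 * 1024;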

  • innodb_buffer_pool_size: I have increased this by 10 GB. innodb_log_file_buffering: this was an amazing insight; I think this will improve performance drastically. I set it to ON and did a quick restart on one of the servers. innodb_log_file_size: I set this to 2 GB. I see a positive impact on the slave with 12 parallel threads. The other server that was lagging by many days is being rebuilt and should be complete in the next 6 hours; I will then apply the above settings there too. Will post the progress here. Thank you! Commented Mar 15 at 13:32
  • Update: started the slave after a full restoration of backup. We have a current lag of 10 hours and it covered 1 hour in about 20 minutes, much better than it used to be. I guess in the next couple of hours we will be in sync with the master. The solution is working. I will keep monitoring and update here how it goes. Commented Mar 15 at 18:10
  • Marking this as Answer. The solution worked. Commented Mar 25 at 7:47
  • Let's tackle the Slow Queries Commented Apr 19 at 0:20
