I recently broke replication, and when I tried to skip past the one incorrect transaction I got the following:
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.05 sec)

MariaDB [(none)]> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.

MariaDB [(none)]> select @@gtid_slave_pos;
+---------------------------------------------+
| @@gtid_slave_pos                            |
+---------------------------------------------+
| 0-1051-1391406,1-1050-1182069,57-1051-98897 |
+---------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> show variables like '%_pos%';
+----------------------+---------------------------------------------------------+
| Variable_name        | Value                                                   |
+----------------------+---------------------------------------------------------+
| gtid_binlog_pos      | 0-1051-1391406,2-1051-4474,57-1051-98897                |
| gtid_current_pos     | 0-1051-1391406,1-1050-1182069,2-1051-4474,57-1051-98897 |
| gtid_slave_pos       | 0-1051-1391406,1-1050-1182069,57-1051-98897             |
| wsrep_start_position | 00000000-0000-0000-0000-000000000000:-1                 |
+----------------------+---------------------------------------------------------+

What do I need to do to fix this?
Update 1
MariaDB [(none)]> show variables like '%gtid%';
+------------------------+------------------------------------------+
| Variable_name          | Value                                    |
+------------------------+------------------------------------------+
| gtid_binlog_pos        | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_binlog_state      | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_current_pos       | 1-1050-4819948,2-1051-379101,3-1010-3273 |
| gtid_domain_id         | 3                                        |
| gtid_ignore_duplicates | OFF                                      |
| gtid_seq_no            | 0                                        |
| gtid_slave_pos         | 1-1050-4819948,2-1051-379101,3-1010-3273 |
| gtid_strict_mode       | OFF                                      |
| last_gtid              |                                          |
| wsrep_gtid_domain_id   | 0                                        |
| wsrep_gtid_mode        | OFF                                      |
+------------------------+------------------------------------------+

I tried the following, as per the instructions to set @@gtid_slave_pos:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: [redacted]
Master_User: [redacted]
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000591
Read_Master_Log_Pos: 526511543
Relay_Log_File: tmsdb-relay-bin.001239
Relay_Log_Pos: 4
Relay_Master_Log_File: binary.000591
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows_v1 event on table [redacted] Duplicate entry '1134890' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binary.000591, end_log_pos 60726493
Skip_Counter: 0
Exec_Master_Log_Pos: 60724897
Relay_Log_Space: 465787660
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows_v1 event on table [redacted] Duplicate entry '1134890' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binary.000591, end_log_pos 60726493
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1-1050-4827753,2-1051-379101,3-1010-3273
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)

Using the gtid_slave_pos variable:
MariaDB [(none)]> select @@gtid_slave_pos\G;
*************************** 1. row ***************************
@@gtid_slave_pos: 1-1050-4819948,2-1051-379101,3-1010-3273

MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.21 sec)

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1-1050-4819948,2-1051-379101,3-1010-3274';
Query OK, 0 rows affected (0.10 sec)

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.21 sec)

When I check the status after running the above, I get this error:

Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1010-3274, which is not in the master's binlog'
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.56.228.64
Master_User: maxscale
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000591
Read_Master_Log_Pos: 60724897
Relay_Log_File: tmsdb-relay-bin.001239
Relay_Log_Pos: 4
Relay_Master_Log_File: binary.000591
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 60724897
Relay_Log_Space: 249
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1010-3274, which is not in the master's binlog'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1-1050-4819948,2-1051-379101,3-1010-3274
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)

I can get this back to the previous state with:
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1-1050-4819948,2-1051-379101,3-1010-3273';
Query OK, 0 rows affected (0.09 sec)

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.06 sec)
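
For reference, here is a minimal Python sketch of how the position strings above are put together and what my SET GLOBAL gtid_slave_pos edit actually changed. Each comma-separated entry is domain_id-server_id-seq_no. The function names (parse_gtid_pos, skip_one, lagging_domains) are hypothetical helpers written for this post, not part of MariaDB or any client library. skip_one also assumes that the next GTID in a domain keeps the same server_id, which may not hold on every setup.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    parsed = {}
    for gtid in pos.split(","):
        domain_id, server_id, seq_no = (int(part) for part in gtid.strip().split("-"))
        parsed[domain_id] = (server_id, seq_no)
    return parsed


def format_gtid_pos(parsed):
    """Turn the parsed mapping back into a position string, ordered by domain."""
    return ",".join(
        f"{domain_id}-{server_id}-{seq_no}"
        for domain_id, (server_id, seq_no) in sorted(parsed.items())
    )


def skip_one(pos, domain_id):
    """Return pos with seq_no bumped by one in the given domain.

    Assumption: the skipped GTID has the same server_id as the current entry.
    """
    parsed = parse_gtid_pos(pos)
    server_id, seq_no = parsed[domain_id]
    parsed[domain_id] = (server_id, seq_no + 1)
    return format_gtid_pos(parsed)


def lagging_domains(slave_pos, io_pos):
    """Map each domain where io_pos is ahead of slave_pos to the seq_no gap."""
    applied = parse_gtid_pos(slave_pos)
    fetched = parse_gtid_pos(io_pos)
    return {
        domain_id: fetched[domain_id][1] - applied[domain_id][1]
        for domain_id in fetched
        if domain_id in applied and fetched[domain_id][1] > applied[domain_id][1]
    }


# Values copied from the output in Update 1.
SLAVE_POS = "1-1050-4819948,2-1051-379101,3-1010-3273"
IO_POS = "1-1050-4827753,2-1051-379101,3-1010-3273"

print(skip_one(SLAVE_POS, 3))              # the position I set above (domain 3 bumped)
print(lagging_domains(SLAVE_POS, IO_POS))  # domains where Gtid_IO_Pos is ahead
```

With the values from Update 1, skip_one(SLAVE_POS, 3) reproduces the string I set, and lagging_domains reports only domain 1 as having Gtid_IO_Pos ahead of gtid_slave_pos. Domain 3 matches this server's own gtid_domain_id, so it may not be the domain the failing event belongs to.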