MySQL高可用之MHA切换测试(switchover & failover)

2019-09-27 17:44 来源:未知

 

MHA故障切换和在线切换的代码解析

Preface

前段时间我的同事沈龙星整理了一下MHA故障切换和在线切换的代码流程,在征得其同意后,在此转发。以下是正文

本文是以MySQL5.5为基础的,因此没有涉及到gtid相关内容。MHA的主从切换过程分为failover和rotate两种,前者适用于原Master down的情况,后者是在在线切换的情况下使用。下面分别讲解

 

failover的处理流程

  1. MHA::MasterFailover::main()
  2. ->do_master_failover
  3. Phase 1: Configuration Check Phase
  4. -> check_settings:
  5. check_node_version:查看MHA的版本信息
  6. connect_all_and_read_server_status:确认各个node的MySQL实例是否可以连接
  7. get_dead_servers/get_alive_servers/get_alive_slaves:double check各个node的死活状态
  8. start_sql_threads_if:查看Slave_SQL_Running是否为Yes,若不是则启动SQL thread
  9. Phase 2: Dead Master Shutdown Phase:对于我们来说,唯一的作用就是stop IO thread
  10. -> force_shutdown($dead_master):
  11. stop_io_thread:所有slave的IO thread stop掉(将stop掉master)
  12. force_shutdown_internal(实际上就是执行配置文件中的master_ip_failover_script/shutdown_script,若无则不执行):
  13. master_ip_failover_script:如果设置了VIP,则首先切换VIP
  14. shutdown_script:如果设置了shutdown脚本,则执行
  15. Phase 3: Master Recovery Phase
  16. -> Phase 3.1: Getting Latest Slaves Phase(取得latest slave)
  17. read_slave_status:取得各个slave的binlog file/position
  18. check_slave_status:调用"SHOW SLAVE STATUS"来取得slave的如下信息:
  19. Slave_IO_State, Master_Host,
  20. Master_Port, Master_User,
  21. Slave_IO_Running, Slave_SQL_Running,
  22. Master_Log_File, Read_Master_Log_Pos,
  23. Relay_Master_Log_File, Last_Errno,
  24. Last_Error, Exec_Master_Log_Pos,
  25. Relay_Log_File, Relay_Log_Pos,
  26. Seconds_Behind_Master, Retrieved_Gtid_Set,
  27. Executed_Gtid_Set, Auto_Position
  28. Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
  29. Replicate_Ignore_Table, Replicate_Wild_Do_Table,
  30. Replicate_Wild_Ignore_Table
  31. identify_latest_slaves:
  32. 通过比较各个slave中的Master_Log_File/Read_Master_Log_Pos,来找到latest的slave
  33. identify_oldest_slaves:
  34. 通过比较各个slave中的Master_Log_File/Read_Master_Log_Pos,来找到oldest的slave
  35. -> Phase 3.2: Saving Dead Master's Binlog Phase:
  36. save_master_binlog:
  37. 如果dead master可以ssh连接,则走如下分支:
  38. save_master_binlog_internal:(使用node节点的save_binary_logs脚本在dead master上做拷贝)
  39. save_binary_logs --command=save --start_file=mysql-bin.000281 --start_pos=107 --binlog_dir=/opt/mysql/data/binlog --output_file=/opt/mha/log/saved_master_binlog_from_10.27.177.245_3306_20160108211857.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  40. generate_diff_binary_log:
  41. concat_all_binlogs_from:
  42. dump_binlog:就是将binlog文件dump到target文件中,用的就是binmode read
  43. dump_binlog_header_fde:从0读到position-1
  44. dump_binlog_from_pos:从position开始,dump binlog file到target file
  45. file_copy:
  46. 文件拷贝,是将上述生成的binlog文件拷贝到manage节点的manager_workdir目录下
  47. 如果dead master无法ssh登录,则master上未同步到slave的txn丢失
  48. -> Phase 3.3: Determining New Master Phase
  49. find_latest_base_slave:
  50. find_latest_base_slave_internal:
  51. pos_cmp( $oldest_mlf, $oldest_mlp, $latest_mlf, $latest_mlp )
  52. 判断latest/oldest slave的binlog位置是不是相同,若相同则不需要同步relay log
  53. apply_diff_relay_logs --command=find --latest
  54. 查看latest slave中是否有oldest缺少的relay log,若无则继续,否则failover失败
  55. 查找的方法很简单,就是逆序的读latest slave的relay log文件,一直找到file/position为止
  56. select_new_master:选出新的master节点
  57. If preferred node is specified, one of active preferred nodes will be new master.
  58. If the latest server behinds too much (i.e. stopping sql thread for online backups),
  59. we should not use it as a new master, we should fetch relay log there. Even though preferred
  60. master is configured, it does not become a master if it's far behind.
  61. get_candidate_masters:
  62. 就是配置文件中配置了candidate_master>0的节点
  63. get_bad_candidate_masters:
  64. # The following servers can not be master:
  65. # - dead servers
  66. # - Set no_master in conf files (i.e. DR servers)
  67. # - log_bin is disabled
  68. # - Major version is not the oldest
  69. # - too much replication delay(slave与master的binlog position差距大于100000000)
  70. Searching from candidate_master slaves which have received the latest relay log events
  71. if NOT FOUND:
  72. Searching from all candidate_master slaves
  73. if NOT FOUND:
  74. Searching from all slaves which have received the latest relay log events
  75. if NOT FOUND:
  76. Searching from all slaves
  77. -> Phase 3.4: New Master Diff Log Generation Phase
  78. recover_relay_logs:
  79. 判断new master是不是latest slave,若不是则使用apply_diff_relay_logs --命令生成差分log,
  80. 并发送到新new master
  81. recover_master_internal:
  82. 将3.2中生成的daed master上的binlog发送到new master
  83. -> Phase 3.5: Master Log Apply Phase
  84. recover_slave:
  85. apply_diff:
    1. wait_until_relay_log_applied,等待new master将relaylog执行完
    1. 判断Exec_Master_Log_Pos == Read_Master_Log_Pos,
  86. 如果不相等则使用save_binary_logs --command=save生成差分log
    1. 调用apply_diff_relay_logs命令,让new master进行recover.其中:
  87. 2.1 recover的log分为三部分:
  88. exec_diff:Exec_Master_Log_Pos和Read_Master_Log_Pos的差分
  89. read_diff:new master与lastest slave的relay log的差分
  90. binlog_diff:lastest slave与daed master之间的binlog差分
  91. 实际上apply_diff_relay_logs就是调用mysqlbinlog command进行recover
  92. //如果设置了vip,则需要调用master_ip_failover_script进行vip的failover
  93. Phase 4: Slaves Recovery Phase
  94. -> Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
  95. 生成Slave与New Slave之间的差异日志,并将该日志拷贝到各Slave的工作目录下。
  96. -> Phase 4.2: Starting Parallel Slave Log Apply Phase
  97. recover_slave:
  98. 对各个slave进行恢复,同Phase3.5
  99. change_master_and_start_slave:
  100. 通过CHANGE MASTER TO命令将这些Slave指向新的New Master,最后开始复制(start slave)
  101. Phase 5: New master cleanup phase
  102. reset_slave_on_new_master
  103. 清理New Master其实就是重置slave info,即取消原来的Slave信息。至此整个Master故障切换过程完成

    I've installed MasterHA yesterday,Now let's test the master-slave switch and failover feature.

rotate的处理过程

  1. MHA::MasterRotate::main()
    -> do_master_online_switch:
    Phase 1: Configuration Check Phase
    -> identify_orig_master
    connect_all_and_read_server_status:
    connect_check:首先进行connect check,确保各个server的MySQL服务都正常
    connect_and_get_status:获取MySQL实例的server_id/mysql_version/log_bin..等信息
    这一步还有一个重要的作用,是获取当前的master节点。通过执行show slave status,
    如果输出为空,说明当前节点是master节点。
    validate_current_master:取得master节点的信息,并判断配置的正确性
    check是否有server down,若有则退出rotate
    check master alive or not,若dead则退出rotate
    check_repl_priv:
    查看用户是否有replication的权限
    获取monitor_advisory_lock,以保证当前没有其他的monitor进程在master上运行
    执行:SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', ?) AS Value
    获取failover_advisory_lock,以保证当前没有其他的failover进程在slave上运行
    执行:SELECT GET_LOCK('MHA_Master_High_Availability_Failover', ?) AS Value
    check_replication_health:
    执行:SHOW SLAVE STATUS来判断如下状态:current_slave_position/has_replication_problem
    其中,has_replication_problem具体check如下内容:IO线程/SQL线程/Seconds_Behind_Master(1s)
    get_running_update_threads:
    使用show processlist来查询当前有没有执行update的线程存在,若有则退出switch
    -> identify_new_master
    set_latest_slaves:当前的slave节点都是latest slave
    select_new_master:选出新的master节点
    If preferred node is specified, one of active preferred nodes will be new master.
    If the latest server behinds too much (i.e. stopping sql thread for online backups),
    we should not use it as a new master, we should fetch relay log there. Even though preferred
    master is configured, it does not become a master if it's far behind.
    get_candidate_masters:
    就是配置文件中配置了candidate_master>0的节点
    get_bad_candidate_masters:
    # The following servers can not be master:
    # - dead servers
    # - Set no_master in conf files (i.e. DR servers)
    # - log_bin is disabled
    # - Major version is not the oldest
    # - too much replication delay(slave与master的binlog position差距大于100000000)
    Searching from candidate_master slaves which have received the latest relay log events
    if NOT FOUND:
    Searching from all candidate_master slaves
    if NOT FOUND:
    Searching from all slaves which have received the latest relay log events
    if NOT FOUND:
    Searching from all slaves

    Phase 2: Rejecting updates Phase
    reject_update:lock table来reject write binlog
    如果MHA的配置文件中设置了"master_ip_online_change_script"参数,则执行该脚本来disable writes on the current master
    该脚本在使用了vip的时候才需要设置
    reconnect:确保当前与master的连接正常
    lock_all_tables:执行FLUSH TABLES WITH READ LOCK,来lock table
    check_binlog_stop:连续两次show master status,来判断写binlog是否已经停止

    read_slave_status:
    get_alive_slaves:
    check_slave_status:调用"SHOW SLAVE STATUS"来取得slave的如下信息:
    Slave_IO_State, Master_Host,
    Master_Port, Master_User,
    Slave_IO_Running, Slave_SQL_Running,
    Master_Log_File, Read_Master_Log_Pos,
    Relay_Master_Log_File, Last_Errno,
    Last_Error, Exec_Master_Log_Pos,
    Relay_Log_File, Relay_Log_Pos,
    Seconds_Behind_Master, Retrieved_Gtid_Set,
    Executed_Gtid_Set, Auto_Position
    Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
    Replicate_Ignore_Table, Replicate_Wild_Do_Table,
    Replicate_Wild_Ignore_Table
    switch_master:
    switch_master_internal:
    master_pos_wait:调用select master_pos_wait函数,等待主从同步完成
    get_new_master_binlog_position:执行'show master status'
    Allow write access on the new master:
    调用master_ip_online_change_script --command=start ...,将vip指向new master
    disable_read_only:
    在新master上执行:SET GLOBAL read_only=0
    switch_slaves:
    switch_slaves_internal:
    change_master_and_start_slave
    change_master:
    start_slave:
    unlock_tables:在orig master上执行unlock table
    Phase 5: New master cleanup phase
    reset_slave_on_new_master
    release_failover_advisory_lock

http://www.bkjia.com/Javabc/1119671.htmlwww.bkjia.comtruehttp://www.bkjia.com/Javabc/1119671.htmlTechArticleMHA故障切换和在线切换的代码解析 前段时间我的同事沈龙星整理了一下MHA故障切换和在线切换的代码流程,在征得其同意后,在此转发。以...

 

Framework

 

Hostname IP Port Identity OS Version MySQL Version
zlm2 192.168.1.101 3306 master CentOS 7.0 5.7.21
zlm3 192.168.1.102 3306 slave/mha-manager CentOS 7.0 5.7.21
null 192.168.1.200 null vip null null

澳门新浦金网站,Procedure

 

Test 1:Manual master switchover

 

Check state of MHA-manager on zlm3.

 1 [root@zlm3 07:35:00 ~]
 2 #masterha_check_ssh --conf=/etc/masterha/app1.conf
 3 Fri Aug  3 07:37:13 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
 4 Fri Aug  3 07:37:13 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
 5 Fri Aug  3 07:37:13 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
 6 Fri Aug  3 07:37:13 2018 - [info] Starting SSH connection tests..
 7 Fri Aug  3 07:37:13 2018 - [debug] 
 8 Fri Aug  3 07:37:13 2018 - [debug]  Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22)..
 9 Fri Aug  3 07:37:13 2018 - [debug]   ok.
10 Fri Aug  3 07:37:14 2018 - [debug] 
11 Fri Aug  3 07:37:13 2018 - [debug]  Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22)..
12 Fri Aug  3 07:37:13 2018 - [debug]   ok.
13 Fri Aug  3 07:37:14 2018 - [info] All SSH connection tests passed successfully.
14 
15 [root@zlm3 07:37:14 ~]
16 #masterha_check_repl --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf 
17 Fri Aug  3 07:37:37 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
18 Fri Aug  3 07:37:37 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
19 Fri Aug  3 07:37:37 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
20 Fri Aug  3 07:37:37 2018 - [info] MHA::MasterMonitor version 0.56.
21 Fri Aug  3 07:37:38 2018 - [info] GTID failover mode = 1
22 Fri Aug  3 07:37:38 2018 - [info] Dead Servers:
23 Fri Aug  3 07:37:38 2018 - [info] Alive Servers:
24 Fri Aug  3 07:37:38 2018 - [info]   192.168.1.101(192.168.1.101:3306)
25 Fri Aug  3 07:37:38 2018 - [info]   192.168.1.102(192.168.1.102:3306)
26 Fri Aug  3 07:37:38 2018 - [info] Alive Slaves:
27 Fri Aug  3 07:37:38 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
28 Fri Aug  3 07:37:38 2018 - [info]     GTID ON
29 Fri Aug  3 07:37:38 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
30 Fri Aug  3 07:37:38 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
31 Fri Aug  3 07:37:38 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
32 Fri Aug  3 07:37:38 2018 - [info] Checking slave configurations..
33 Fri Aug  3 07:37:38 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
34 Fri Aug  3 07:37:38 2018 - [info] Checking replication filtering settings..
35 Fri Aug  3 07:37:38 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
36 Fri Aug  3 07:37:38 2018 - [info]  Replication filtering check ok.
37 Fri Aug  3 07:37:38 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
38 Fri Aug  3 07:37:38 2018 - [info] Checking SSH publickey authentication settings on the current master..
39 ssh_exchange_identification: Connection closed by remote host
40 Fri Aug  3 07:37:38 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
41 Fri Aug  3 07:37:38 2018 - [info] 
42 192.168.1.101(192.168.1.101:3306) (current master)
43  +--192.168.1.102(192.168.1.102:3306)
44 
45 Fri Aug  3 07:37:38 2018 - [info] Checking replication health on 192.168.1.102..
46 Fri Aug  3 07:37:38 2018 - [info]  ok.
47 Fri Aug  3 07:37:38 2018 - [info] Checking master_ip_failover_script status:
48 Fri Aug  3 07:37:38 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
49 Fri Aug  3 07:37:38 2018 - [info]  OK.
50 Fri Aug  3 07:37:38 2018 - [warning] shutdown_script is not defined.
51 Fri Aug  3 07:37:38 2018 - [info] Got exit code 0 (Not master dead).
52 
53 MySQL Replication Health is OK.
54 
55 [root@zlm3 07:40:03 ~]
56 #Fri Aug  3 07:40:03 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
57 Fri Aug  3 07:40:03 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
58 Fri Aug  3 07:40:03 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
59 ssh_exchange_identification: Connection closed by remote host
60 ^C
61 
62 [root@zlm3 07:40:11 ~]
63 #masterha_check_status --conf=/etc/masterha/app1.conf
64 app1 (pid:5628) is running(0:PING_OK), master:192.168.1.101

 

Switch master to slave and make it become a new slave of new master.

  1 [root@zlm3 08:21:27 ~]
  2 #masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
  3 Fri Aug  3 08:21:29 2018 - [info] MHA::MasterRotate version 0.56.
  4 Fri Aug  3 08:21:29 2018 - [info] Starting online master switch..
  5 Fri Aug  3 08:21:29 2018 - [info] 
  6 Fri Aug  3 08:21:29 2018 - [info] * Phase 1: Configuration Check Phase..
  7 Fri Aug  3 08:21:29 2018 - [info] 
  8 Fri Aug  3 08:21:29 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
  9 Fri Aug  3 08:21:29 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
 10 Fri Aug  3 08:21:29 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
 11 Fri Aug  3 08:21:30 2018 - [info] GTID failover mode = 1
 12 Fri Aug  3 08:21:30 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
 13 Fri Aug  3 08:21:30 2018 - [info] Alive Slaves:
 14 Fri Aug  3 08:21:30 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
 15 Fri Aug  3 08:21:30 2018 - [info]     GTID ON
 16 Fri Aug  3 08:21:30 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
 17 Fri Aug  3 08:21:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
 18 
 19 It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
 20 Fri Aug  3 08:21:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
 21 Fri Aug  3 08:21:33 2018 - [info]  ok.
 22 Fri Aug  3 08:21:33 2018 - [info] Checking MHA is not monitoring or doing failover..
 23 Fri Aug  3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
 24 Fri Aug  3 08:21:33 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
 25 
 26 //It means that we should stop MHA-manager when donging switchover master.
 27 
 28 [root@zlm3 08:21:33 ~]
 29 #masterha_stop --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
 30 Stopped app1 successfully.
 31 [1]+  Exit 1                  masterha_manager --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf
 32 
 33 [root@zlm3 08:28:07 ~]
 34 #masterha_master_switch --conf=/etc/masterha/app1.conf --global_conf=/etc/masterha/masterha_default.conf --master_state=alive --new_master_host=192.168.1.102 --orig_master_is_new_slave --running_updates_limit=60
 35 Fri Aug  3 08:28:21 2018 - [info] MHA::MasterRotate version 0.56.
 36 Fri Aug  3 08:28:21 2018 - [info] Starting online master switch..
 37 Fri Aug  3 08:28:21 2018 - [info] 
 38 Fri Aug  3 08:28:21 2018 - [info] * Phase 1: Configuration Check Phase..
 39 Fri Aug  3 08:28:21 2018 - [info] 
 40 Fri Aug  3 08:28:21 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
 41 Fri Aug  3 08:28:21 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
 42 Fri Aug  3 08:28:21 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
 43 Fri Aug  3 08:28:22 2018 - [info] GTID failover mode = 1
 44 Fri Aug  3 08:28:22 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
 45 Fri Aug  3 08:28:22 2018 - [info] Alive Slaves:
 46 Fri Aug  3 08:28:22 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
 47 Fri Aug  3 08:28:22 2018 - [info]     GTID ON
 48 Fri Aug  3 08:28:22 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
 49 Fri Aug  3 08:28:22 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
 50 
 51 It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.101(192.168.1.101:3306)? (YES/no): yes
 52 Fri Aug  3 08:28:25 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
 53 Fri Aug  3 08:28:25 2018 - [info]  ok.
 54 Fri Aug  3 08:28:25 2018 - [info] Checking MHA is not monitoring or doing failover..
 55 Fri Aug  3 08:28:25 2018 - [info] Checking replication health on 192.168.1.102..
 56 Fri Aug  3 08:28:25 2018 - [info]  ok.
 57 Fri Aug  3 08:28:25 2018 - [info] 192.168.1.102 can be new master.
 58 Fri Aug  3 08:28:25 2018 - [info] 
 59 From:
 60 192.168.1.101(192.168.1.101:3306) (current master)
 61  +--192.168.1.102(192.168.1.102:3306)
 62 
 63 To:
 64 192.168.1.102(192.168.1.102:3306) (new master)
 65  +--192.168.1.101(192.168.1.101:3306)
 66 
 67 Starting master switch from 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306)? (yes/NO): yes
 68 Fri Aug  3 08:28:31 2018 - [info] Checking whether 192.168.1.102(192.168.1.102:3306) is ok for the new master..
 69 Fri Aug  3 08:28:31 2018 - [info]  ok.
 70 Fri Aug  3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
 71 Fri Aug  3 08:28:31 2018 - [info] 192.168.1.101(192.168.1.101:3306): Resetting slave pointing to the dummy host.
 72 Fri Aug  3 08:28:31 2018 - [info] ** Phase 1: Configuration Check Phase completed.
 73 Fri Aug  3 08:28:31 2018 - [info] 
 74 Fri Aug  3 08:28:31 2018 - [info] * Phase 2: Rejecting updates Phase..
 75 Fri Aug  3 08:28:31 2018 - [info] 
 76 Fri Aug  3 08:28:31 2018 - [info] Executing master ip online change script to disable write on the current master:
 77 Fri Aug  3 08:28:31 2018 - [info]   /etc/masterha/master_ip_online_change --command=stop --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_ssh_port=3306  --new_master_ssh_port=3306 --orig_master_is_new_slave
 78 Unknown option: new_master_ssh_port
 79 Fri Aug  3 08:28:32 2018 116409 Set read_only on the new master.. ok.
 80 Fri Aug  3 08:28:32 2018 125643 drop vip 10.33.101.239..
 81 ssh_exchange_identification: Connection closed by remote host
 82 Fri Aug  3 08:28:32 2018 142948 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
 83 {'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
 84 Fri Aug  3 08:28:32 2018 646769 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
 85 {'Time' => '13435','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
 86 Fri Aug  3 08:28:33 2018 149221 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
 87 {'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
 88 Fri Aug  3 08:28:33 2018 650816 Set read_only=1 on the orig master.. ok.
 89 Fri Aug  3 08:28:33 2018 653323 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
 90 {'Time' => '13436','db' => undef,'Id' => '21','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'zlm3:40535'}
 91 Fri Aug  3 08:28:34 2018 154965 Killing all application threads..
 92 Fri Aug  3 08:28:34 2018 167919 done.
 93 Fri Aug  3 08:28:34 2018 - [info]  ok.
 94 Fri Aug  3 08:28:34 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
 95 Fri Aug  3 08:28:34 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
 96 Fri Aug  3 08:28:34 2018 - [info]  ok.
 97 Fri Aug  3 08:28:34 2018 - [info] Orig master binlog:pos is mysql-bin.000050:2361.
 98 Fri Aug  3 08:28:34 2018 - [info]  Waiting to execute all relay logs on 192.168.1.102(192.168.1.102:3306)..
 99 Fri Aug  3 08:28:34 2018 - [info]  master_pos_wait(mysql-bin.000050:2361) completed on 192.168.1.102(192.168.1.102:3306). Executed 0 events.
100 Fri Aug  3 08:28:34 2018 - [info]   done.
101 Fri Aug  3 08:28:34 2018 - [info] Getting new master's binlog name and position..
102 Fri Aug  3 08:28:34 2018 - [info]  mysql-bin.000003:2321
103 Fri Aug  3 08:28:34 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
104 Fri Aug  3 08:28:34 2018 - [info] Executing master ip online change script to allow write on the new master:
105 Fri Aug  3 08:28:34 2018 - [info]   /etc/masterha/master_ip_online_change --command=start --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --orig_master_user='zlm' --orig_master_password='zlmzlm' --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='zlm' --new_master_password='zlmzlm' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_ssh_port=3306  --new_master_ssh_port=3306 --orig_master_is_new_slave
106 Unknown option: new_master_ssh_port
107 Fri Aug  3 08:28:34 2018 327146 Set read_only=0 on the new master.
108 Fri Aug  3 08:28:34 2018 328259Add vip 10.33.101.239 on p3p1..
109 ssh_exchange_identification: Connection closed by remote host
110 Fri Aug  3 08:28:34 2018 - [info]  ok.
111 Fri Aug  3 08:28:34 2018 - [info] 
112 Fri Aug  3 08:28:34 2018 - [info] * Switching slaves in parallel..
113 Fri Aug  3 08:28:34 2018 - [info] 
114 Fri Aug  3 08:28:34 2018 - [info] Unlocking all tables on the orig master:
115 Fri Aug  3 08:28:34 2018 - [info] Executing UNLOCK TABLES..
116 Fri Aug  3 08:28:34 2018 - [info]  ok.
117 Fri Aug  3 08:28:34 2018 - [info] Starting orig master as a new slave..
118 Fri Aug  3 08:28:34 2018 - [info]  Resetting slave 192.168.1.101(192.168.1.101:3306) and starting replication from the new master 192.168.1.102(192.168.1.102:3306)..
119 Fri Aug  3 08:28:34 2018 - [info]  Executed CHANGE MASTER.
120 Fri Aug  3 08:28:35 2018 - [info]  Slave started.
121 Fri Aug  3 08:28:35 2018 - [info] All new slave servers switched successfully.
122 Fri Aug  3 08:28:35 2018 - [info] 
123 Fri Aug  3 08:28:35 2018 - [info] * Phase 5: New master cleanup phase..
124 Fri Aug  3 08:28:35 2018 - [info] 
125 Fri Aug  3 08:28:35 2018 - [info]  192.168.1.102: Resetting slave info succeeded.
126 Fri Aug  3 08:28:35 2018 - [info] Switching master to 192.168.1.102(192.168.1.102:3306) completed successfully.
127 
128 [root@zlm3 08:28:35 ~]
129 #

 

Check the master-slave replication status.

 1 //New master(original slave)
 2 (zlm@192.168.1.102 3306)[(none)]>show master status;
 3 +------------------+----------+--------------+------------------+------------------------------------------------+
 4 | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
 5 +------------------+----------+--------------+------------------+------------------------------------------------+
 6 | mysql-bin.000003 |     2321 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
 7 +------------------+----------+--------------+------------------+------------------------------------------------+
 8 1 row in set (0.00 sec)
 9 
10 (zlm@192.168.1.102 3306)[(none)]>show slave statusG
11 Empty set (0.00 sec)
12 
13 //New slave(original master)
14 (zlm@192.168.1.101 3306)[(none)]>show master status;
15 +------------------+----------+--------------+------------------+------------------------------------------------+
16 | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
17 +------------------+----------+--------------+------------------+------------------------------------------------+
18 | mysql-bin.000050 |     2361 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259 |
19 +------------------+----------+--------------+------------------+------------------------------------------------+
20 1 row in set (0.01 sec)
21 
22 (zlm@192.168.1.101 3306)[(none)]>show slave statusG
23 *************************** 1. row ***************************
24                Slave_IO_State: Waiting for master to send event
25                   Master_Host: 192.168.1.102
26                   Master_User: repl
27                   Master_Port: 3306
28                 Connect_Retry: 60
29               Master_Log_File: mysql-bin.000003
30           Read_Master_Log_Pos: 2321
31                Relay_Log_File: relay-bin.000002
32                 Relay_Log_Pos: 398
33         Relay_Master_Log_File: mysql-bin.000003
34              Slave_IO_Running: Yes
35             Slave_SQL_Running: Yes
36               Replicate_Do_DB: 
37           Replicate_Ignore_DB: 
38            Replicate_Do_Table: 
39        Replicate_Ignore_Table: 
40       Replicate_Wild_Do_Table: 
41   Replicate_Wild_Ignore_Table: 
42                    Last_Errno: 0
43                    Last_Error: 
44                  Skip_Counter: 0
45           Exec_Master_Log_Pos: 2321
46               Relay_Log_Space: 591
47               Until_Condition: None
48                Until_Log_File: 
49                 Until_Log_Pos: 0
50            Master_SSL_Allowed: No
51            Master_SSL_CA_File: 
52            Master_SSL_CA_Path: 
53               Master_SSL_Cert: 
54             Master_SSL_Cipher: 
55                Master_SSL_Key: 
56         Seconds_Behind_Master: 0
57 Master_SSL_Verify_Server_Cert: No
58                 Last_IO_Errno: 0
59                 Last_IO_Error: 
60                Last_SQL_Errno: 0
61                Last_SQL_Error: 
62   Replicate_Ignore_Server_Ids: 
63              Master_Server_Id: 1023306
64                   Master_UUID: 842ea497-9551-11e8-83ca-080027de0e0e
65              Master_Info_File: mysql.slave_master_info
66                     SQL_Delay: 0
67           SQL_Remaining_Delay: NULL
68       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
69            Master_Retry_Count: 86400
70                   Master_Bind: 
71       Last_IO_Error_Timestamp: 
72      Last_SQL_Error_Timestamp: 
73                Master_SSL_Crl: 
74            Master_SSL_Crlpath: 
75            Retrieved_Gtid_Set: 
76             Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730259
77                 Auto_Position: 1
78          Replicate_Rewrite_DB: 
79                  Channel_Name: 
80            Master_TLS_Version: 
81 1 row in set (0.00 sec)

 

Check the log of MasterHA on zlm3.

 1 [root@zlm3 08:28:35 ~]
 2 #cd /var/log/masterha/app1
 3 
 4 [root@zlm3 08:29:12 /var/log/masterha/app1]
 5 #cat app1.log 
 6 Fri Aug  3 07:39:13 2018 - [info] MHA::MasterMonitor version 0.56.
 7 Fri Aug  3 07:39:14 2018 - [info] GTID failover mode = 1
 8 Fri Aug  3 07:39:14 2018 - [info] Dead Servers:
 9 Fri Aug  3 07:39:14 2018 - [info] Alive Servers:
10 Fri Aug  3 07:39:14 2018 - [info]   192.168.1.101(192.168.1.101:3306)
11 Fri Aug  3 07:39:14 2018 - [info]   192.168.1.102(192.168.1.102:3306)
12 Fri Aug  3 07:39:14 2018 - [info] Alive Slaves:
13 Fri Aug  3 07:39:14 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
14 Fri Aug  3 07:39:14 2018 - [info]     GTID ON
15 Fri Aug  3 07:39:14 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
16 Fri Aug  3 07:39:14 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
17 Fri Aug  3 07:39:14 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
18 Fri Aug  3 07:39:14 2018 - [info] Checking slave configurations..
19 Fri Aug  3 07:39:14 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
20 Fri Aug  3 07:39:14 2018 - [info] Checking replication filtering settings..
21 Fri Aug  3 07:39:14 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
22 Fri Aug  3 07:39:14 2018 - [info]  Replication filtering check ok.
23 Fri Aug  3 07:39:14 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
24 Fri Aug  3 07:39:14 2018 - [info] Checking SSH publickey authentication settings on the current master..
25 Fri Aug  3 07:39:14 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
26 Fri Aug  3 07:39:14 2018 - [info] 
27 192.168.1.101(192.168.1.101:3306) (current master)
28  +--192.168.1.102(192.168.1.102:3306)
29 
30 Fri Aug  3 07:39:14 2018 - [info] Checking master_ip_failover_script status:
31 Fri Aug  3 07:39:14 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
32 Fri Aug  3 07:39:14 2018 - [info]  OK.
33 Fri Aug  3 07:39:14 2018 - [warning] shutdown_script is not defined.
34 Fri Aug  3 07:39:14 2018 - [info] Set master ping interval 1 seconds.
35 Fri Aug  3 07:39:14 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
36 Fri Aug  3 07:39:14 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
37 Fri Aug  3 07:39:14 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
38 Fri Aug  3 07:39:27 2018 - [info] Got terminate signal. Exit.
39 Fri Aug  3 07:40:03 2018 - [info] MHA::MasterMonitor version 0.56.
40 Fri Aug  3 07:40:04 2018 - [info] GTID failover mode = 1
41 Fri Aug  3 07:40:04 2018 - [info] Dead Servers:
42 Fri Aug  3 07:40:04 2018 - [info] Alive Servers:
43 Fri Aug  3 07:40:04 2018 - [info]   192.168.1.101(192.168.1.101:3306)
44 Fri Aug  3 07:40:04 2018 - [info]   192.168.1.102(192.168.1.102:3306)
45 Fri Aug  3 07:40:04 2018 - [info] Alive Slaves:
46 Fri Aug  3 07:40:04 2018 - [info]   192.168.1.102(192.168.1.102:3306)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
47 Fri Aug  3 07:40:04 2018 - [info]     GTID ON
48 Fri Aug  3 07:40:04 2018 - [info]     Replicating from 192.168.1.101(192.168.1.101:3306)
49 Fri Aug  3 07:40:04 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
50 Fri Aug  3 07:40:04 2018 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
51 Fri Aug  3 07:40:04 2018 - [info] Checking slave configurations..
52 Fri Aug  3 07:40:04 2018 - [info]  read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
53 Fri Aug  3 07:40:04 2018 - [info] Checking replication filtering settings..
54 Fri Aug  3 07:40:04 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
55 Fri Aug  3 07:40:04 2018 - [info]  Replication filtering check ok.
56 Fri Aug  3 07:40:04 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
57 Fri Aug  3 07:40:04 2018 - [info] Checking SSH publickey authentication settings on the current master..
58 Fri Aug  3 07:40:04 2018 - [warning] HealthCheck: SSH to 192.168.1.101 is NOT reachable.
59 Fri Aug  3 07:40:04 2018 - [info] 
60 192.168.1.101(192.168.1.101:3306) (current master)
61  +--192.168.1.102(192.168.1.102:3306)
62 
63 Fri Aug  3 07:40:04 2018 - [info] Checking master_ip_failover_script status:
64 Fri Aug  3 07:40:04 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306  --orig_master_ssh_port=3306
65 Fri Aug  3 07:40:04 2018 - [info]  OK.
66 Fri Aug  3 07:40:04 2018 - [warning] shutdown_script is not defined.
67 Fri Aug  3 07:40:04 2018 - [info] Set master ping interval 1 seconds.
68 Fri Aug  3 07:40:04 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
69 Fri Aug  3 07:40:04 2018 - [info] Starting ping health check on 192.168.1.101(192.168.1.101:3306)..
70 Fri Aug  3 07:40:04 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
71 Fri Aug  3 08:28:07 2018 - [info] Got terminate signal. Exit.

 

Test 2:Manual master failover

TAG标签:
版权声明:本文由澳门新浦金网站发布于数据库,转载请注明出处:MySQL高可用之MHA切换测试(switchover & failover)