When a zone fails, you can isolate it so that new read and write requests are no longer routed to processes in that zone. An isolated zone is in the Stopped state and cannot host the leader. You must therefore switch the leader from the failed zone to another zone to restore the write service for users and log synchronization in the cluster.
If necessary, you can restart the failed node to solve the problem. For more information, see Restart an OBServer node. To guarantee data safety, we recommend that you stop the node first and then restart it. For more information, see STOP SERVER. If all the nodes in the failed zone need to be restarted, you can stop the failed zone. For more information, see STOP ZONE. After the node or the zone is stopped, you can safely restart the node.
After you troubleshoot the zone, you must start the zone to switch the leader back. For more information, see START ZONE.
Note
- A zone failure may be caused by network isolation between the OBServer nodes in the zone or by software faults on the nodes, such as request backlogs, memory exhaustion, and process core dumps.
- For more information about leader and follower, see Overview of distributed database objects.
Sample statement for isolating a failed zone:
obclient> ALTER SYSTEM ISOLATE ZONE 'zone_name';
This statement can be executed only in the sys tenant.
The following is an example:
1. Log on to the sys tenant as the root user.
2. Execute the following statement to isolate the failed zone:
obclient> ALTER SYSTEM ISOLATE ZONE 'zone1';
If the statement succeeds, the failed zone is isolated. In the __all_zone table, the row of the isolated zone whose name is status shows INACTIVE in the info column, which indicates that the zone is in the INACTIVE state.
Example of querying the oceanbase.__all_zone table:
obclient> SELECT * FROM oceanbase.__all_zone WHERE zone = 'zone1';
+----------------------------+----------------------------+-------+---------------------+------------------+-----------+
| gmt_create                 | gmt_modified               | zone  | name                | value            | info      |
+----------------------------+----------------------------+-------+---------------------+------------------+-----------+
| 2021-11-22 18:00:13.814701 | 2022-01-04 02:00:38.221218 | zone1 | all_merged_version  | 45               |           |
| 2021-11-22 18:00:13.814422 | 2022-01-04 02:00:21.210032 | zone1 | broadcast_version   | 45               |           |
| 2021-11-22 18:00:13.815255 | 2021-11-22 18:00:22.691700 | zone1 | idc                 | 0                | BJ1       |
| 2021-11-22 18:00:13.814904 | 2022-01-04 02:00:38.220994 | zone1 | is_merge_timeout    | 0                |           |
| 2021-11-22 18:00:13.814331 | 2022-01-04 02:00:38.220232 | zone1 | is_merging          | 0                |           |
| 2021-11-22 18:00:13.814600 | 2022-01-04 02:00:38.220771 | zone1 | last_merged_time    | 1641232838219704 |           |
| 2021-11-22 18:00:13.814510 | 2022-01-04 02:00:38.220535 | zone1 | last_merged_version | 45               |           |
| 2021-11-22 18:00:13.814809 | 2022-01-04 02:00:21.210365 | zone1 | merge_start_time    | 1641232821208596 |           |
| 2021-11-22 18:00:13.815080 | 2022-01-04 02:00:38.221669 | zone1 | merge_status        | 0                | IDLE      |
| 2021-11-22 18:00:13.815168 | 2021-11-22 18:00:13.815168 | zone1 | region              | 0                | BEIJING   |
| 2021-11-22 18:00:13.814239 | 2021-11-22 18:00:13.814239 | zone1 | status              | 2                | INACTIVE  |
| 2021-11-22 18:00:13.814993 | 2021-11-22 18:00:13.814993 | zone1 | suspend_merging     | 0                |           |
| 2021-11-22 18:00:13.815343 | 2021-11-22 18:00:13.815343 | zone1 | zone_type           | 0                | ReadWrite |
+----------------------------+----------------------------+-------+---------------------+------------------+-----------+
13 rows in set
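To check only the zone state instead of reading all 13 rows, the query can be narrowed to the status row. The following is a minimal sketch that assumes the column layout shown in the sample output (zone, name, value, info) and that the statement is run in the sys tenant.

```sql
-- Return only the status row for zone1; the info column holds the zone state.
SELECT zone, name, value, info
FROM oceanbase.__all_zone
WHERE zone = 'zone1' AND name = 'status';
```

For an isolated zone, the info column should read INACTIVE, as in the sample output.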
To cancel the isolation of a zone, execute the ALTER SYSTEM START ZONE statement, for example, ALTER SYSTEM START ZONE 'zone1'.
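The recovery step can be sketched as follows, assuming that zone1 is the zone isolated in the example and that the statements are run in the sys tenant, as with ISOLATE ZONE. The follow-up query is illustrative and relies on the oceanbase.__all_zone columns shown in the sample output.

```sql
-- Cancel the isolation of zone1.
ALTER SYSTEM START ZONE 'zone1';

-- Re-read the status row; the info column should no longer show INACTIVE.
SELECT zone, name, info
FROM oceanbase.__all_zone
WHERE zone = 'zone1' AND name = 'status';
```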