When an OBServer node fails, you can isolate it so that new read and write requests are no longer routed to it. The isolated node enters the stopped state, and the leaders on it are switched to other OBServer nodes, which restores the write service for users and log synchronization in the cluster.
If necessary, you can restart the failed node to solve the problem. For more information, see Restart an OBServer node. To guarantee data safety, we recommend that you stop the node before you restart it. For more information, see STOP SERVER.
Note
- An OBServer node failure may be caused by network isolation or software faults of the node, such as request backlog, memory exhaustion, and process core dump.
- For more information about replicas, see Overview of distributed database objects.
The syntax of the statement for isolating a failed OBServer node is as follows:
ALTER SYSTEM ISOLATE SERVER 'ip:port' [,'ip:port'...] [ZONE [=] 'zone']
This statement can be executed only in the sys tenant.
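As a rough illustration of the syntax above, the following Python sketch assembles the statement text from one or more addresses and an optional zone. The helper name `build_isolate_statement` and its validation rules are assumptions made for this example and are not part of OceanBase. The function only returns a string, which you would still need to execute in the sys tenant.

```python
def build_isolate_statement(servers, zone=None):
    """Return ALTER SYSTEM ISOLATE SERVER text for the given 'ip:port' addresses.

    Hypothetical helper: it only formats the statement and does not
    connect to a cluster or check that the nodes exist.
    """
    if not servers:
        raise ValueError("at least one 'ip:port' address is required")

    quoted = []
    for addr in servers:
        host, sep, port = addr.rpartition(":")
        # Require a non-empty host, a numeric port, and no embedded quotes.
        if not sep or not host or not port.isdigit() or "'" in addr:
            raise ValueError(f"expected 'ip:port', got {addr!r}")
        quoted.append(f"'{addr}'")

    stmt = "ALTER SYSTEM ISOLATE SERVER " + ", ".join(quoted)
    if zone is not None:
        if "'" in zone:
            raise ValueError(f"invalid zone name: {zone!r}")
        # The ZONE clause is optional in the syntax.
        stmt += f" ZONE='{zone}'"
    return stmt + ";"


# Two nodes, no ZONE clause.
print(build_isolate_statement(["10.10.10.10:2882", "10.10.10.1:2882"]))
```

Rejecting malformed addresses before the statement is built keeps an obviously wrong value, such as an address without a port, from reaching the cluster.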
Here is an example:
1. Log on to the sys tenant as the root user.
2. Execute the following statement to isolate the failed OBServer node.
   For example,
   obclient> ALTER SYSTEM ISOLATE SERVER '10.10.10.10:2882' ZONE='zone2';

If the statement succeeds, the failed OBServer node is isolated. In the __all_server table, the value of status of the OBServer node is still active, but the value of stop_time of the OBServer node is not 0, which indicates that the OBServer node is in the stopped state. In this case, the value of stop_time is the timestamp when the OBServer node is isolated.

Here is an example of querying the __all_server table:

obclient> SELECT * FROM oceanbase.__all_server\G
*************************** 1. row ***************************
           gmt_create: 2021-12-03 09:50:42.548125
         gmt_modified: 2021-12-08 10:27:02.114234
               svr_ip: 10.10.10.10
             svr_port: 2882
                   id: 2
                 zone: zone2
           inner_port: 2881
      with_rootserver: 0
               status: active
block_migrate_in_time: 0
        build_version: 3.2.1_20211031212624-2c7eade2fd94a4ae32bec1683d1118da9d30cf8b(Oct 31 2021 22:03:03)
            stop_time: 1638930422111133
   start_service_time: 1638496494238956
         first_sessid: 0
       with_partition: 1
    last_offline_time: 0
*************************** 2. row ***************************
           gmt_create: 2021-12-03 09:50:42.443685
         gmt_modified: 2021-12-03 09:54:55.222649
               svr_ip: 10.10.10.1
             svr_port: 2882
                   id: 1
                 zone: zone1
           inner_port: 2881
      with_rootserver: 1
               status: active
block_migrate_in_time: 0
        build_version: 3.2.1_20211031212624-2c7eade2fd94a4ae32bec1683d1118da9d30cf8b(Oct 31 2021 22:03:03)
            stop_time: 0
   start_service_time: 1638496493237400
         first_sessid: 0
       with_partition: 1
    last_offline_time: 0
2 rows in set
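The rule for reading the query result (status remains active, and a non-zero stop_time means stopped) can be sketched in Python as follows. The function name `describe_server` and the dict-shaped rows are assumptions for this example. The sketch also assumes that stop_time counts microseconds since the Unix epoch, which is consistent with the gmt_modified value in the sample output if that column is displayed in UTC+8.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_server(row):
    """Summarize one __all_server row, given as a dict of column values."""
    stop_time = int(row["stop_time"])
    # An isolated node keeps status 'active' but has a non-zero stop_time.
    if row["status"] == "active" and stop_time != 0:
        state = "stopped"
    else:
        state = row["status"]
    # Assumption: stop_time is in microseconds since the Unix epoch.
    stopped_at = EPOCH + timedelta(microseconds=stop_time) if stop_time else None
    return {
        "server": f"{row['svr_ip']}:{row['svr_port']}",
        "zone": row["zone"],
        "state": state,
        "stopped_at": stopped_at,
    }


# Column values copied from the sample query result.
rows = [
    {"svr_ip": "10.10.10.10", "svr_port": 2882, "zone": "zone2",
     "status": "active", "stop_time": 1638930422111133},
    {"svr_ip": "10.10.10.1", "svr_port": 2882, "zone": "zone1",
     "status": "active", "stop_time": 0},
]
for row in rows:
    info = describe_server(row)
    print(info["server"], info["zone"], info["state"], info["stopped_at"])
```

Under these assumptions, the first row is reported as stopped at 2021-12-08 02:27:02.111133 UTC, and the second row is reported as active with no stop time.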
You can execute the ALTER SYSTEM START SERVER '10.10.10.10:2882' statement to cancel the isolation of an OBServer node.