Restarting a node is a common O&M operation. You can restart a node to perform temporary maintenance on a server or to make system parameter modifications take effect. During the restart, the node must come back online within the time specified by the server_permanent_offline_time parameter. Otherwise, the node is considered permanently offline. If the maintenance of a server will take longer than that, replace the server instead. For more information about how to replace a server, see Replace a node.
Note
The cluster-level parameter server_permanent_offline_time specifies the maximum heartbeat interruption period for a node to be considered permanently offline. The replica on a permanently offline node is automatically replaced. The default value is 3600s. For more information about this parameter, see server_permanent_offline_time.
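Before planning a maintenance window, it can help to compare the expected downtime with the parameter value. The following is a minimal sketch, not an official tool: the function names to_seconds and fits_offline_window are hypothetical, and the sketch assumes the value uses one of the s, m, h, or d suffixes. The current value can be read in the sys tenant with SHOW PARAMETERS LIKE 'server_permanent_offline_time';.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Convert a duration such as 3600s, 60m, 1h, or 1d to seconds.
# Assumption: only the s, m, h, and d suffixes (or no suffix) are handled.
to_seconds() {
  local value="$1" number unit
  number="${value%[smhd]}"        # strip one trailing unit letter, if any
  unit="${value#"$number"}"       # whatever was stripped is the unit
  case "$number" in
    ''|*[!0-9]*) return 1 ;;      # reject anything that is not a plain integer
  esac
  case "$unit" in
    s|"") echo "$number" ;;
    m) echo $((number * 60)) ;;
    h) echo $((number * 3600)) ;;
    d) echo $((number * 86400)) ;;
  esac
}

# Succeed only if the planned downtime is shorter than the configured limit.
fits_offline_window() {
  local planned limit
  planned="$(to_seconds "$1")" || return 2
  limit="$(to_seconds "$2")" || return 2
  [ "$planned" -lt "$limit" ]
}
```

For example, fits_offline_window 30m 3600s succeeds, while fits_offline_window 2h 3600s fails, which suggests replacing the server rather than restarting it.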
Background
OceanBase Database is a distributed database and is typically deployed in a multi-replica architecture, for example, three replicas in three IDCs of the same region, or five replicas in five IDCs across three regions. A transaction is committed only after its transaction log is synchronized to a majority of replicas by using the Paxos protocol, which ensures data consistency among replicas. When only a minority of replicas are abnormal, the database still meets a service level agreement (SLA) with a recovery point objective (RPO) of 0.
In a multi-replica architecture, the STOP SERVER command can implement a restart without business loss. The STOP SERVER command involves two steps:
Switch all leaders away from the node to be restarted and make sure that the replicas on the remaining nodes are still the majority.
Mark the node to be restarted as stopped (in the ACTIVE state, with the value of the stop_time field greater than 0) on Root Service. After the client detects a stopped node, it no longer routes business requests to this node.
After the STOP SERVER operation succeeds, restarting the node does not cause issues such as replicas without a leader or errors returned to the client, so the restart is transparent to business traffic. If the STOP SERVER operation fails, do not proceed with the restart until you have identified the cause. Possible causes include insufficient replicas, lagging REDO log synchronization, or fewer than three voting members.
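The stopped state described above can be checked from a script. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the SVR_IP and SVR_PORT column names are assumed for the oceanbase.DBA_OB_SERVERS view, and the values passed to server_is_stopped are assumed to be the STATUS and STOP_TIME values returned for the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Build the SQL that reads the isolation state of one node (run in the sys tenant).
# Assumption: the view exposes SVR_IP and SVR_PORT columns.
stopped_state_sql() {
  printf "SELECT STATUS, STOP_TIME FROM oceanbase.DBA_OB_SERVERS WHERE SVR_IP = '%s' AND SVR_PORT = %s;\n" "$1" "$2"
}

# Succeed only if the node is ACTIVE and carries a stop time,
# which is how a stopped (isolated) node is described above.
server_is_stopped() {
  local status="$1" stop_time="$2"
  [ "$status" = "ACTIVE" ] || return 1
  case "$stop_time" in
    ""|NULL|0) return 1 ;;   # no stop time recorded: the node is not isolated
  esac
  return 0
}
```

A caller would run the generated statement, then pass the two returned values to server_is_stopped before stopping the observer process.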
Procedure
The general procedure for restarting a node consists of the following steps: stop the node, initiate a minor compaction, stop the process, start the process, and start the node.
This topic describes how to restart a single node in a cluster. To restart multiple nodes, repeat the restart procedure multiple times.
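As an overview, the five steps can be printed as a checklist for one node. This is a minimal sketch, not a script that performs the restart: restart_plan is a hypothetical helper, the port defaults to 2882 as described in this topic, and the observer process ID is left as a placeholder because it must be looked up on the server.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only: prints the steps, runs nothing.
restart_plan() {
  local ip="$1" port="${2:-2882}"   # RPC port defaults to 2882
  cat <<EOF
1. ALTER SYSTEM STOP SERVER '${ip}:${port}';
2. ALTER SYSTEM MINOR FREEZE SERVER = ('${ip}:${port}');
3. kill -9 <observer pid>   (on ${ip}, as the admin user)
4. cd /home/admin/oceanbase && ./bin/observer   (on ${ip}, as the admin user)
5. ALTER SYSTEM START SERVER '${ip}:${port}';
EOF
}
```

To plan a restart of several nodes, call restart_plan once per node and complete all five steps for one node before moving to the next.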
Log on to the sys tenant of the cluster as the root user.
Run the following command to log on. You must replace the related information in the command based on the actual database environment.
obclient -h10.xx.xx.xx -P2883 -uroot@sys -p***** -A
For more information about how to connect to OceanBase Database, see Overview of the Connect to OceanBase Database chapter in Develop Applications in MySQL Mode and Overview of the Connect to OceanBase Database chapter in Develop Applications in Oracle Mode.
Execute the following statement to isolate the target node.
Service continuity may be affected during the restart of a node. For example, if a cluster contains only one or two nodes, or if the tenant data is distributed on only two nodes, database services may become unavailable when a node is restarted. When you perform the STOP SERVER operation to isolate the node, the system performs a safety check to avoid affecting service continuity during the node restart. After the node is isolated, it no longer provides services. If the STOP SERVER operation fails, check the cluster deployment information or resolve the issue based on the error information, and then perform the STOP SERVER operation again. If the service unavailability of the node is not a concern, skip this step.
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
svr_ip: the IP address of the node to be stopped.
svr_port: the RPC port of the node to be stopped. Default value: 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
After the statement is executed, you can query the oceanbase.DBA_OB_SERVERS view for the STATUS field of this server. In the view, the value of this field is still ACTIVE, but the value of STOP_TIME changes from NULL to the point in time when the node was stopped.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
Execute the following statement to initiate a minor compaction for the node to be restarted. This shortens the time required to replay REDO logs after the restart, thereby accelerating the restart.
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
svr_ip: the IP address of the node to be restarted.
svr_port: the RPC port of the node to be restarted. Default value: 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
After you initiate a minor compaction, you must wait for the minor compaction to complete before you proceed to the next step. For more information about how to view the minor compaction progress, see View minor compaction information.
Stop the observer process.
Log on as the admin user to the server containing the node whose process is to be stopped.
Access the /home/admin/oceanbase/bin directory from the command-line interface (CLI).
[admin@xxx /]$ cd /home/admin/oceanbase/bin
Run the following command to obtain the process ID of the node.
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin 103364 1 99 2022 ? 51-17:24:41 /home/admin/oceanbase/bin/observer
In this example, 103364 is the process ID of the node.
Stop the observer process using the following command.
[admin@xxx oceanbase]$ kill -9 pid
In the command, pid is the ID of the observer process on the node to be stopped.
Here is an example:
[admin@xxx oceanbase]$ kill -9 103364
Notice
You can stop only one observer process on a node in the deployment directory. To stop the observer processes on multiple nodes, log on to the corresponding servers one by one and repeat the preceding procedure.
Run the following command to verify whether the process is stopped.
[admin@xxx oceanbase]$ ps aux | grep observer
If the output lists no observer process (other than the grep command itself), the process is stopped.
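The lookup and verification in this step can also be scripted. The following is a minimal sketch, assuming the ps -ef column layout shown in the example above (process ID in the second column, command in the eighth); the function names observer_pid_from_ps, process_gone, and wait_until are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Read `ps -ef` output on stdin and print the PID of each observer process.
# Assumption: the command path is the eighth column, as in the example above.
observer_pid_from_ps() {
  awk '$8 ~ /\/observer$/ { print $2 }'
}

# Succeed if no process with the given PID can be signalled.
# Note: this also succeeds if the process belongs to another user,
# so run it as the user that owns the observer process.
process_gone() {
  ! kill -0 "$1" 2>/dev/null
}

# Retry a command every <interval> seconds until it succeeds
# or <timeout> seconds have passed.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  while ! "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

For example, after kill -9 103364, running wait_until 30 1 process_gone 103364 returns 0 once the process has exited and returns 1 if it is still present after 30 seconds.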
(Optional) Perform temporary maintenance on the server as needed.
Start the observer process.
Log on as the admin user to the server containing the node whose process is to be started.
Start the observer process.
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
Notice
You can start only one observer process on a node in the deployment directory. To start the observer processes on multiple nodes, log on to the corresponding servers one by one and repeat the procedure.
Execute the following statement to start the node.
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
svr_ip: the IP address of the node to be started.
svr_port: the RPC port of the node to be started. Default value: 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
After the statement is executed, you can query the oceanbase.DBA_OB_SERVERS view. The value of the START_SERVICE_TIME field indicates the time when the node is started. If the value is NULL, the node is not started.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
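When several nodes are restarted in turn, the START_SERVICE_TIME check can be applied to query output in bulk. The following is a minimal sketch under stated assumptions: nodes_not_started is a hypothetical helper, and its input is assumed to be tab-separated rows of IP address, port, and START_SERVICE_TIME, with NULL printed literally, as a client typically does in batch output mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# Read tab-separated rows (ip, port, start_service_time) on stdin and
# print ip:port for every node whose START_SERVICE_TIME is NULL or empty.
nodes_not_started() {
  awk -F '\t' '$3 == "NULL" || $3 == "" { print $1 ":" $2 }'
}
```

An empty result means every listed node reports a start time. Any node that is printed has not started yet and should be checked before you restart the next one.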
More information
For more node-related O&M operations, see the following topics: