Restart is a general O&M action. You can perform a restart for provisional maintenance of a server or for system parameter modifications to take effect. During the restart, the offline time of the node must be within the time specified by the server_permanent_offline_time parameter. Otherwise, the node will be permanently offline. If the maintenance of a server takes a long time, replace it. For more information about how to replace a server, see Replace a node.
Note
The server_permanent_offline_time parameter specifies the time threshold for missing heartbeats. When no heartbeat is received from an OBServer node for the specified period of time, the node is considered permanently offline, and the data replicas on it must be automatically supplemented. The default value is 3600s. For more information about this parameter, see server_permanent_offline_time.
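As a rough illustration of the time budget described in the note, the following sketch computes how many seconds remain before the threshold is reached. The helper name offline_budget_remaining is hypothetical and not part of OceanBase Database. It also assumes that the moment the process went down approximates the moment heartbeats stopped, which may not hold exactly in a real cluster.

```shell
# Sketch only: offline_budget_remaining is a hypothetical helper, not an OceanBase tool.
# Args: <epoch seconds when the process went down> <current epoch seconds> [threshold in seconds]
offline_budget_remaining() {
  local went_down="$1"
  local now="$2"
  local threshold="${3:-3600}"   # 3600s is the documented default value
  local remaining=$(( threshold - (now - went_down) ))
  # Clamp at zero: a negative budget means the threshold has already passed.
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

# Example: the process went down 600 seconds ago and the default threshold applies.
offline_budget_remaining 1000 1600   # prints 3000
```

If the maintenance is likely to outlast the remaining budget, the threshold value passed as the third argument should match whatever server_permanent_offline_time is set to in your cluster.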
Background information
OceanBase Database is a distributed database that is typically deployed in a multi-replica architecture, such as three replicas in three IDCs of the same region, or five replicas in five IDCs across three regions. A transaction is committed only after its transaction log is synchronized to a majority of replicas by using the Paxos protocol, which ensures data consistency among replicas. When only a minority of replicas are abnormal, the service level agreement (SLA) with a recovery point objective (RPO) of 0 is still met.
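The majority arithmetic behind the three-replica and five-replica examples can be sketched as follows. The function names are hypothetical and only illustrate the counting. They do not model how OceanBase Database actually runs Paxos.

```shell
# Sketch only: hypothetical helpers for majority arithmetic.
# Smallest number of replicas that forms a majority of n.
majority_size() {
  echo $(( $1 / 2 + 1 ))
}

# Largest number of replicas that can be abnormal while a majority remains.
tolerable_failures() {
  echo $(( ($1 - 1) / 2 ))
}

majority_size 3        # prints 2
tolerable_failures 3   # prints 1
majority_size 5        # prints 3
tolerable_failures 5   # prints 2
```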
In a multi-replica architecture, the STOP SERVER command can implement a restart without business loss. The STOP SERVER command involves two steps:
Switch all leaders away from the node to be restarted and make sure that the replicas on the remaining nodes are still in the majority.
Mark the node to be restarted (which is in the ACTIVE state, and the value of the stop_time field is greater than 0) as stopped on RootService. After the client detects a stopped node, the client will no longer route business requests to this node.
After the Stop Server operation succeeds, the node restart will not cause issues such as election without a leader or errors returned on the client. The restart is transparent to business traffic. If the Stop Server operation fails, you must stop the restart and identify the causes. For example, replicas are insufficient, a latency exists in redo logs, or the total number of voting members is less than 3.
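To make two of the failure causes above concrete, the following sketch checks whether one node can be taken away without losing the majority. It is a simplification under stated assumptions: can_stop_one_node is a hypothetical helper, the caller supplies the number of voting members and the number of replicas that are currently in sync, and redo log latency is not modeled. The actual check is performed by the STOP SERVER command itself.

```shell
# Sketch only: can_stop_one_node is a hypothetical helper, not the real STOP SERVER check.
# Args: <total voting members> <replicas currently in sync, including the node to stop>
can_stop_one_node() {
  local total="$1"
  local in_sync="$2"
  local majority=$(( total / 2 + 1 ))
  if [ "$total" -lt 3 ]; then
    echo "refuse: fewer than 3 voting members"
    return 1
  fi
  # After the node is stopped, the remaining in-sync replicas must still be a majority.
  if [ $(( in_sync - 1 )) -lt "$majority" ]; then
    echo "refuse: remaining replicas would not form a majority"
    return 1
  fi
  echo "ok"
}

can_stop_one_node 3 3   # prints ok
```

With three voting members of which only two are in sync, the same call returns a non-zero status, which mirrors the case where the Stop Server operation fails because replicas are insufficient.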
Procedure
The general procedure for restarting a node consists of the following steps: stop the node, initiate a minor compaction, stop the process, start the process, and start the node.
This topic describes how to restart a single node in a cluster. To restart multiple nodes, repeat the restart procedure multiple times.
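The five steps can be laid out as a dry-run plan for one node. The sketch below only prints the statements and commands used in this topic and does not execute anything. The function name print_restart_plan is hypothetical, and the checks between steps (for example, waiting for the minor compaction to complete) are intentionally left out.

```shell
# Sketch only: print_restart_plan is a hypothetical helper that prints, but never runs, the steps.
# Args: <svr_ip> [svr_port, default 2882]
print_restart_plan() {
  local ip="$1"
  local port="${2:-2882}"
  local addr="${ip}:${port}"
  cat <<EOF
1. ALTER SYSTEM STOP SERVER '${addr}';
2. ALTER SYSTEM MINOR FREEZE SERVER = ('${addr}');
3. kill -9 <pid of the observer process>
4. cd /home/admin/oceanbase && ./bin/observer
5. ALTER SYSTEM START SERVER '${addr}';
EOF
}

print_restart_plan 172.xx.xx.xx
```

Steps 1, 2, and 5 are SQL statements executed in the sys tenant, whereas steps 3 and 4 are shell commands run on the server that hosts the node.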
Log in to the sys tenant of the cluster as the root user.
Note that you must specify the corresponding options in the following sample code based on your actual database configurations.
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
For more information about how to connect to a database, see Connection methods (MySQL mode) or Connection methods (Oracle mode).
Execute the following statement to isolate the target node.
The service continuity may be affected during the restart of a node. For example, if a cluster contains only one or two nodes or the tenant data is distributed only on two nodes, database services may become unavailable when a node is restarted. When you perform the Stop Server operation to isolate the node, the system performs a security check to avoid affecting service continuity during the node restart. After the node is isolated, it no longer provides services. If the Stop Server operation fails, you must check the cluster deployment information or resolve the issue based on the error information and then perform the Stop Server operation again. If the service unavailability of the node is not a concern, skip this step.
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
where
svr_ip specifies the IP address of the node to be stopped.
svr_port specifies the RPC port of the node to be stopped. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
After the statement is executed, you can query the oceanbase.DBA_OB_SERVERS view for the STATUS field of this server. In the view, the value of this field is still ACTIVE but the value of STOP_TIME changes from NULL to the point in time when the node is stopped.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
Execute the following statement to initiate a minor compaction for the node to be restarted. This is to shorten the time required to replay redo logs after the restart, thereby accelerating the restart.
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
where
svr_ip specifies the IP address of the node to be restarted.
svr_port specifies the RPC port of the node to be restarted. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
After you initiate a minor compaction, you must wait for the minor compaction to complete before you proceed to the next step. For more information about how to view the minor compaction progress, see View minor compaction information.
For more information about minor compactions, see Minor compaction.
Stop the observer process.
Log in as the admin user to the server where the node whose process is to be stopped is located.
Access the /home/admin/oceanbase directory from the command-line interface (CLI).
[admin@xxx /]$ cd /home/admin/oceanbase
For more information about the installation directory of OceanBase Database, see Structure of the OBServer installation directory.
Run the following command to obtain the process ID of the node.
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin 103364 1 99 2022 ? 51-17:24:41 /home/admin/oceanbase/bin/observer
Here, 103364 is the process ID of the node.
Stop the observer process.
The syntax of the command is as follows:
[admin@xxx oceanbase]$ kill -9 pid
Here, pid is the ID of the observer process on the node to be stopped.
Here is an example:
[admin@xxx oceanbase]$ kill -9 103364
Notice
You can stop only one observer process on a node in the deployment directory. To stop the observer processes on multiple nodes, log in to the corresponding servers one by one and repeat the preceding procedure.
Run the following command to verify whether the process is stopped.
[admin@xxx oceanbase]$ ps aux | grep observer | grep -v grep
If the command returns no output, the process is stopped.
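Instead of rerunning ps by hand, the check can be polled until the process disappears. The following is a minimal sketch, assuming it is run by the same admin user that owns the observer process (otherwise kill -0 can fail for permission reasons and be misread as an exit). The function name wait_for_exit and the 30-second default timeout are illustrative choices, not values from OceanBase Database.

```shell
# Sketch only: wait_for_exit is a hypothetical helper.
# Args: <pid> [timeout in seconds, default 30]
# Returns 0 once the process is gone, 1 if it is still present after the timeout.
wait_for_exit() {
  local pid="$1"
  local timeout="${2:-30}"
  local waited=0
  # kill -0 sends no signal; it only tests whether the PID can still be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Usage, with the process ID obtained earlier:
#   kill -9 103364
#   wait_for_exit 103364 && echo "observer process is stopped"
```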
(Optional) Perform provisional maintenance on the server as needed.
Start the observer process.
Log in as the admin user to the server where the node whose process is to be started is located.
Start the observer process.
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
Notice
You can start only one observer process on a node in the deployment directory. To start the observer processes on multiple nodes, log in to the corresponding servers one by one and repeat the preceding procedure.
For more information about the installation directory of OceanBase Database, see Structure of the OBServer installation directory.
After the command is executed, you can query the START_SERVICE_TIME field in the oceanbase.DBA_OB_SERVERS view. If the field value is not NULL, the observer process is started.
Execute the following statement to start the node.
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
where
svr_ip specifies the IP address of the node to be started.
svr_port specifies the RPC port of the node to be started. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
After the statement is executed, you can query the STOP_TIME field in the oceanbase.DBA_OB_SERVERS view. If the field value is NULL, the observer process is started on the node and the node can provide external services.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
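The STOP_TIME and START_SERVICE_TIME checks used throughout the procedure can be summarized in one small classifier. This is a sketch under assumptions: node_state is a hypothetical helper, and the two field values are assumed to arrive as text with the literal string NULL for empty values, as a command-line client would typically print them. The query in the comment only selects fields named in this topic.

```shell
# Sketch only: node_state is a hypothetical helper that interprets two fields
# of oceanbase.DBA_OB_SERVERS as described in this topic.
# Args: <STOP_TIME value> <START_SERVICE_TIME value>, with the literal text NULL for empty values.
node_state() {
  local stop_time="$1"
  local start_service_time="$2"
  if [ "$start_service_time" = "NULL" ]; then
    echo "process not started"
  elif [ "$stop_time" != "NULL" ]; then
    echo "stopped"    # isolated by STOP SERVER, START SERVER not yet executed
  else
    echo "serving"    # STOP_TIME is NULL, so the node can provide external services
  fi
}

# The values could come from a query such as the following, run in the sys tenant:
#   SELECT STATUS, STOP_TIME, START_SERVICE_TIME FROM oceanbase.DBA_OB_SERVERS;
node_state "2024-01-01 10:00:00" "2024-01-01 09:00:00"   # prints stopped
node_state "NULL" "2024-01-01 10:05:00"                  # prints serving
```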
References
For more node-related O&M operations, see the following topics: