Restarting a node is a common O&M operation. It is useful for brief server maintenance and for applying system configuration changes. However, the node must not stay offline longer than the period specified by the server_permanent_offline_time parameter. Otherwise, the system considers the node permanently offline. If the server needs long-term maintenance, you must replace the server. For more information, see Replace a node.
Note
The cluster-level parameter server_permanent_offline_time specifies the time threshold of interrupted heartbeats, in seconds, after which a node is considered permanently offline. The system then automatically replenishes the data replicas that resided on the permanently offline node. The default value is 3600s. For more information about this parameter, see server_permanent_offline_time.
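Before you start, it can help to compare the planned downtime with this threshold. The following is a minimal sketch rather than an official tool: fits_offline_window is a hypothetical helper, the 300-second safety margin is an arbitrary assumption, and the commented obclient call reuses the placeholder connection values from this topic.

```shell
# Read the current threshold first (run manually; connection values are placeholders):
#   obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p -A \
#     -e "SHOW PARAMETERS LIKE 'server_permanent_offline_time';"

# Hypothetical helper: succeeds only if the planned downtime plus a safety
# margin stays below the threshold. All values are in seconds.
fits_offline_window() {
  planned=$1
  threshold=${2:-3600}   # default value stated in this topic
  margin=${3:-300}       # assumed margin, adjust as needed
  [ $((planned + margin)) -lt "$threshold" ]
}
```

For example, fits_offline_window 1800 succeeds, whereas fits_offline_window 3500 fails because 3500 + 300 is not below 3600.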
Background information
As a distributed database, OceanBase Database is typically deployed with multiple replicas (three replicas in a three-IDC deployment and five replicas in a five-IDC deployment). When a transaction is committed, it is forwarded to multiple replicas, and the Paxos protocol is used to commit it by majority vote, which keeps the data consistent among replicas. If a minority of replicas fail, the service level agreement (SLA) with RPO=0 can still be met.
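The majority rule can be made concrete with a little arithmetic. The sketch below is illustrative only; majority_size and has_majority_after_loss are hypothetical helper names, and the code models nothing beyond counting replicas.

```shell
# Smallest number of replicas that forms a majority among n replicas.
majority_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if a majority is still available after `down` replicas go offline.
# $1 = total replicas, $2 = replicas going offline (default 1)
has_majority_after_loss() {
  total=$1
  down=${2:-1}
  [ $((total - down)) -ge "$(majority_size "$total")" ]
}
```

With three replicas the majority is two, so one replica can go offline; with two replicas the majority is also two, so taking either one offline leaves no majority.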
In a multi-replica architecture, the STOP SERVER command allows you to restart a server without data loss. The STOP SERVER command performs the following operations:
It switches all leaders away from the server to be restarted and ensures that the replicas on the other servers (excluding the server to be restarted) meet the majority requirement.
It uses the Root Service to mark the server to be restarted as stopped (with the ACTIVE status and a value greater than 0 in the stop_time field). After a client recognizes this mark, it will not route business requests to the server.
If the STOP SERVER command succeeds, the server is successfully stopped without triggering leader elections or causing errors in clients, thus making the operation transparent to the business traffic. If the STOP SERVER command fails, you need to stop the operation and check the cause. Some possible causes include insufficient replicas, redo log latency, and fewer than the majority of voting members.
Procedure
The major steps to restart a node are: stop services, perform a minor compaction, close the process, start the process, and start services.
This topic provides the procedure to restart one node in a cluster. If you want to restart multiple nodes, you can perform the same operation multiple times.
Log in to the sys tenant of the cluster as the root user.
Note that you must specify the corresponding parameters in the following sample code based on your actual database configurations.
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
For more information about how to connect to a database, see Overview (MySQL mode) and Overview (Oracle mode).
Execute the following command to isolate the node.
Restarting a node may interrupt service continuity. For example, if the cluster contains only one or two nodes, or the data of a tenant is distributed on only two nodes, the system cannot provide services while the node restarts. The STOP SERVER operation minimizes this interruption. When you isolate the node, the system performs a safety check to ensure that restarting the node does not interrupt the continuity of system services. Once the node is successfully isolated, it no longer provides services. If the STOP SERVER operation fails, check the deployment of the cluster or resolve the issue based on the error message, and then perform the STOP SERVER operation again. Alternatively, if you can tolerate the node being stopped without isolation, you can skip this step.
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
The parameters are described as follows:
svr_ip: the IP address of the node to be stopped.
svr_port: the RPC port of the node to be stopped. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
After the execution is successful, query the STATUS column of the oceanbase.DBA_OB_SERVERS view for the server that you want to restart. If the value of the STATUS column remains ACTIVE but the value of the STOP_TIME column changes from NULL to the time when the service is stopped, the server has been isolated.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
Execute the following command to perform a minor compaction on the node to shorten the time required for redo log replay after the node is restarted, thus accelerating the restart process.
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
The parameters are described as follows:
svr_ip: the IP address of the node to be restarted.
svr_port: the RPC port of the node to be restarted. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
After the minor compaction is completed, proceed to the next step. For more information about how to check the progress of a minor compaction, see View minor compaction information.
For more information about minor compactions, see Major and minor compactions.
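The statements in this procedure all take the same 'svr_ip:svr_port' address, so a small wrapper can keep the address consistent between steps. This is a sketch only: ob_server_sql is a hypothetical helper that prints a statement and does not execute it, and the start action corresponds to the START SERVER step later in this procedure.

```shell
# Hypothetical helper: prints the ALTER SYSTEM statement for one node.
# $1 = action (stop | freeze | start), $2 = svr_ip, $3 = svr_port (default 2882)
ob_server_sql() {
  action=$1
  addr="$2:${3:-2882}"
  case "$action" in
    stop)   printf "ALTER SYSTEM STOP SERVER '%s';\n" "$addr" ;;
    freeze) printf "ALTER SYSTEM MINOR FREEZE SERVER = ('%s');\n" "$addr" ;;
    start)  printf "ALTER SYSTEM START SERVER '%s';\n" "$addr" ;;
    *)      echo "unknown action: $action" >&2; return 1 ;;
  esac
}
```

For example, ob_server_sql freeze 172.xx.xx.xx prints the minor freeze statement for port 2882, which you can then paste into an obclient session.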
Stop the observer process.
Log in to the server where the observer process resides as the admin user.
Use the command-line tool to navigate to the /home/admin/oceanbase directory.
[admin@xxx /]$ cd /home/admin/oceanbase
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
Execute the following command to view and obtain the process ID of the node.
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin 103364 1 99 2022 ? 51-17:24:41 /home/admin/oceanbase/bin/observer
In this example, 103364 is the process ID of the node.
Stop the observer process.
The command is as follows:
[admin@xxx oceanbase]$ kill -9 pid
Here, pid is the observer process ID of the node to be stopped.
Here is an example:
[admin@xxx oceanbase]$ kill -9 103364
Notice
You can stop only one observer process in a deployment directory. If you want to stop observer processes on multiple nodes, you need to log in to each server in sequence.
Execute the following command to check whether the observer process has stopped.
[admin@xxx oceanbase]$ ps aux | grep observer | grep -v grep
If no information is returned after the command execution, the observer process has been stopped successfully.
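A plain grep on the process list can also match unrelated processes, such as the grep command itself. As an alternative, the sketch below matches only processes whose executable is named exactly observer. The function names are hypothetical, and the code assumes a ps that accepts the POSIX -o pid= and -o args= format options.

```shell
# Reads "PID COMMAND [ARGS...]" lines on stdin and prints the PID of every
# line whose command is an executable named exactly "observer".
filter_observer_pids() {
  awk '{ n = split($2, parts, "/"); if (parts[n] == "observer") print $1 }'
}

# Lists observer PIDs on this host. "pid=" and "args=" suppress the header line.
observer_pids() {
  ps -e -o pid= -o args= | filter_observer_pids
}

# Succeeds when no observer process is left on this host.
observer_stopped() {
  [ -z "$(observer_pids)" ]
}
```

Run observer_pids before stopping the process to obtain the process ID, and observer_stopped afterwards to confirm that the process is gone.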
(Optional) If you want to perform maintenance on the server, perform the maintenance in this step.
Start the observer process.
Log in to the server where the observer process resides as the admin user.
Start the observer process.
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
Notice
You can start only one observer process in a deployment directory. If you want to start observer processes on multiple nodes, you need to log in to each server in sequence.
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
If the observer process starts successfully, the value of the START_SERVICE_TIME column in the oceanbase.DBA_OB_SERVERS view changes from NULL to the time when the service is started.
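Because the node has to replay redo logs after it restarts, the column may not change immediately. One way to avoid checking by hand is a generic polling loop. This is a sketch under assumptions: wait_until is a hypothetical helper, and the check command passed to it is something you supply yourself, for example a function that queries START_SERVICE_TIME.

```shell
# Hypothetical helper: runs a check command every `interval` seconds until it
# succeeds or at least `timeout` seconds have elapsed.
# Returns 0 when the check succeeds and 1 on timeout.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For example, wait_until 600 5 my_check polls every 5 seconds for up to 10 minutes, where my_check stands for your own check function.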
Execute the following command to start the services of the node.
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
where:
svr_ip: the IP address of the node.
svr_port: the RPC port of the node. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
After the execution is successful, query the STOP_TIME column of the oceanbase.DBA_OB_SERVERS view for the node. If the value of the STOP_TIME column is NULL, the node has started the services and can provide services.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
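The column checks described in this procedure can be summarized in one place. The following sketch is illustrative only: server_state is a hypothetical helper, its state names are made up for this example, and the commented query assumes that the view exposes the SVR_IP and SVR_PORT columns in addition to the three columns named in this topic.

```shell
# Example query whose result row can be passed to the helper (run in obclient):
#   SELECT STATUS, STOP_TIME, START_SERVICE_TIME
#   FROM oceanbase.DBA_OB_SERVERS
#   WHERE SVR_IP = '172.xx.xx.xx' AND SVR_PORT = 2882;

# Hypothetical helper: classifies one row of oceanbase.DBA_OB_SERVERS.
# $1 = STATUS, $2 = STOP_TIME, $3 = START_SERVICE_TIME
# An empty value or the literal string NULL means the column is not set.
server_state() {
  status=$1
  stop_time=${2:-NULL}
  start_service_time=${3:-NULL}
  if [ "$status" != "ACTIVE" ]; then
    echo "not-active"     # any status other than ACTIVE
  elif [ "$stop_time" != "NULL" ]; then
    echo "isolated"       # STOP SERVER succeeded, START SERVER not yet run
  elif [ "$start_service_time" = "NULL" ]; then
    echo "starting"       # process is up but not yet in service
  else
    echo "serving"        # STOP_TIME is NULL and the service has started
  fi
}
```

For example, server_state ACTIVE NULL '2023-01-01 09:00:00' prints serving, which corresponds to the final check in this procedure.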
References
For more information about node O&M, see the following topics: