Restarting a node is a common O&M action. It is suitable for brief maintenance of the server and scenarios where you need to restart the system after modifying system configuration items. The duration of the restart must be shorter than the time specified in the configuration item server_permanent_offline_time. Otherwise, the node will be permanently taken offline. If the server needs long-term maintenance, you need to replace the server. For more information, see Replace a node.
Note
The cluster-level configuration item server_permanent_offline_time sets how long a node's heartbeats can be interrupted before the node is considered permanently offline. In other words, a node is considered permanently offline if its heartbeats are interrupted for longer than the specified threshold. The system then automatically supplements the replicas that were hosted on the permanently offline node. The default value is 3600s. For more information about this configuration item, see server_permanent_offline_time.
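The threshold rule can be turned into a quick pre-check before maintenance. The following is a minimal sketch, not an official tool: the function name fits_offline_window and the 300-second safety margin are assumptions made for illustration, and only the 3600s default comes from this topic.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the planned downtime plus a safety
# margin stays below the permanent-offline threshold.
#   $1 = planned downtime in seconds
#   $2 = server_permanent_offline_time in seconds (documented default: 3600)
#   $3 = safety margin in seconds (assumed default: 300)
fits_offline_window() {
    planned=$1
    threshold=${2:-3600}
    margin=${3:-300}
    [ $((planned + margin)) -lt "$threshold" ]
}

# Example: a 15-minute restart against the default threshold.
if fits_offline_window 900 3600; then
    echo "restart in place"
else
    echo "plan a node replacement instead"
fi
```

If the check fails, the restart would risk exceeding the threshold, which is the case where this topic recommends replacing the node.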
Background information
As a distributed database, OceanBase Database is typically deployed with multiple replicas (three replicas in a three-IDC deployment and five replicas in a five-IDC deployment). When a transaction is committed, it is forwarded to multiple replicas, and a majority of the replicas must commit the transaction to maintain data consistency. If an exception occurs in a minority of replicas, the system can still meet the SLA of RPO=0.
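The majority arithmetic behind this guarantee is small enough to check by hand. The sketch below only illustrates the counting; the function names majority and tolerable_failures are hypothetical and are not part of OceanBase Database.

```shell
#!/bin/sh
# Smallest number of replicas that forms a majority of $1 replicas.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of replicas that can fail while a majority is still available.
tolerable_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 5; do
    echo "replicas=$n majority=$(majority "$n") tolerable_failures=$(tolerable_failures "$n")"
done
# prints:
# replicas=3 majority=2 tolerable_failures=1
# replicas=5 majority=3 tolerable_failures=2
```

With three replicas, stopping one node still leaves a majority of two, which is why a single node can be restarted without affecting commits.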
The STOP SERVER command can achieve lossless restart in a multi-replica architecture. The STOP SERVER command performs the following operations:
It strips all leaders from the target node and ensures that the remaining replicas on other nodes achieve the majority.
It uses the Root Service to mark the target node as stopped (the node status is ACTIVE and stop_time is greater than 0). After a client recognizes this mark, it no longer routes business requests to the node.
After the STOP SERVER command succeeds, restarting the node does not cause leader elections or client errors and is transparent to business traffic. If the STOP SERVER command fails, do not proceed with the restart until you have identified the cause. Possible causes include insufficient replicas, redo log delay, and fewer than a majority of voting members.
Procedure
The major steps to restart a node are: stop services, perform a minor compaction, shut down the process, start the process, and start services.
This topic provides the procedure to restart a node in a cluster. If you want to restart multiple nodes, you can perform the same operation multiple times.
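The SQL side of these steps uses the same 'svr_ip:svr_port' string three times, so it can help to generate the statements once and review them before running anything. This is a minimal sketch under that assumption; restart_statements is a hypothetical helper, and it only prints the statements rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the ALTER SYSTEM statements used when
# restarting one node.
#   $1 = IP address of the node
#   $2 = RPC port of the node (default: 2882)
restart_statements() {
    server="$1:${2:-2882}"
    # Step 1: stop services (isolate the node).
    printf "ALTER SYSTEM STOP SERVER '%s';\n" "$server"
    # Step 2: perform a minor compaction on the node.
    printf "ALTER SYSTEM MINOR FREEZE SERVER = ('%s');\n" "$server"
    # The observer process is shut down and started again on the host here,
    # before the last statement is run.
    # Step 5: start services.
    printf "ALTER SYSTEM START SERVER '%s';\n" "$server"
}

restart_statements 172.xx.xx.xx 2882
```

Run each statement in the sys tenant only after the previous step has been verified as described in the procedure.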
Log in to the sys tenant of the cluster as the root user.
Note that you must specify the corresponding parameters in the following sample code based on your database connection details.
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
For more information about how to connect to the database, see Overview (MySQL mode) and Overview (Oracle mode).
Run the following command to isolate the node to be restarted.
During the restart of a node, the service continuity may be interrupted. For example, if a cluster has only one or two nodes or the data of a tenant is distributed on only two nodes, the system cannot provide services during the restart of a node. The Stop Server operation ensures that the interruption to service continuity is minimized. After a node is isolated, it will no longer provide services. If the Stop Server operation fails, you can perform the operation again or skip this step if you can tolerate the interruption of service continuity.
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
The parameters are described as follows:
svr_ip: the IP address of the node to be stopped.
svr_port: the RPC port of the node to be stopped. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
After the execution is successful, query the STATUS column of the oceanbase.DBA_OB_SERVERS view for the status of the server. The value of this column remains ACTIVE, but the value of the STOP_TIME column changes from NULL to the time when the service was stopped.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
Run the following command to perform a minor compaction on the node to be restarted. This helps shorten the time required for redo log replay after the restart and speeds up the restart.
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
The parameters are described as follows:
svr_ip: the IP address of the node to be restarted.
svr_port: the RPC port of the node to be restarted. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
After the minor compaction is completed, proceed to the next step. For more information about how to check the minor compaction progress, see View information about minor compactions.
For more information about minor compactions, see Minor compaction.
Stop the observer process.
Log in to the server where the process to be stopped is located as the admin user.
Use the command-line tool to navigate to the /home/admin/oceanbase directory of the server.
[admin@xxx /]$ cd /home/admin/oceanbase
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
Run the following command to view and obtain the process ID of the node.
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin 103364 1 99 2022 ? 51-17:24:41 /home/admin/oceanbase/bin/observer
In this example, 103364 is the process ID of the node.
Stop the observer process.
The command is as follows:
[admin@xxx oceanbase]$ kill -9 pid
Here, pid is the observer process ID of the node to be stopped.
Here is an example:
[admin@xxx oceanbase]$ kill -9 103364
Notice
You can stop only one observer process in a deployment directory. If you want to stop observer processes on multiple nodes, you need to log in to each server in sequence.
Run the following command to check whether the process has stopped.
[admin@xxx oceanbase]$ ps aux | grep observer | grep -v grep
If no information is returned after the command execution, the process has stopped successfully.
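Instead of rerunning ps by hand, the check can be scripted as a bounded wait on the process ID obtained earlier. The following is a minimal sketch; wait_for_exit is a hypothetical helper, and it assumes that you run it as the same user that owns the observer process (admin in this topic), because kill -0 also fails when you lack permission to signal the process.

```shell
#!/bin/sh
# Hypothetical helper: wait until a process has exited or a timeout elapses.
#   $1 = process ID to watch
#   $2 = timeout in seconds (assumed default: 30)
# Returns 0 once the process is gone and 1 if it is still alive at timeout.
wait_for_exit() {
    pid=$1
    remaining=${2:-30}
    # kill -0 sends no signal; it only tests whether the process can be signaled.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage, with the process ID from the earlier example:
#   wait_for_exit 103364 60 && echo "observer process has stopped"
```

A nonzero return value means the process was still running when the timeout elapsed, in which case do not proceed to the maintenance or start steps.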
(Optional) If you want to perform maintenance on the server, perform the maintenance in this step.
Start the observer process.
Log in to the server where the process to be started is located as the admin user.
Start the observer process.
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
Notice
You can start only one observer process in a deployment directory. If you want to start observer processes on multiple nodes, you need to log in to each server in sequence.
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
After the execution is successful, query the START_SERVICE_TIME column of the oceanbase.DBA_OB_SERVERS view. If this value is not NULL, the observer process has been started successfully.
Run the following command to start the services of the node.
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
The parameters are described as follows:
svr_ip: the IP address of the node.
svr_port: the RPC port of the node. The default value is 2882.
Here is an example:
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
After the execution is successful, query the STOP_TIME column of the oceanbase.DBA_OB_SERVERS view. If this value is NULL, the node has started services and can provide services.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
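The status checks in this procedure all read the same three columns of oceanbase.DBA_OB_SERVERS, so they can be summarized in one place. The sketch below is illustrative only: server_status_sql and classify_node are hypothetical helpers, the filter columns SVR_IP and SVR_PORT are assumed to exist in the view, and the three labels are just shorthand for the conditions described in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: build the query used for the status checks.
#   $1 = IP address of the node, $2 = RPC port (default: 2882)
# SVR_IP and SVR_PORT are assumed column names; verify them in your version.
server_status_sql() {
    printf "SELECT STATUS, START_SERVICE_TIME, STOP_TIME FROM oceanbase.DBA_OB_SERVERS WHERE SVR_IP = '%s' AND SVR_PORT = %s;\n" "$1" "${2:-2882}"
}

# Hypothetical helper: interpret the two time columns as printed by the client.
#   $1 = START_SERVICE_TIME value, $2 = STOP_TIME value ("NULL" when unset)
classify_node() {
    start_service_time=$1
    stop_time=$2
    if [ "$start_service_time" = "NULL" ]; then
        echo "not started"   # the observer process has not started successfully
    elif [ "$stop_time" != "NULL" ]; then
        echo "isolated"      # STOP SERVER is in effect for this node
    else
        echo "serving"       # START SERVER has taken effect
    fi
}

server_status_sql 172.xx.xx.xx 2882
classify_node "2024-01-01 00:00:00" "NULL"
```

The timestamp in the last line is a made-up sample value. Pass the values returned by the query in quotes, because they contain spaces.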
References
For more information about node O&M, see the following topics: