Restarting is a common O&M task. It is suitable for scenarios such as brief server maintenance or applying system configuration changes that require a restart to take effect. The restart duration must be shorter than the time specified by the cluster-level parameter server_permanent_offline_time. Otherwise, the node will be permanently marked as offline. If the server requires prolonged maintenance, you need to follow the server replacement procedure. For more information, see Replace a node.
Note
The cluster-level parameter server_permanent_offline_time specifies the time threshold for determining when a node is permanently marked as offline after its heartbeat is interrupted. Once a node is permanently offline, its data replicas will be automatically supplemented. The default value is 3600 seconds. For more information, see server_permanent_offline_time.
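The constraint above is a simple comparison, and checking it before scheduling maintenance can be sketched as follows. The helper name `fits_offline_window` is hypothetical; the 3600-second fallback is the default mentioned above, and the actual value in your cluster should be read first (for example with `SHOW PARAMETERS LIKE 'server_permanent_offline_time';`).

```shell
# Hypothetical helper: succeeds when the planned downtime (in seconds) is
# strictly shorter than the server_permanent_offline_time threshold (in seconds).
# The threshold falls back to the documented default of 3600 when omitted.
fits_offline_window() {
  planned=$1
  threshold=${2:-3600}
  [ "$planned" -lt "$threshold" ]
}

# Example: a 20-minute maintenance window against the default threshold.
if fits_offline_window 1200 3600; then
  echo "restart in place"
else
  echo "use the node replacement procedure"
fi
```

The comparison is strict on purpose: a downtime equal to the threshold is treated as too long.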
Background information
OceanBase Database, as a distributed database, is typically deployed with multiple replicas (for example, three replicas in a three-IDC architecture within the same region, or five replicas across three regions with five IDCs). The Paxos protocol is utilized to achieve majority consensus among replicas during transaction commits, ensuring data consistency across replicas and maintaining an SLA of RPO=0 even in cases where minority replicas fail.
The STOP SERVER command enables a lossless restart in a multi-replica architecture. When executed, the STOP SERVER command performs the following operations:
Removes all leaders from the node to be restarted and ensures that the remaining replicas on other nodes still meet the majority requirement.
Marks the node to be restarted as stopped in the RootService (the node status remains ACTIVE, and the stop_time field is updated to a value greater than 0). The client detects this status and avoids routing business requests to the stopped node.
Once the STOP SERVER command is successfully executed, restarting the node will not trigger leader re-elections or client errors, ensuring complete transparency to business traffic. If the STOP SERVER command fails, you must halt the restart process and investigate the cause. Potential reasons for failure include insufficient replicas, redo log delays, or the total number of voting members falling below the majority requirement.
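The majority requirement mentioned above is plain arithmetic, which the following sketch illustrates. The helper `has_majority` is hypothetical and only shows the counting rule; it is not how STOP SERVER performs its check, which also takes factors such as redo log synchronization into account.

```shell
# Hypothetical helper: succeeds when the replicas that remain available
# still form a majority of the total number of voting replicas.
has_majority() {
  remaining=$1
  total=$2
  [ "$remaining" -ge $(( total / 2 + 1 )) ]
}

# Three replicas, one node stopped: two remain, which is still a majority.
has_majority 2 3 && echo "3 replicas: safe to stop one node"

# Two replicas, one node stopped: one remains, which is not a majority.
has_majority 1 2 || echo "2 replicas: stopping one node loses the majority"
```

This is why the isolation step can fail in clusters where a tenant's data is distributed across only two nodes.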
Procedure
The procedure for restarting a node involves the following steps: stopping services, performing a minor compaction, shutting down the process, starting the process, and restarting services.
This topic provides guidance for restarting one node in a cluster. If you want to restart multiple nodes, you can repeat the same steps for each node.
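Because the same statements are repeated for each node, a small helper that prints them for a given address can reduce copy-and-paste mistakes. The following is a minimal sketch: the function name `print_restart_sql` is hypothetical, the address 192.0.2.10 is a placeholder, and the statements are the ones used in the steps of this topic. The helper only prints text and does not connect to the cluster.

```shell
# Hypothetical helper: print the SQL statements of the restart procedure
# for one node. The port defaults to the RPC port 2882.
print_restart_sql() {
  ip=$1
  port=${2:-2882}
  printf "ALTER SYSTEM STOP SERVER '%s:%s';\n" "$ip" "$port"
  printf "ALTER SYSTEM MINOR FREEZE SERVER = ('%s:%s');\n" "$ip" "$port"
  printf '%s\n' "-- restart the observer process on the server, then run:"
  printf "ALTER SYSTEM START SERVER '%s:%s';\n" "$ip" "$port"
}

# Placeholder address; replace it with the node to be restarted.
print_restart_sql 192.0.2.10
```

Run the statements one at a time and verify each step before continuing, rather than executing the printed output as a batch.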
Log in to the sys tenant of the cluster as the root user.
Make sure to replace the sample parameters in the following command with the actual configurations of your database environment.
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
For more information about how to connect to a database, see Connection methods (MySQL mode) and Connection methods (Oracle mode).
Run the following command to isolate the node.
During the restart process, service continuity may be interrupted. For example, if the cluster contains only one or two nodes, or if the data of a tenant is distributed across only two nodes, the system may become unavailable during the restart. The Stop Server operation ensures service continuity during the restart by isolating the node. Once the node is successfully isolated, it no longer provides services. If the Stop Server operation fails, troubleshoot the issue based on the error messages, adjust the cluster deployment if necessary, and retry the operation. Alternatively, if a service interruption is acceptable, you can skip this step and proceed with stopping the service directly.
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
Parameter descriptions:
svr_ip: the IP address of the node to be stopped.
svr_port: the RPC port of the node to be stopped. The default value is 2882.
Example:
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
After successful execution, query the STATUS column of the specified server in the oceanbase.DBA_OB_SERVERS view. The value of this column remains ACTIVE, but the STOP_TIME column changes from NULL to the time when the service was stopped.
For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
Run the following command to perform a minor compaction on the node. This shortens the redo log replay after the restart and speeds up the restart.
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
Parameter descriptions:
svr_ip: the IP address of the node to be restarted.
svr_port: the RPC port of the node to be restarted. The default value is 2882.
Example:
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
Wait for the minor compaction to complete before proceeding to the next step. For more information about how to check the minor compaction progress, see View minor compaction information.
For more details about minor compactions, see Major and minor compactions.
Stop the observer process.
Log in to the server where the observer process is running as the admin user.
Navigate to the /home/admin/oceanbase directory using the command line.
[admin@xxx /]$ cd /home/admin/oceanbase
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
Run the following command to view and obtain the process ID of the node.
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin 103364 1 99 2022 ? 51-17:24:41 /home/admin/oceanbase/bin/observer
In this example, 103364 is the process ID of the node.
Stop the observer process.
The sample command is as follows:
[admin@xxx oceanbase]$ kill -9 pid
Here, pid is the observer process ID of the node to be stopped.
Example:
[admin@xxx oceanbase]$ kill -9 103364
Notice
You can stop only one observer process in a deployment directory. If you want to stop observer processes on multiple nodes, you need to log in to each server in sequence.
Confirm that the observer process has stopped by running the following command:
[admin@xxx oceanbase]$ ps aux | grep observer | grep -v grep
If no information is returned, the process has stopped successfully.
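The lookup and the confirmation above can also be scripted. The following is a minimal sketch with two hypothetical helpers: `observer_pids` extracts process IDs from `ps -ef` output, assuming the standard layout in which the command starts at the eighth column, and `wait_for_exit` polls with `kill -0` until the process is gone or a timeout expires. `kill -0` sends no signal and only tests whether the process exists, so it must be run by a user that is allowed to signal the process, such as admin.

```shell
# Hypothetical helper: read `ps -ef` output from stdin and print the PID
# of every process whose command is the observer binary.
observer_pids() {
  awk '$8 ~ /\/observer$/ || $8 == "observer" { print $2 }'
}

# Hypothetical helper: wait until the given PID no longer exists.
# Polls once per second; returns 0 when the process is gone and 1 if it
# is still alive after the timeout (default: 60 seconds).
wait_for_exit() {
  pid=$1
  timeout=${2:-60}
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A possible use after the kill command is `wait_for_exit 103364 30 && echo "observer stopped"`, where 103364 is the process ID from the example above.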
(Optional) If necessary, perform maintenance on the server during this step.
Start the observer process.
Log in to the server where the observer process is to be started as the admin user.
Start the observer process.
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
Notice
You can start only one observer process in a deployment directory. If you want to start observer processes on multiple nodes, you need to log in to each server in sequence.
For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
After successful execution, query the START_SERVICE_TIME column in the oceanbase.DBA_OB_SERVERS view. If the value of this column is not NULL, the observer process has started successfully.
Run the following command to restart the node's services:
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
Parameter descriptions:
svr_ip: the IP address of the node to be started.
svr_port: the RPC port of the node to be started. The default value is 2882.
Example:
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
After successful execution, query the STOP_TIME column in the oceanbase.DBA_OB_SERVERS view. If the value of this column is NULL, the node services have started successfully, and the node is ready to provide services.
For more information about querying the oceanbase.DBA_OB_SERVERS view, see View a node.
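The three view checks in this procedure (STATUS and STOP_TIME after isolation, START_SERVICE_TIME after the process starts, and STOP_TIME after the services restart) can be summarized in one small classifier. This is a sketch only: the helper `node_state` and its state labels are hypothetical, and it expects the column values to be passed in as strings, with the literal text NULL standing for a SQL NULL.

```shell
# Hypothetical helper: classify a node from three DBA_OB_SERVERS columns.
# Arguments: STATUS, STOP_TIME, START_SERVICE_TIME (use the string NULL
# for SQL NULL values).
node_state() {
  status=$1
  stop_time=$2
  start_service_time=$3
  if [ "$status" != "ACTIVE" ]; then
    echo "not active"   # any status other than ACTIVE
  elif [ "$stop_time" != "NULL" ]; then
    echo "isolated"     # STOP SERVER took effect; START SERVER not yet run
  elif [ "$start_service_time" = "NULL" ]; then
    echo "starting"     # process is up but not yet serving
  else
    echo "serving"      # STOP_TIME is NULL and the service has started
  fi
}

# Example values as they might appear after ALTER SYSTEM STOP SERVER.
node_state ACTIVE "2024-01-01 10:00:00" "2024-01-01 08:00:00"
```

The values would come from a query such as `SELECT STATUS, STOP_TIME, START_SERVICE_TIME FROM oceanbase.DBA_OB_SERVERS WHERE SVR_IP = '...';`, where the SVR_IP filter column is an assumption to verify against the view definition in your version.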
References
For more information about node O&M, see the following topics: