Restart a node|V4.3.5| docs|Distributed Database

Restart a node

Last Updated: 2025-04-02 06:37:50

Restarting is a common O&M task. It is suitable for scenarios such as brief server maintenance or applying system configuration changes that require a restart to take effect. The restart duration must be shorter than the time specified by the cluster-level parameter server_permanent_offline_time. Otherwise, the node will be permanently marked as offline. If the server requires prolonged maintenance, you need to follow the server replacement procedure. For more information, see Replace a node.

Note

The cluster-level parameter server_permanent_offline_time specifies how long a node's heartbeat can be interrupted before the node is permanently marked as offline. Once a node is permanently offline, the system automatically replenishes its data replicas. The default value is 3600 seconds. For more information, see server_permanent_offline_time.
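Before you schedule a restart, it can help to compare the planned downtime with this threshold. The following is a minimal sketch of that comparison, not an OceanBase tool: the function name fits_offline_window and the sample duration of 1200 seconds are assumptions made for illustration, and 3600 is the default value stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the planned downtime (in seconds) is
# strictly shorter than the permanent-offline threshold (in seconds).
# The threshold defaults to 3600, the documented default of
# server_permanent_offline_time; pass your cluster's actual value if it differs.
fits_offline_window() {
  local planned_s=$1
  local threshold_s=${2:-3600}
  [ "$planned_s" -lt "$threshold_s" ]
}

# Example: a planned 20-minute maintenance window against the default threshold.
if fits_offline_window 1200 3600; then
  echo "Restart fits within the threshold."
else
  echo "Too long: follow the node replacement procedure instead."
fi
```

In practice, leave a safety margin rather than planning right up to the threshold, because the process shutdown and startup also count toward the downtime.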

Background information

OceanBase Database, as a distributed database, is typically deployed with multiple replicas (for example, three replicas in a three-IDC architecture within the same region, or five replicas across three regions with five IDCs). The Paxos protocol is utilized to achieve majority consensus among replicas during transaction commits, ensuring data consistency across replicas and maintaining an SLA of RPO=0 even in cases where minority replicas fail.

The STOP SERVER command enables a lossless restart in a multi-replica architecture. When executed, the STOP SERVER command performs the following operations:

- Removes all leaders from the node to be restarted and ensures that the remaining replicas on other nodes still meet the majority requirement.
- Marks the node to be restarted as stopped in the RootService (the node status remains ACTIVE, and the stop_time field is updated to a value greater than 0). The client detects this status and avoids routing business requests to the stopped node.

Once the STOP SERVER command is successfully executed, restarting the node will not trigger leader re-elections or client errors, ensuring complete transparency to business traffic. If the STOP SERVER command fails, you must halt the restart process and investigate the cause. Potential reasons for failure include insufficient replicas, redo log delays, or the total number of voting members falling below the majority requirement.
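The majority requirement can be illustrated with simple arithmetic. The sketch below is a simplified model, not the check that OceanBase Database performs internally: it assumes one replica per node and only counts replicas, and the function name has_majority is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the remaining replicas form a strict
# majority of the total number of replicas (remaining > total / 2).
has_majority() {
  local remaining=$1
  local total=$2
  [ $((remaining * 2)) -gt "$total" ]
}

# Stop one replica in deployments with 2, 3, and 5 replicas.
for total in 2 3 5; do
  if has_majority $((total - 1)) "$total"; then
    echo "$total replicas, 1 stopped: majority kept"
  else
    echo "$total replicas, 1 stopped: majority lost"
  fi
done
```

Under this model, stopping one of two replicas loses the majority, while stopping one of three or five replicas keeps it.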

Procedure

The procedure for restarting a node involves the following steps: stopping services, performing a minor compaction, shutting down the process, starting the process, and restarting services.

This topic provides guidance for restarting one node in a cluster. If you want to restart multiple nodes, you can repeat the same steps for each node.

Log in to the sys tenant of the cluster as the root user.

Make sure to replace the sample parameters in the following command with the actual configurations of your database environment.
```
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
```
For more information about how to connect to a database, see Connection methods (MySQL mode) and Connection methods (Oracle mode).
Run the following command to isolate the node.

Restarting a node can interrupt services. For example, if the cluster contains only one or two nodes, or if the data of a tenant is distributed across only two nodes, the system may become unavailable during the restart. The STOP SERVER operation protects service continuity by isolating the node first. Once the node is successfully isolated, it no longer provides services. If the STOP SERVER operation fails, troubleshoot the issue based on the error message, adjust the cluster deployment if necessary, and retry the operation. Alternatively, if a service interruption is acceptable, you can skip this step and stop the observer process directly.
```
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
```
Parameter descriptions:
- svr_ip: the IP address of the node to be stopped.
- svr_port: the RPC port of the node to be stopped. The default value is 2882.
Example:
```
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
```
After successful execution, query the STATUS column of the specified server in the oceanbase.DBA_OB_SERVERS view. The value of this column remains ACTIVE, but the STOP_TIME column changes from NULL to the time when the service was stopped.

For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
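The check described above can be sketched as two small shell helpers: one builds the query, and the other interprets a returned row. This is an illustrative sketch only. The function names are hypothetical, SVR_IP and SVR_PORT are assumed to be the address columns of the view, and NULL is assumed to be printed as the literal text NULL by the client.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a query that returns STATUS and STOP_TIME for
# one server. SVR_IP and SVR_PORT are assumed column names of the view.
server_state_sql() {
  local ip=$1
  local port=${2:-2882}
  printf "SELECT SVR_IP, SVR_PORT, STATUS, STOP_TIME FROM oceanbase.DBA_OB_SERVERS WHERE SVR_IP = '%s' AND SVR_PORT = %s;\n" "$ip" "$port"
}

# Hypothetical helper: succeeds when a row shows an isolated node, that is,
# STATUS is still ACTIVE and STOP_TIME is no longer NULL.
is_isolated() {
  local status=$1
  local stop_time=$2
  [ "$status" = "ACTIVE" ] && [ -n "$stop_time" ] && [ "$stop_time" != "NULL" ]
}

# Print the query for the sample server; run it in your obclient session.
server_state_sql 172.xx.xx.xx 2882
```

For example, is_isolated ACTIVE NULL fails because the node has not been stopped, while a row with STATUS ACTIVE and a timestamp in STOP_TIME passes.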
Run the following command to perform a minor compaction on the node. This shortens the redo log replay after the restart and therefore speeds up the restart.
```
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
```
Parameter descriptions:
- svr_ip: the IP address of the node to be restarted.
- svr_port: the RPC port of the node to be restarted. The default value is 2882.
Example:
```
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
```
Wait for the minor compaction to complete before proceeding to the next step. For more information about how to check the minor compaction progress, see View minor compaction information.

For more details about minor compactions, see Major and minor compactions.
Stop the observer process.
1. Log in to the server where the observer process is running as the admin user.
2. Navigate to the /home/admin/oceanbase directory using the command line.
```
[admin@xxx /]$ cd /home/admin/oceanbase
```
  For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
3. Run the following command to view and obtain the process ID of the node.
```
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin    103364      1 99  2022 ?        51-17:24:41 /home/admin/oceanbase/bin/observer
```
  In this example, 103364 is the process ID of the node.
4. Stop the observer process.
  
  The sample command is as follows:
```
[admin@xxx oceanbase]$ kill -9 pid
```
  Here, pid is the observer process ID of the node to be stopped.
  
  Example:
```
[admin@xxx oceanbase]$ kill -9 103364
```
  Notice
  
  You can stop only one observer process in a deployment directory. If you want to stop observer processes on multiple nodes, you need to log in to each server in sequence.
5. Confirm that the observer process has stopped by running the following command:
```
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
```
  If no information is returned, the process has stopped successfully.
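The sub-steps above, reading the process ID from the ps output and confirming that the process has exited, can be sketched as shell helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the PID is assumed to be the second column of the ps -ef output as in the sample above, and the script is assumed to run as the user that owns the observer process (otherwise kill -0 fails for permission reasons and the process is wrongly reported as gone).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ps -ef` lines on stdin and prints the PID,
# which is the second whitespace-separated column.
observer_pid_from_ps() {
  awk '{print $2}'
}

# Hypothetical helper: polls once per second until the process with the given
# PID no longer exists. Returns 0 when it is gone, 1 when the timeout expires.
# `kill -0` sends no signal; it only tests whether the process can be signaled.
wait_for_exit() {
  local pid=$1
  local timeout_s=${2:-60}
  local waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example: extract the PID from the sample ps line shown in this topic.
printf '%s\n' 'admin    103364      1 99  2022 ?        51-17:24:41 /home/admin/oceanbase/bin/observer' \
  | observer_pid_from_ps
```

After you stop the process, a call such as wait_for_exit 103364 60 can replace the manual ps check. The timeout of 60 seconds is an arbitrary illustration.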
(Optional) If necessary, perform maintenance on the server during this step.
Start the observer process.
1. Log in to the server where the observer process is to be started as the admin user.
2. Start the observer process.
```
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
```
  Notice
  
  You can start only one observer process in a deployment directory. If you want to start observer processes on multiple nodes, you need to log in to each server in sequence.
  
  For more information about the installation directories of OceanBase Database, see OBServer installation directory structure.
  
  After successful execution, query the START_SERVICE_TIME column in the oceanbase.DBA_OB_SERVERS view. If the value of this column is not NULL, the observer process has started successfully.
Run the following command to restart the node's services:
```
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
```
Parameter descriptions:
- svr_ip: the IP address of the node to be started.
- svr_port: the RPC port of the node to be started. The default value is 2882.
Example:
```
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
```
After successful execution, query the STOP_TIME column in the oceanbase.DBA_OB_SERVERS view. If the value of this column is NULL, the node services have started successfully, and the node is ready to provide services.

For more information about querying the oceanbase.DBA_OB_SERVERS view, see View a node.
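To review the whole procedure before you run it, you can print the statements for one node in order. The following dry-run sketch only prints text and does not connect to any cluster. The function name restart_plan is hypothetical, and the statements are the ones shown in this topic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, in order, the statements used in this topic to
# restart one node. It is a dry run: nothing is executed against a cluster.
restart_plan() {
  local ip=$1
  local port=${2:-2882}   # RPC port; 2882 is the documented default
  local addr="${ip}:${port}"
  printf "ALTER SYSTEM STOP SERVER '%s';\n" "$addr"
  printf "ALTER SYSTEM MINOR FREEZE SERVER = ('%s');\n" "$addr"
  printf '%s\n' "-- Stop the observer process, perform maintenance, then start the process."
  printf "ALTER SYSTEM START SERVER '%s';\n" "$addr"
}

# Example: print the plan for the sample node used in this topic.
restart_plan 172.xx.xx.xx 2882
```

Run each statement manually in the sys tenant and verify its result, as described in the corresponding step, before you move on to the next one.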

References

For more information about node O&M, see the following topics: