Restart a node|V4.3.3| docs|Distributed Database

Restart a node

Last Updated: 2024-12-02 03:48:29

Restarting a node is a routine O&M operation. You can restart a node to perform temporary maintenance on a server or to make modified system parameters take effect. During the restart, the node must not stay offline longer than the time specified by the server_permanent_offline_time parameter. Otherwise, the node is marked as permanently offline. If the maintenance is expected to take longer than that, replace the server instead. For more information about how to replace a server, see Replace a node.

Note

The server_permanent_offline_time parameter specifies the heartbeat loss threshold. When no heartbeat is received from an OBServer node for the specified period of time, the node is considered permanently offline, and the system automatically supplements the data replicas that were hosted on it. The default value is 3600s. For more information about this parameter, see server_permanent_offline_time.
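
As a rough illustration of this constraint, the following shell sketch compares a planned maintenance window against the threshold. The function name `fits_offline_window` and the 20% safety margin are assumptions made for illustration and are not part of OceanBase Database. The 3600-second default is the value stated in the note above.

```shell
# Hypothetical helper: succeed only if the planned downtime, plus a safety
# margin, stays below server_permanent_offline_time (both in seconds).
fits_offline_window() {
  fow_planned=$1
  fow_threshold=${2:-3600}          # default value stated in the note above
  fow_margin=$((fow_planned / 5))   # assumed 20% margin, not an official rule
  [ $((fow_planned + fow_margin)) -lt "$fow_threshold" ]
}

if fits_offline_window 1800; then
  echo "restart in place"
else
  echo "consider replacing the node"
fi
```

Pass the actual value of the parameter in your cluster as the second argument if it differs from the default.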

Background information

OceanBase Database is a distributed database that is typically deployed in a multi-replica architecture, such as three replicas in three IDCs in the same region, or five replicas in five IDCs across three regions. A transaction is committed only after its log is synchronized to a majority of replicas by using the Paxos protocol, which ensures data consistency among replicas. When a minority of replicas fail, the database still meets the service level agreement (SLA) with a recovery point objective (RPO) of 0.

In a multi-replica architecture, the STOP SERVER command can implement a restart without business loss. The STOP SERVER command involves two steps:

1. Switch all leaders away from the node to be restarted, and make sure that the replicas on the remaining nodes still form a majority.
2. Mark the node to be restarted as stopped on RootService. The node remains in the ACTIVE state, but the value of its stop_time field becomes greater than 0. After a client detects that a node is stopped, it no longer routes business requests to that node.

After the STOP SERVER command succeeds, restarting the node does not cause issues such as replicas left without a leader or errors returned to the client. The restart is transparent to business traffic. If the STOP SERVER command fails, do not proceed with the restart until you identify the cause. Possible causes include insufficient replicas, lagging redo log synchronization, or fewer than three voting members.
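
The majority requirement can be sketched with simple arithmetic. The function names below are hypothetical, and the sketch covers only the replica count. The actual check performed by STOP SERVER also considers conditions such as redo log latency, as described above.

```shell
# Majority (quorum) size for a Paxos group with n voting replicas.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed if the group still has a majority after one replica goes offline.
can_stop_one() {
  [ $(( $1 - 1 )) -ge "$(majority "$1")" ]
}

for n in 1 2 3 5; do
  if can_stop_one "$n"; then
    echo "$n replicas: one node can be stopped"
  else
    echo "$n replicas: stopping a node loses the majority"
  fi
done
```

With one or two replicas, taking a node offline leaves fewer replicas than the majority, which is why the restart cannot be transparent in such deployments.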

Procedure

The general procedure for restarting a node consists of the following steps: stop the node, initiate a minor compaction, stop the process, start the process, and start the node.

This topic describes how to restart a single node in a cluster. To restart multiple nodes, repeat the procedure for each node.
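
To keep the order of operations in view, the following sketch prints the statements and commands of the procedure for one node. The helper `restart_plan` is hypothetical and only prints text. It does not connect to the cluster or execute anything, and it does not replace the checks described in each step.

```shell
# Hypothetical helper: print the statements and commands of the procedure,
# in order, for one node. It only prints; it does not execute anything.
restart_plan() {
  rp_addr="$1:${2:-2882}"   # 2882 is the default RPC port named in this topic
  cat <<EOF
ALTER SYSTEM STOP SERVER '$rp_addr';
ALTER SYSTEM MINOR FREEZE SERVER = ('$rp_addr');
kill -9 <pid of the observer process>
cd /home/admin/oceanbase && ./bin/observer
ALTER SYSTEM START SERVER '$rp_addr';
EOF
}

restart_plan 172.xx.xx.xx
```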

Log in to the sys tenant of the cluster as the root user.

Note that you must specify the corresponding options in the following sample code based on your actual database configurations.
```
obclient -h10.xx.xx.xx -P2883 -uroot@sys#obdemo -p***** -A
```
For more information about how to connect to a database, see Connection methods (MySQL mode) or Connection methods (Oracle mode).
Execute the following statement to isolate the target node.

Restarting a node may affect service continuity. For example, if the cluster contains only one or two nodes, or if the tenant data is distributed on only two nodes, database services may become unavailable while a node is restarted. When you execute STOP SERVER to isolate the node, the system performs a safety check to make sure that the restart will not affect service continuity. After the node is isolated, it no longer provides services. If STOP SERVER fails, check the cluster deployment information or resolve the issue based on the error message, and then execute the statement again. If temporary service unavailability is acceptable, you can skip this step.
```
obclient [(none)]> ALTER SYSTEM STOP SERVER 'svr_ip:svr_port';
```
where
- svr_ip specifies the IP address of the node to be stopped.
- svr_port specifies the RPC port of the node to be stopped. The default value is 2882.
Here is an example:
```
obclient [(none)]> ALTER SYSTEM STOP SERVER '172.xx.xx.xx:2882';
```
After the statement is executed, you can query the STATUS and STOP_TIME fields of this server in the oceanbase.DBA_OB_SERVERS view. The value of STATUS is still ACTIVE, but the value of STOP_TIME changes from NULL to the time when the node was stopped.

For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
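The check above can be scripted along the following lines. This is a minimal sketch: the helper name `server_row_sql` is hypothetical, and it assumes that the view exposes SVR_IP and SVR_PORT columns in addition to the STATUS, STOP_TIME, and START_SERVICE_TIME fields mentioned in this topic. The helper only builds the query text.

```shell
# Hypothetical helper: build the query for one node's row in the view.
# Assumes SVR_IP and SVR_PORT columns exist in oceanbase.DBA_OB_SERVERS.
server_row_sql() {
  printf "SELECT SVR_IP, SVR_PORT, STATUS, STOP_TIME, START_SERVICE_TIME FROM oceanbase.DBA_OB_SERVERS WHERE SVR_IP = '%s' AND SVR_PORT = %s;\n" "$1" "${2:-2882}"
}

server_row_sql 172.xx.xx.xx 2882
```

You can paste the printed statement into the obclient session that you opened in the login step.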
Execute the following statement to initiate a minor compaction for the node to be restarted. This is to shorten the time required to replay redo logs after the restart, thereby accelerating the restart.
```
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('svr_ip:svr_port');
```
where
- svr_ip specifies the IP address of the node to be restarted.
- svr_port specifies the RPC port of the node to be restarted. The default value is 2882.
Here is an example:
```
obclient [(none)]> ALTER SYSTEM MINOR FREEZE SERVER = ('172.xx.xx.xx:2882');
```
After you initiate a minor compaction, you must wait for the minor compaction to complete before you proceed to the next step. For more information about how to view the minor compaction progress, see View minor compaction information.

For more information about minor compactions, see Minor compaction.
Stop the observer process.
1. Log in as the admin user to the server that hosts the node to be stopped.
2. Access the /home/admin/oceanbase directory from the command-line interface (CLI).
```
[admin@xxx /]$ cd /home/admin/oceanbase
```
  For more information about the installation directory of OceanBase Database, see Structure of the OBServer installation directory.
3. Run the following command to obtain the process ID of the node.
```
[admin@xxx oceanbase]$ ps -ef | grep observer | grep -v grep
admin    103364      1 99  2022 ?        51-17:24:41 /home/admin/oceanbase/bin/observer
```
  Here, 103364 is the process ID of the node.
4. Stop the observer process.
  
  The syntax of the command is as follows:
```
[admin@xxx oceanbase]$ kill -9 pid
```
  Here, pid is the ID of the observer process on the node to be stopped.
  
  Here is an example:
```
[admin@xxx oceanbase]$ kill -9 103364
```
  Notice
  
  You can stop only one observer process on a node in the deployment directory. To stop the observer processes on multiple nodes, log in to the corresponding servers one by one and repeat the preceding procedure.
5. Run the following command to verify whether the process is stopped.
```
[admin@xxx oceanbase]$ ps aux | grep observer | grep -v grep
```
  If the command returns no output, the process is stopped.
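If you script this verification, a small polling loop can wait for the process to disappear instead of checking once. The following is a sketch under stated assumptions: the helper `wait_for_exit` is hypothetical, and it relies on `kill -0`, which sends no signal and only tests whether the current user can signal the given PID.

```shell
# Hypothetical helper: wait until a process is gone, or give up after a
# timeout in seconds. Returns 0 if the process exited, 1 on timeout.
wait_for_exit() {
  wfe_pid=$1
  wfe_timeout=${2:-30}
  wfe_elapsed=0
  while kill -0 "$wfe_pid" 2>/dev/null; do
    [ "$wfe_elapsed" -ge "$wfe_timeout" ] && return 1
    sleep 1
    wfe_elapsed=$((wfe_elapsed + 1))
  done
  return 0
}

# Example with a short-lived background process standing in for observer.
sleep 1 &
demo_pid=$!
wait "$demo_pid"
wait_for_exit "$demo_pid" 5 && echo "process $demo_pid has exited"
```

In the procedure above, you would pass the process ID obtained in step 3, such as 103364 in the example.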
(Optional) Perform temporary maintenance on the server as needed.
Start the observer process.
1. Log in as the admin user to the server that hosts the node to be started.
2. Start the observer process.
```
[admin@xxx oceanbase]$ cd /home/admin/oceanbase && ./bin/observer
```
  Notice
  
  You can start only one observer process on a node in the deployment directory. To start the observer processes on multiple nodes, log in to the corresponding servers one by one and repeat the preceding procedure.
  
  For more information about the installation directory of OceanBase Database, see Structure of the OBServer installation directory.
  
  After the command is run, you can query the START_SERVICE_TIME field in the oceanbase.DBA_OB_SERVERS view. If the field value is not NULL, the observer process is started.
Execute the following statement to start the node.
```
obclient [(none)]> ALTER SYSTEM START SERVER 'svr_ip:svr_port';
```
where
- svr_ip specifies the IP address of the node to be started.
- svr_port specifies the RPC port of the node to be started. The default value is 2882.
Here is an example:
```
obclient [(none)]> ALTER SYSTEM START SERVER '172.xx.xx.xx:2882';
```
After the statement is executed, you can query the STOP_TIME field in the oceanbase.DBA_OB_SERVERS view. If the field value is NULL, the observer process is started on the node and the node can provide external services.

For more information about how to query the oceanbase.DBA_OB_SERVERS view, see View a node.
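
The field checks used throughout this procedure can be summarized in a small sketch. The helper `node_state` and its labels are hypothetical. It takes the STATUS, STOP_TIME, and START_SERVICE_TIME values of one row as text, with the literal string NULL standing for SQL NULL, and the timestamps in the usage lines are placeholders.

```shell
# Hypothetical helper: classify a node from three DBA_OB_SERVERS fields.
# Arguments: STATUS, STOP_TIME, START_SERVICE_TIME (the text NULL means NULL).
node_state() {
  ns_status=$1
  ns_stop_time=$2
  ns_start_service_time=$3
  if [ "$ns_status" != "ACTIVE" ]; then
    echo "not active"
  elif [ "$ns_stop_time" != "NULL" ]; then
    echo "stopped (isolated)"        # START SERVER has not been executed yet
  elif [ "$ns_start_service_time" = "NULL" ]; then
    echo "process not yet serving"
  else
    echo "serving"
  fi
}

node_state ACTIVE "2024-01-01 00:00:00" "2024-01-01 00:00:00"   # stopped (isolated)
node_state ACTIVE NULL "2024-01-01 00:00:00"                    # serving
```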

References

For more node-related O&M operations, see the following topics: