Stop or start a node|V4.1.0| docs|Distributed Database

Stop or start a node

Last Updated：2023-08-04 08:22:14 Updated

You can start or stop a node based on the operating status of the database.

Stop a node

To stop a node, perform the following steps:

Stop the node service by performing the Stop Server operation
Stop the observer process

Stop the node service

You can perform the Stop Server operation to stop the node. The purpose of the Stop Server operation is to switch the partition leader on the current node to another node. If the current node has no partition leader, the system will internally mark the node as stopped, so that client requests will no longer be sent to the node and the node will not provide services to external applications.

For more information about the node status, see View the status of a node.

The Stop Server operation is usually performed under special O&M scenarios such as server diagnostics or the repair, replacement, or upgrade of hardware.

Sample statement of the Stop Server operation:

obclient> ALTER SYSTEM STOP SERVER 'ip:port' [,'ip:port'...] [ZONE='zone'];

This statement can be executed only in the sys tenant.

Before you stop an OBServer node, make sure that the active replicas are in the majority.

When you stop an OBServer node, note that:

Do not perform the Stop Server operation for an OBServer node in another zone. You can perform the Stop Server operation for multiple OBServer nodes in the same zone.
Do not initiate another Stop Server operation if the current Stop Server operation has not completed.
If a large number of partitions or partition leaders exist on the OBServer node on which the Stop Server operation is performed, the Stop Server operation takes a long time. If the operation times out, you can change the SQL timeout period to a larger value.

The SQL timeout duration can be specified by the ob_query_timeout variable in microseconds. The default value is 10000000. For more information about how to set the variable, see Set variables.

For more information about the ob_query_timeout variable, see ob_query_timeout.
If the command fails to be executed in a short while, the logs may be out of synchronization. In this case, query the oceanbase.DBA_OB_ROOTSERVICE_EVENT_HISTORY view to check whether the Stop Server record exists. If no Stop Server record is found, the failure has occurred before the Root Service performs the Stop Server operation. If a Stop Server record is found, the Stop Server operation has failed because the logs are not synchronized.

Sample statement:
```
obclient> SELECT rs_svr_ip FROM oceanbase.DBA_OB_ROOTSERVICE_EVENT_HISTORY WHERE event='stop_server';
+----------------+
| RS_SVR_IP      |
+----------------+
| xxx.xx.xxx.xx5 |
| xxx.xx.xxx.xx5 |
+----------------+
2 rows in set
```
After you perform the Stop Server operation, the status field of the server in the oceanbase.DBA_OB_SERVERS view remains Active. However, the value of the stop_time field changes from NULL to the time when the Stop Server operation was performed, and the status of the node becomes stopped.

For more information, see View the status of a node.

Example:

Log on to the sys tenant of the cluster as the root user.
Execute the following statement to stop the node service:
```
obclient> ALTER SYSTEM STOP SERVER "10.10.10.1:2882" zone='z1';
```
If a leader is deployed on the node, the system automatically switches it to a follower after the node service is stopped. Followers on the node can still participate in voting but will not be elected as a leader. Stopping the node service is different from a node failure. The service stop time of the node can exceed the permanently offline time specified by the server_permanent_offline_time parameter, without causing the node to be truly offline.

Note

server_permanent_offline_time specifies the time threshold for heartbeat missing at which a server is considered permanently offline. Data replicas on a permanently offline server must be automatically supplemented.

Stop the observer process

Log on to the server where the node resides as the admin user.
Access the /home/admin/oceanbase/bin directory from the command-line interface (CLI).
Run the following command to stop the observer process:
```
kill `pidof observer`
```
Run the following command to check whether the process is stopped:
```
ps -ef | grep observer | grep -v grep
```
If no response is returned after the command is executed, the process is stopped.

Start the node

To start a node, perform the following steps:

Start the observer process
Start the node service by performing the Start Server operation

Start the observer process

In special cases, you can start the observer process by running a command.

Log on to the server where the node resides as the admin user.
Access the /home/admin/oceanbase/bin directory from the command-line interface (CLI).
Run the following command to start the observer process:
```
cd /home/admin/oceanbase/

./bin/observer [Startup parameters]
```
Generally, you need to add the startup parameters only for the first startup. In other startups, you can directly run the ./bin/observer command. You can also run ./bin/observer --help to view details of the observer startup parameters.

Sample command for starting the observer process for the first time:
```
cd /home/admin/oceanbase/bin

./observer -p 2881 -P 2882 -z 'zone_1' -d '/data/1/prod_data/' -r '10.10.10.1:2882:2881;10.10.10.2:2882:2881;10.10.10.3:2882:2881' -l WARN -o 'memory_limit=100GB,datafile_disk_percentage=85'
```
Parameters:
- -p specifies the port number for direct connection. The value is 2881 in this example.
- -P specifies the RPC port number. The value is 2882 in this example.
- -z specifies the zone where the node to be started is located. The value is zone_1 in this example.
- -d specifies the storage directory of data. The value is /data/1/prod_data in this example.
- -r specifies the IP address of the node to be started.
- -l specifies the level of logs to be printed. The value is WARN in this example, indicating that logs of the WARNING level are to be printed.
- -o specifies the startup parameters. When the -o parameter is used, note that:
  - The parameters are not case-insensitive. However, we recommend that you set them based on the names in observer.config.bin.
  - A parameter name cannot contain the following special characters: spaces, \r, \n, and \t.
  - An equal sign (=) is required between the name and the value of a parameter.
  - Separate multiple parameters with commas (,).
- datafile_disk_percentage = 85 indicates that the utilization of the data disk is 85%.
- memory_limit = 100GB specifies that the maximum memory available for starting the process is 100 GB.
After you start the process, wait for 5 to 10 seconds and check whether the process is started.
1. Run the following command to check whether the process is running.
  
  Sample code:
```
[root@xx oceanbase]#ps -ef | grep observer | grep -v grep
root       6136      0 99 11:23 ?        00:00:19 ./bin/observer
```
  In the example, if a response is returned after the command is executed, the process is started. Otherwise, the process is not started.
2. Run the following command to check whether port listening is enabled.
  
  Sample code:
```
[root@xxx oceanbase]#netstat -ntlp | grep `pidof observer`
tcp        0      0 0.0.0.0:2881            0.0.0.0:*               LISTEN      6136/./bin/observer
tcp        0      0 0.0.0.0:2882            0.0.0.0:*               LISTEN      6136/./bin/observer
```
  In the example, 6136 in 6136/./bin/observer indicates the ID of the observer process. The execution results show that port listening is enabled.

Start the node service

Generally, you can perform the Start Server operation to start the node service. The Start Server operation is inverse to the Stop Server operation. By default, when a node in the cluster is started, it enters the started state. After you perform the Stop Server operation on an OBServer, you must perform the Start Server operation to set the OBServer status to started.

Sample statement for starting the node service:

obclient> ALTER SYSTEM START SERVER 'ip:port' [,'ip:port'...] [ZONE='zone'];