You can start or stop a node based on the operating status of the database.
Stop a node
To stop a node, perform the following steps:
Stop the node service by performing the Stop Server operation
Stop the observer process
Stop the node service
You can perform the Stop Server operation to stop the node. The purpose of the Stop Server operation is to switch the partition leader on the current node to another node. If the current node has no partition leader, the system will internally mark the node as stopped, so that client requests will no longer be sent to the node and the node will not provide services to external applications.
For more information about the node status, see View the status of a node.
The Stop Server operation is usually performed under special O&M scenarios such as server diagnostics or the repair, replacement, or upgrade of hardware.
Sample statement of the Stop Server operation:
obclient> ALTER SYSTEM STOP SERVER 'ip:port' [,'ip:port'...] [ZONE='zone'];
This statement can be executed only in the sys tenant.
Before you stop an OBServer node, make sure that the active replicas are in the majority.
When you stop an OBServer node, note that:
Do not perform the Stop Server operation for an OBServer node in another zone. You can perform the Stop Server operation for multiple OBServer nodes in the same zone.
Do not initiate another Stop Server operation if the current Stop Server operation has not completed.
If a large number of partitions or partition leaders exist on the OBServer node on which the Stop Server operation is performed, the Stop Server operation takes a long time. If the operation times out, you can change the SQL timeout period to a larger value.
The SQL timeout duration can be specified by the
ob_query_timeoutvariable in microseconds. The default value is10000000. For more information about how to set the variable, see Set variables.For more information about the
ob_query_timeoutvariable, see ob_query_timeout.If the command fails to be executed in a short while, the logs may be out of synchronization. In this case, query the
oceanbase.DBA_OB_ROOTSERVICE_EVENT_HISTORYview to check whether the Stop Server record exists. If no Stop Server record is found, the failure has occurred before the Root Service performs the Stop Server operation. If a Stop Server record is found, the Stop Server operation has failed because the logs are not synchronized.Sample statement:
obclient> SELECT rs_svr_ip FROM oceanbase.DBA_OB_ROOTSERVICE_EVENT_HISTORY WHERE event='stop_server'; +----------------+ | RS_SVR_IP | +----------------+ | xxx.xx.xxx.xx5 | | xxx.xx.xxx.xx5 | +----------------+ 2 rows in setAfter you perform the Stop Server operation, the
statusfield of the server in theoceanbase.DBA_OB_SERVERSview remainsActive. However, the value of thestop_timefield changes fromNULLto the time when the Stop Server operation was performed, and the status of the node becomesstopped.For more information, see View the status of a node.
Example:
Log on to the
systenant of the cluster as therootuser.Execute the following statement to stop the node service:
obclient> ALTER SYSTEM STOP SERVER "10.10.10.1:2882" zone='z1';If a leader is deployed on the node, the system automatically switches it to a follower after the node service is stopped. Followers on the node can still participate in voting but will not be elected as a leader. Stopping the node service is different from a node failure. The service stop time of the node can exceed the permanently offline time specified by the
server_permanent_offline_timeparameter, without causing the node to be truly offline.Note
server_permanent_offline_timespecifies the time threshold for heartbeat missing at which a server is considered permanently offline. Data replicas on a permanently offline server must be automatically supplemented.
Stop the observer process
Log on to the server where the node resides as the
adminuser.Access the
/home/admin/oceanbase/bindirectory from the command-line interface (CLI).Run the following command to stop the observer process:
kill `pidof observer`Run the following command to check whether the process is stopped:
ps -ef | grep observer | grep -v grepIf no response is returned after the command is executed, the process is stopped.
Start the node
To start a node, perform the following steps:
Start the observer process
Start the node service by performing the Start Server operation
Start the observer process
In special cases, you can start the observer process by running a command.
Log on to the server where the node resides as the
adminuser.Access the
/home/admin/oceanbase/bindirectory from the command-line interface (CLI).Run the following command to start the observer process:
cd /home/admin/oceanbase/ ./bin/observer [Startup parameters]Generally, you need to add the startup parameters only for the first startup. In other startups, you can directly run the
./bin/observercommand. You can also run./bin/observer --helpto view details of the observer startup parameters.Sample command for starting the observer process for the first time:
cd /home/admin/oceanbase/bin ./observer -p 2881 -P 2882 -z 'zone_1' -d '/data/1/prod_data/' -r '10.10.10.1:2882:2881;10.10.10.2:2882:2881;10.10.10.3:2882:2881' -l WARN -o 'memory_limit=100GB,datafile_disk_percentage=85'Parameters:
-pspecifies the port number for direct connection. The value is2881in this example.-Pspecifies the RPC port number. The value is2882in this example.-zspecifies the zone where the node to be started is located. The value iszone_1in this example.-dspecifies the storage directory of data. The value is/data/1/prod_datain this example.-rspecifies the IP address of the node to be started.-lspecifies the level of logs to be printed. The value isWARNin this example, indicating that logs of the WARNING level are to be printed.-ospecifies the startup parameters. When the-oparameter is used, note that:The parameters are not case-insensitive. However, we recommend that you set them based on the names in
observer.config.bin.A parameter name cannot contain the following special characters: spaces,
\r,\n, and\t.An equal sign (=) is required between the name and the value of a parameter.
Separate multiple parameters with commas (,).
datafile_disk_percentage = 85indicates that the utilization of the data disk is 85%.memory_limit = 100GBspecifies that the maximum memory available for starting the process is 100 GB.
After you start the process, wait for 5 to 10 seconds and check whether the process is started.
Run the following command to check whether the process is running.
Sample code:
[root@xx oceanbase]#ps -ef | grep observer | grep -v grep root 6136 0 99 11:23 ? 00:00:19 ./bin/observerIn the example, if a response is returned after the command is executed, the process is started. Otherwise, the process is not started.
Run the following command to check whether port listening is enabled.
Sample code:
[root@xxx oceanbase]#netstat -ntlp | grep `pidof observer` tcp 0 0 0.0.0.0:2881 0.0.0.0:* LISTEN 6136/./bin/observer tcp 0 0 0.0.0.0:2882 0.0.0.0:* LISTEN 6136/./bin/observerIn the example,
6136in6136/./bin/observerindicates the ID of the observer process. The execution results show that port listening is enabled.
Start the node service
Generally, you can perform the Start Server operation to start the node service. The Start Server operation is inverse to the Stop Server operation. By default, when a node in the cluster is started, it enters the started state. After you perform the Stop Server operation on an OBServer, you must perform the Start Server operation to set the OBServer status to started.
Sample statement for starting the node service:
obclient> ALTER SYSTEM START SERVER 'ip:port' [,'ip:port'...] [ZONE='zone'];
This statement can be executed only in the sys tenant.
Example:
Log on to the
systenant of the cluster as therootuser.Execute the following statement to start the node service:
obclient> ALTER SYSTEM START SERVER "10.10.10.1:2882";Execute the following statement to check whether the node service is operating properly.
Sample code:
obclient> SELECT * FROM oceanbase.DBA_OB_SERVERS\G *************************** 1. row *************************** SVR_IP: xx.xx.xx.xx SVR_PORT: 2882 ID: 1 ZONE: zone1 SQL_PORT: 2881 WITH_ROOTSERVER: YES STATUS: ACTIVE START_SERVICE_TIME: 2022-06-17 11:30:04.589074 STOP_TIME: NULL BLOCK_MIGRATE_IN_TIME: NULL LAST_OFFLINE_TIME: NULL BUILD_VERSION: 4.0.0.0_20220525115829-1873fc2598d56060fe307ce3b7b88647686e0b09(May 25 2022 12:12:10) 1 row in setParameters:
statusindicates the status of the node service. Valid values:active: indicates that the node is operating properly.inactive: indicates that the node is offline. During an upgrade of the cluster, the status of nodes in the cluster isinactive.
start_service_time: indicates the time when the service was started on the node. If the value isNULL, the service has not been started on the node.stop_time: indicates the stop time of the node, which must be the default valueNULL. If not, the node is stopped by using a Stop Server operation and the service needs to be started by using a Start Server operation.