ob_server_stopped Server Stop Service|V4.4.2|OceanBase Cloud Platform|OCP docs|Distributed Database

ob_server_stopped Server Stop Service

Last Updated: 2026-06-12 02:01:01

Alert description

This alert monitors whether an OBServer node in the OceanBase cluster is stopped. It is triggered when the node's stop time is set (that is, a STOP SERVER operation has been performed on it) and the node has been stopped for more than 0 seconds.

Alert principle

Parameter	Value
Monitoring Metrics	ob_server_stopped_duration_seconds
Monitoring Expression	`max(ob_server_stopped_duration_seconds{@LABELS}) by (@GBLABELS)`
Metric Collection	ob_server_stopped_duration_seconds
Metric Source	OBServer internal tables and views, collected by OCP-Agent
Collection Cycle	N/A

OCP-Agent periodically queries the service status of OBServer nodes through SQL, as follows:

For OceanBase Database versions earlier than V4.0, OCP-Agent queries the __all_server table and derives the downtime from the stop_time field: if stop_time is 0, the metric value is 0; otherwise, the metric value is the number of seconds elapsed between stop_time and the current time.
For OceanBase Database V4.0 and later, OCP-Agent queries the DBA_OB_SERVERS view and derives the downtime from the STOP_TIME field: if STOP_TIME is NULL, the metric value is 0; otherwise, the metric value is the number of seconds elapsed between STOP_TIME and the current time.
The status field is also collected as a label that reflects the current status of the OBServer node.
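The two calculation rules above can be sketched as shell functions. This is a minimal illustration, not OCP-Agent's code: the function names are hypothetical, and the sketch assumes that the pre-V4.0 stop_time is an integer in microseconds since the epoch, and that the V4.0+ STOP_TIME has already been converted to epoch seconds in the query (for example with UNIX_TIMESTAMP(STOP_TIME), which yields NULL when STOP_TIME is NULL). The current time is an optional second argument so the logic can be checked without a live clock.

```shell
# Hypothetical helper for versions earlier than V4.0.
# Assumption: stop_time is microseconds since the epoch; 0 means "not stopped".
stopped_duration_v3() (
  stop_time_us=$1
  now_s=${2:-$(date +%s)}   # current time in epoch seconds
  if [ "$stop_time_us" -eq 0 ]; then
    echo 0
  else
    # Seconds elapsed between stop_time and now
    echo $(( now_s - stop_time_us / 1000000 ))
  fi
)

# Hypothetical helper for V4.0 and later.
# Assumption: STOP_TIME was fetched as epoch seconds (possibly with a
# fractional part); NULL or an empty value means "not stopped".
stopped_duration_v4() (
  stop_time_s=$1
  now_s=${2:-$(date +%s)}
  case $stop_time_s in
    ''|NULL) echo 0 ;;
    # Drop any fractional seconds before the integer subtraction
    *) echo $(( now_s - ${stop_time_s%.*} )) ;;
  esac
)
```

For example, `stopped_duration_v4 NULL` prints 0 for a running node, while a node stopped two minutes ago yields 120, which is above the default threshold of 0.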

When the value of ob_server_stopped_duration_seconds is greater than 0, the trigger condition is met and an alert is generated.

Rule information

Monitoring Metrics	Default Threshold	Duration	Detection Cycle	Elimination Cycle
ob_server_stopped_duration_seconds	0	60 Seconds	10 Seconds	300 Seconds
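How the Duration, Detection Cycle, and Elimination Cycle interact is not spelled out in the table. The sketch below models one plausible reading, which is an assumption and not OCP's documented algorithm: the metric is sampled once per detection cycle, the alert fires once the metric has stayed above the threshold for the full duration, and a firing alert clears once the metric has stayed at or below the threshold for the elimination cycle. The function and variable names are hypothetical.

```shell
# Assumed model of the rule timing (not OCP's implementation).
THRESHOLD=0   # default threshold
DETECT_S=10   # detection cycle, seconds between samples
FOR_S=60      # duration the condition must hold before the alert fires
CLEAR_S=300   # elimination cycle

# Reads one integer metric sample per line (one line per detection cycle)
# and prints the alert state after each sample: ok, pending, or firing.
simulate_alert() (
  state=ok
  held=0    # seconds the breach has held continuously
  calm=-1   # seconds back to normal while firing; -1 = not counting yet
  while read -r value; do
    if [ "$value" -gt "$THRESHOLD" ]; then
      calm=-1
      case $state in
        ok) state=pending; held=0 ;;
        pending) held=$((held + DETECT_S)) ;;
      esac
      if [ "$state" = pending ] && [ "$held" -ge "$FOR_S" ]; then
        state=firing
      fi
    else
      case $state in
        pending) state=ok ;;   # breach ended before the duration elapsed
        firing)
          if [ "$calm" -lt 0 ]; then calm=0; else calm=$((calm + DETECT_S)); fi
          if [ "$calm" -ge "$CLEAR_S" ]; then state=ok; calm=-1; fi
          ;;
      esac
    fi
    echo "$state"
  done
)
```

Under this model, a node that stays stopped for seven consecutive samples (60 seconds after the first detection) moves from pending to firing, and after the node is started again the alert clears only after 31 consecutive samples at 0 (300 seconds).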

Alert information

Alert Trigger Method	Alert Level	Scope
Based on monitoring metric expressions	Downtime	Host

Alert template

Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster1:host=xxx.xxx.xxx.xxx OceanBase server stopped service
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, OceanBase Server Status: ${status}, Alert: ${alarm_name}, Service Disruption Duration: ${value_shown}.
- Example: Cluster: obcluster1, Host: xxx.xxx.xxx.xxx, OceanBase Server Status: stopped, Alert: OceanBase server stopped service, Service Disruption Duration: 120s.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer Downtime: ${value_shown}
- Example: Alert: OceanBase server stopped service, OBServer Downtime: 0

Impact on the system

When an OBServer node is stopped, it no longer serves external database requests. This may have the following impacts:

Service interruption: New leaders must be elected on other nodes for the leader replicas hosted on this node, so the affected partitions may be temporarily unavailable during the switchover.
Reduced availability: The number of available nodes in the cluster decreases. If more nodes fail at the same time, the replicas of some partitions may no longer form a Paxos majority; the affected partitions, or in the worst case the entire cluster, then become unavailable.
Load imbalance: Traffic from the stopped node is redirected to other nodes, which may increase their load.
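The majority mentioned under Reduced availability follows standard Paxos quorum arithmetic. The sketch below is generic, not OceanBase code; it assumes a Paxos group of N voting replicas (for example one full-featured replica per zone) and ignores non-voting replica types. The function names are hypothetical.

```shell
# Generic Paxos quorum arithmetic (not OceanBase-specific code):
# N voting replicas need floor(N/2)+1 votes to stay available, so the
# group survives at most floor((N-1)/2) simultaneous replica failures.
paxos_majority() {
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 5; do
  echo "replicas=$n majority=$(paxos_majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With three replicas, a majority is two, so stopping one node is tolerated but losing a second node that hosts a replica of the same partition is not; five replicas tolerate two failures.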

Possible causes

An O&M engineer executed ALTER SYSTEM STOP SERVER on the node, for example for scheduled maintenance or node isolation.
The node was stopped through the OCP console.
The OBServer process exited abnormally, and the node was marked as stopped.

Solution

Confirm the cause of the service interruption: Check whether the node was stopped for scheduled maintenance. If the stop was planned, you can ignore this alert; service recovers once the maintenance is complete and the node is started again.

Check node status: In the sys tenant, query the status of each node in the cluster by executing the following SQL statement:

```
SELECT SVR_IP, SVR_PORT, ZONE, STATUS, START_SERVICE_TIME, STOP_TIME
FROM oceanbase.DBA_OB_SERVERS;
```
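In a large cluster, the output of the query above can be filtered in the shell. This sketch assumes the query is run through a MySQL-compatible client in batch mode without column names (for the mysql client, the -B and -N options), so that rows arrive tab-separated and NULL is printed literally; the function name is hypothetical. It prints SVR_IP:SVR_PORT for every node whose STOP_TIME is set, which is the address format used by ALTER SYSTEM START SERVER.

```shell
# Hypothetical helper: reads tab-separated rows with the columns
# SVR_IP, SVR_PORT, ZONE, STATUS, START_SERVICE_TIME, STOP_TIME
# and prints ip:port for each node whose STOP_TIME (column 6) is set.
stopped_servers() {
  awk -F '\t' '$6 != "" && $6 != "NULL" { print $1 ":" $2 }'
}
```

A node whose STOP_TIME is NULL is skipped, matching the rule that its ob_server_stopped_duration_seconds value is 0.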

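Check node health: Before restoring service, make sure that the observer process on the node is running normally, that network connectivity is intact, and that log synchronization is not lagging. To check the process:
```
# The [o] pattern keeps grep from matching its own command line
ps -ef | grep '[o]bserver'
```
To check log synchronization, compare the END_SCN timestamps of the replicas of each log stream (this example checks the sys tenant, TENANT_ID = 1; change the value to check another tenant):
```
SELECT SVR_IP, SVR_PORT, LS_ID, ROLE, SCN_TO_TIMESTAMP(END_SCN)
FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID = 1 ORDER BY LS_ID, ROLE;
```
Restore node service:
- Method 1 (recommended): Start the service for the node through the OCP console.
- Method 2: Start the node by running the following SQL statement in the sys tenant, where 2882 is the node's RPC port (the default):
```
ALTER SYSTEM START SERVER 'xxx.xxx.xxx.xxx:2882';
```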
Verify the fix: After the service is restored, confirm that the node's STATUS is ACTIVE and its STOP_TIME is NULL, so that ob_server_stopped_duration_seconds drops back to 0. The alert then clears automatically within the elimination cycle (300 seconds).
