observer_process_stop OBServer process stop

Version: V4.3.6

Last Updated: 2025-09-08 08:15:43

Alert description

Monitors whether the observer service process has stopped.

The OCP-Agent process monitors the start time of each OceanBase component process. A change in the start timestamp indicates that the process has stopped or restarted. This alert only indicates that the process stopped momentarily; it does not mean that the process has remained stopped for a long time. If the process stays stopped for a long time, other status-related alerts are triggered.

The relevant processes include:

observer: alert item is observer_process_stop
obproxy: alert item is obproxy_process_stop
obproxyd.sh: alert item is obproxyd_process_stop
ocp_agentd: alert item is agentd_process_stop
ocp_mgragent: alert item is mgragent_process_stop
ocp_monagent: alert item is monagent_process_stop

Alert principle

Parameter	Value
Monitoring metric	observer_boot_time_delta_seconds
Metric source	The start time of the process is the system boot time plus the offset between system boot and process start. The system boot time is the btime value returned by the command cat /proc/stat. The offset is the value of the 22nd field (starttime, in clock ticks) returned by the command cat /proc/<pid>/stat, divided by 100 to convert clock ticks to seconds.
Metric collection	process_boot_time_seconds
Monitoring expression	max(delta(process_boot_time_seconds{name="observer",@LABELS}[@INTERVAL])) by (@GBLABELS)
Metric collection cycle	5 seconds
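
The metric source described in the table can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OCP-Agent implementation: the function names are hypothetical, and the default of 100 ticks per second simply mirrors the "divided by 100" rule above.

```python
def parse_btime(proc_stat_text):
    """Return the system boot time (epoch seconds) from the text of /proc/stat."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("btime line not found in /proc/stat text")


def parse_starttime_ticks(pid_stat_text):
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only the text after the last ')' is split. That text
    # begins at field 3, which puts field 22 at index 22 - 3.
    rest = pid_stat_text.rsplit(")", 1)[1].split()
    return int(rest[22 - 3])


def process_boot_time_seconds(proc_stat_text, pid_stat_text, ticks_per_second=100):
    """System boot time plus the process start offset, in epoch seconds."""
    offset = parse_starttime_ticks(pid_stat_text) / ticks_per_second
    return parse_btime(proc_stat_text) + offset


def read_process_boot_time(pid):
    """Hypothetical helper: read both files on a Linux host and combine them."""
    with open("/proc/stat") as handle:
        proc_stat_text = handle.read()
    with open(f"/proc/{pid}/stat") as handle:
        pid_stat_text = handle.read()
    return process_boot_time_seconds(proc_stat_text, pid_stat_text)
```

Keeping the parsing separate from the file access makes the calculation easy to check against captured /proc output from a host.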

Rule information

Monitoring expression	Description of the monitoring metric	Default threshold	Detection cycle	Elimination cycle
observer_boot_time_delta_seconds > 0	When the monitoring metric is greater than 0, it indicates that the process has stopped.	0 seconds	10 seconds	5 minutes
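
The rule above can be illustrated with a small sketch of why a nonzero delta signals a stop or restart. It assumes each sample is the process start time collected once per cycle; `boot_time_delta` and `should_alert` are hypothetical names, and the delta is simplified to "last sample minus first sample" rather than reproducing the exact semantics of the monitoring expression.

```python
def boot_time_delta(samples):
    """Difference between the last and first sample in the window."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


def should_alert(samples_by_process, threshold=0.0):
    """Approximate max(delta(...)) > threshold over the processes in one group."""
    deltas = [boot_time_delta(samples) for samples in samples_by_process.values()]
    return bool(deltas) and max(deltas) > threshold


# A process that keeps running reports the same start time in every sample.
stable = {"observer": [1700002500.0, 1700002500.0, 1700002500.0]}
# A restarted process reports a later start time in the newest sample.
restarted = {"observer": [1700002500.0, 1700002500.0, 1700002620.0]}

print(should_alert(stable))     # False
print(should_alert(restarted))  # True
```

Because the start time only moves when the process is started again, the alert fires on the restart itself, which is why it cannot show how long the process was down.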

Alert information

Alert trigger method	Alert level	Scope
Based on the monitoring metric expression	Warning	OBServer

Alert template

Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx OBServer process stop
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}.
- Example: Cluster: AdminMETA, Host: xxx.xxx.xxx.xxx, Alert: OBServer process stop.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer process start time change: ${recover_value}
- Example: Alert: OBServer process stop, OBServer process start time change: 0 seconds
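
As an illustration of how the `${...}` placeholders expand, the following sketch renders the alert details template with Python's `string.Template`, which happens to use the same placeholder syntax. How OCP renders templates internally is not described here, so treat this only as a way to preview a template; `render_details` is a hypothetical helper.

```python
from string import Template

# Alert details template, copied from the section above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}."
)


def render_details(values):
    """Fill every placeholder; a missing variable raises KeyError."""
    return DETAILS_TEMPLATE.substitute(values)


values = {
    "ob_cluster_name": "AdminMETA",
    "host": "xxx.xxx.xxx.xxx",
    "alarm_name": "OBServer process stop",
}
print(render_details(values))
# Cluster: AdminMETA, Host: xxx.xxx.xxx.xxx, Alert: OBServer process stop.
```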

Impact on the system

If the OBServer process stops, consider the following situations:

If the process stopped because of a system failure or planned maintenance, the stop is expected.
For multi-replica tenants, if the server hosting a replica stops, the cluster attempts to migrate the replica to another server to maintain high availability. Because the migration time depends on the data volume, you can first try to restart the OBServer process. If several restart attempts fail, stop retrying so that core files and log files do not fill up the disk.

Possible causes

The process exits unexpectedly and generates a core dump. Check for core files in the core dump directory (/data/1 in this example), substituting that path for ${observer.coredump.path}:
```
ls -l ${observer.coredump.path} --full-time | grep '.*core-observer'
```
The memory usage exceeds the limit, and the operating system kills the process.
Disk failure.
To identify other causes of the process stop, search the observer.log file for the following three keywords: is_out_of_memstore_mem=true, right_to_die_or_duty_to_live, and on_fatal_error.
Other unexpected situations.
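
The keyword search in observer.log can be sketched as follows. This is a minimal example: the function names are hypothetical, and the location of observer.log depends on the deployment, so the path is passed in by the caller.

```python
# The three keywords named in the list of possible causes.
KEYWORDS = (
    "is_out_of_memstore_mem=true",
    "right_to_die_or_duty_to_live",
    "on_fatal_error",
)


def find_stop_keywords(lines):
    """Return (line_number, keyword, line) for every keyword occurrence."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for keyword in KEYWORDS:
            if keyword in line:
                hits.append((number, keyword, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file line by line so that large files are not loaded at once."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_stop_keywords(handle)
```

For example, `scan_log("/path/to/observer.log")` returns the matching lines with their line numbers, which helps locate the entries written just before the process stopped.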

Procedure

Before restarting the OBServer process, determine the cause of the issue. If the process terminated unexpectedly or ran out of memory, attempt to restart it immediately. In other situations, verify the issue before restarting the process to avoid further complications.
Contact OceanBase Technical Support to investigate the cause and assess whether the OBServer can be restarted immediately.