observer_process_stop|V4.3.3| docs|Distributed Database

observer_process_stop

Last Updated：2025-01-10 06:15:54 Updated

Description

This alert is triggered when the observer process stops.

The ocp_agent process monitors the uptime of the processes of components in OceanBase Database. When the uptime of a process is 0, the process is stopped. This alert indicates that the process has stopped at the moment rather than for a long time. In the latter case, other status alerts are triggered.

Relevant alerts:

observer process: observer_process_stop
obproxy process: obproxy_process_stop
obproxyd.sh process: obproxyd_process_stop
ocp_agentd process: agentd_process_stop
ocp_mgragent process: mgragent_process_stop
ocp_monagent process: monagent_process_stop

Principle

Parameter	Value
Metric	observer_uptime_delta_seconds
Source	The difference between the current time and the process creation time displayed in the 14th column of the `stat` file in the `/proc/[pid]` directory.
Collected metric	process_uptime_seconds
Metric expression	0 - min(delta(process_uptime_seconds{name="observer",@LABELS}[@INTERVAL])) by (@GBLABELS)
Collection cycle	5 seconds

Alert rule

Metric expression	Metric description	Default threshold	Detection cycle	Time before clearance
observer_uptime_delta_seconds > 0	The negative offset value of uptime is used. A metric value greater than 0 indicates that the process has stopped.	0 seconds	10 seconds	5 minutes

Alert information

Trigger method	Alert level	Scope
Based on the expression of the metric	Warning	OBProxy

Alert templates

Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=AdminMETA-12:host=xxx.xxx.xxx.xxx. The observer process stopped.
Details
- Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The process has been running for ${value_shown}.
- Example: Cluster: AdminMETA. Host: xxx.xxx.xxx.xxx. Alert: The observer process stopped. The process has been running for 62 days 20 hours 56 seconds.

Impact on the system

Possible results when the observer process stops:

It is expected that the observer process stops due to a system fault or intended O&M action.
If the tenant has multiple replicas, and the observer process on a server that hosts a replica stops, OceanBase Database tries to maintain the majority of the replicas and ensure high availability by migrating the replica to another server. The time consumed for migration depends on the data volume. You can try to restart the observer process. We recommend that you stop trying if the restart fails several times. Otherwise, the disk can be fully occupied by core dump files and log files.

Possible causes

The process unexpectedly exits and a core dump file is generated. In this case, you can view the core dump file in the /data/1 directory by executing the following statement:
```
ls -l ${observer.coredump.path} --full-time | grep '.*core-observer'
```
The memory usage exceeds the threshold and the process is killed by the operating system.
The disk is faulty.
Causes recorded in the observer.log file. You can search the log file for the following three keywords to view the causes: is_out_of_memstore_mem=true, right_to_die_or_duty_to_live, and on_fatal_error.
Other exceptions.

Solutions

You can restart the observer process depending on the specific situation. For example, if the observer process crashes, or the observer process stops due to full usage of memory, you can try restarting the observer process right away. Otherwise, you need to confirm the specific causes to avoid unexpected consequences.
We recommend that you contact OceanBase Technical Support.