os_observer_not_exist|V4.3.5| docs|Distributed Database

os_observer_not_exist

Last Updated: 2025-03-26 07:47:21

Description

This alert is triggered if the observer process on a monitored OBServer node does not exist.

Parameter	Value
Metric	observer_process_exists
Source	This is a basic host monitoring metric. To check whether an observer process exists, run `ps -ef | grep -w observer | grep -v grep | wc -l`, which returns the number of observer processes on the host.
Collected metric	process_exists
Metric expression	min(process_exists{name="observer",@LABELS}) by (@GBLABELS)
Collection cycle	1 second
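
The check described in the Source row can be turned into a 0/1 value that mirrors the metric semantics. The following is only a minimal sketch: the helper name `process_exists_in` is hypothetical and is not part of OCP or OceanBase, and the actual monitoring agent may collect the metric differently.

```shell
#!/bin/sh
# Hypothetical helper (not an OCP or OceanBase command).
# Reads a process listing on stdin and prints 1 if any line contains
# the whole word given in $1 (ignoring grep's own entry), otherwise 0.
process_exists_in() {
    if grep -w -- "$1" | grep -v grep | grep -q .; then
        echo 1
    else
        echo 0
    fi
}

# Same pipeline as in the table, reduced to 0 (absent) or 1 (present).
ps -ef | process_exists_in observer
```

A result of `0` corresponds to the condition that triggers this alert.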

Metric expression	Metric description	Default threshold	Detection cycle	Time before clearance
observer_process_exists == 0	`0`: The process does not exist. `1`: The process exists.	0	10 seconds	5 minutes

Trigger method	Alert level	Scope
Based on the expression of the metric	Critical	Server

Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1631964370:svr_ip=xxx.xxx.xxx.xxx. The observer process does not exist.
Details
- Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: The observer process does not exist.
- Example: Cluster: obcluster-1. Host: xxx.xxx.xxx.xxx. Alert: The observer process does not exist.

For a single-replica system, system services will be unavailable if the observer process does not exist.
For a multi-replica OceanBase cluster, the availability of the cluster may be compromised if the observer process on an OBServer node does not exist. For example, the number of available zones may drop from 3 to 2, or the number of available Internet Data Centers (IDCs) in the same region may drop from 3 to 2.

Possible cause: The OBServer node restarted unexpectedly. For example, the observer process was killed because system resources were insufficient.

If the observer process exits unexpectedly, reload it. You can execute the alert clearance plan to handle the alert. For more information, see Execute the alert clearance plan.

Notice

You can reload the observer process only once within 12 hours after the event occurs.
Check whether the basic monitoring metrics of the host, such as the memory usage, CPU utilization, load, and disk usage, are as expected.
Check whether a large number of ERROR log records exist in the runtime log of the OBServer node.
```
# Count ERROR records in the last 10,000 lines of the warning/error log.
tail -n 10000 /home/admin/oceanbase/log/observer.log.wf | grep -c ERROR
```
If yes, contact OceanBase Technical Support.
Check the OS logs. Search for the keyword "error" in the `/var/log/messages` log file and review the returned information.
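
The host metric check and the OS log search above can be sketched as a small script. This is an illustration under assumptions rather than an official procedure: the function names `host_snapshot` and `count_keyword` are hypothetical, the choice of `uptime`, `free -m`, and `df -h` as a rough view of load, memory, and disk usage is an assumption, and reading `/var/log/messages` usually requires root privileges.

```shell
#!/bin/sh
# Hypothetical helpers for the manual checks; not OCP or OceanBase commands.

# Print a quick view of load, memory, and disk usage,
# skipping any tool that is not installed on this host.
host_snapshot() {
    for cmd in "uptime" "free -m" "df -h"; do
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $cmd =="
            $cmd
        fi
    done
}

# Count lines in a log file that contain a keyword (case-insensitive).
# Prints 0 if the file is missing or unreadable.
count_keyword() {
    keyword="$1"
    file="$2"
    if [ -r "$file" ]; then
        grep -i -c -- "$keyword" "$file"
    else
        echo 0
    fi
}

host_snapshot
echo "lines containing 'error' in /var/log/messages: $(count_keyword error /var/log/messages)"
```

To read the matching lines instead of counting them, `grep -i error /var/log/messages | tail -n 50` shows the most recent ones.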