Description
This alert is triggered when the observer process on an OBServer does not exist.
Principle
| Parameter | Value |
|---|---|
| Metric | observer_process_exists |
| Source | This metric is a basic host monitoring metric. To check whether any observer process exists, you can run ps -ef|grep -w observer|grep -v grep|wc -l to return the number of observer processes in the system. |
| Collected metric | process_exists |
| Metric expression | min(process_exists{name="observer",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| observer_process_exists == 0 | 0 | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1631964370:svr_ip=192.168.0.1 os_observer_not_exist
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: os_observer_not_exist.
- Example: Cluster: obcluster, host: 192.168.0.1, alert: os_observer_not_exist.
Impact on the system
- For a single-replica system, system services will be unavailable if the observer process does not exist.
- For a multi-replica OceanBase cluster, the availability of the cluster may decrease if the observer process on an OBServer does not exist. For example, the number of zones changes from 3 to 2, or the number of Internet Data Centers (IDCs) in the same region changes from 3 to 2.
Possible causes
The faulty OBServer is restarted accidentally. For example, the observer process is killed when the system resources are insufficient.
Suggested solutions
Check whether the basic monitoring metrics of the host, such as the memory usage, CPU utilization, load, and disk usage, are as expected.
Check whether a large number of ERROR logs exist in the run logs of the OBServer.
tail -10000 /home/admin/oceanbase/log/observer.log.wf | grep ERROR | wc -lIf yes, contact Technical Support.
Check the OS logs. Search for the keyword "error" in the
/var/log/messageslog file and check the returned information.