Description
This alert is triggered if the observer process on a monitored OBServer node does not exist.
Principle
| Parameter | Value |
|---|---|
| Metric | observer_process_exists |
| Source | This metric is a basic host monitoring metric. To check whether any observer process exists, you can run ps -ef\|grep -w observer\|grep -v grep\|wc -l to return the number of observer processes in the system. |
| Collected metric | process_exists |
| Metric expression | min(process_exists{name="observer",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| observer_process_exists == 0 | 0: The process does not exist.1: The process exists. |
0 | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1631964370:svr_ip=xxx.xxx.xxx.xxx. The observer process does not exist.
- Details
- Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: The observer process does not exist.
- Example: Cluster: obcluster-1. Host: xxx.xxx.xxx.xxx. Alert: The observer process does not exist.
Impact on the system
- For a single-replica system, system services will be unavailable if the observer process does not exist.
- For a multi-replica OceanBase cluster, the availability of the cluster may be compromised if the observer process on an OBServer does not exist. For example, the number of zones changes from 3 to 2, or the number of Internet Data Centers (IDCs) in the same region changes from 3 to 2.
Possible causes
The OBServer node is unexpectedly restarted. For example, the observer process is killed when the system resources are insufficient.
Solutions
Reload the observer process when it unexpectedly exits. You can execute the alert clearance plan to handle the alerted issue. For more information, see Execute the alert clearance plan.
Notice
You can reload the observer process only once within 12 hours since the event occurs.
Check whether the basic monitoring metrics of the host, such as the memory usage, CPU utilization, load, and disk usage, are as expected.
Check whether a large number of ERROR log records exist in the runtime log of the OBServer node.
tail -10000 /home/admin/oceanbase/log/observer.log.wf | grep ERROR | wc -lIf yes, contact OceanBase Technical Support.
Check the OS logs. Search for the keyword "error" in the
/var/log/messageslog file and check the returned information.