Alert description
This alert monitors whether the observer process is running on the OBServer host. If the observer process is not running, the alert is triggered.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring Metric | observer_process_exists |
| Monitoring Source | Basic host monitoring, which checks whether an observer process exists on the system. You can run the ps -ef \| grep -w observer \| grep -v grep \| wc -l command to view the number of observer processes. |
| Collected Metric | process_exists |
| Monitoring Expression | min(process_exists{name="observer",@LABELS}) by (@GBLABELS) |
| Sampling Interval | 1 sec |
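
The check described in the table can be reproduced by hand on the host. The following is a minimal shell sketch, assuming the metric is 1 when at least one observer process is found and 0 otherwise. The function names are hypothetical and are not part of OCP.

```shell
# Hypothetical helper: map a process count to the 0/1 value
# that the alert rule compares against (assumed semantics).
process_exists_from_count() {
    if [ "$1" -gt 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Count observer processes with the same pipeline given in the table.
# tr strips the padding that some wc implementations add.
count_observer_processes() {
    ps -ef | grep -w observer | grep -v grep | wc -l | tr -d ' '
}

# Usage (run manually on the OBServer host):
# process_exists_from_count "$(count_observer_processes)"
```

A result of 0 corresponds to the condition under which this alert is triggered.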
Rule information
| Monitoring Expression | Meaning of the Monitoring Metric | Default Threshold | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| observer_process_exists == 0 | Whether the observer process exists on the host | 0 | 10 seconds | 5 minutes |
Alert message
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Expression based on monitoring metrics | Critical | Server |
Alert template
- Alert Overview
  - Template: ${alarm_target} ${alarm_name}
  - Example: obcluster=obcluster-1631964370:svr_ip=xxx.xxx.xxx.xxx OceanBase server process does not exist
- Alert Details
  - Template: cluster: ${ob_cluster_name}, host: ${host}, alert: OceanBase server process does not exist.
  - Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: OceanBase server process does not exist.
- Alert Restore
  - Template: Alert: ${alarm_name}, OceanBase server process status: ${recover_value}
  - Example: Alert: OceanBase server process does not exist, OceanBase server process status: 1
Impact on the system
- In a single-replica system, the service stops when the observer process does not exist. In a multi-replica OceanBase cluster, availability may decrease: for example, from 3 zones to 2 zones, or from 3 IDCs within a region to 2 IDCs.
Potential causes
The observer process on the OBServer node exits or restarts unexpectedly, for example, when it is killed due to insufficient system resources.
Treatment method
If the observer process exits unexpectedly, first attempt to restore it. You can execute an alert action plan to resolve the issue. For more information, see Alert action plans.
Check whether the alert was triggered only once within the 30 minutes before the event occurred.
Check if the host's basic monitoring metrics, such as memory usage, CPU utilization, load, and disk usage, meet the expected standards.
Check if there are many error logs in the running log of the OBServer node:
tail -10000 /home/admin/oceanbase/log/observer.log.wf | grep ERROR | wc -l
If there are a large number of error logs, contact technical support.
Check the OS logs: search for the error keyword in the /var/log/messages log file and review the information that the system returns.
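
The two log checks above can be wrapped in a small helper so that they are easy to repeat. This is a minimal sketch, not an official tool: the function name count_error_lines is hypothetical, and the default of 10000 lines simply mirrors the tail command shown earlier.

```shell
# Hypothetical helper: count lines containing ERROR in the last N lines
# of a log file (N defaults to 10000, matching the command above).
count_error_lines() {
    log_file="$1"
    last_n="${2:-10000}"
    tail -n "$last_n" "$log_file" | grep -c ERROR
}

# Usage (run manually on the OBServer host; reading /var/log/messages
# may require root privileges):
# count_error_lines /home/admin/oceanbase/log/observer.log.wf
# grep -i error /var/log/messages | tail -n 50
```

The helper prints only the count. What counts as "a large number" of errors is left to the operator, because the document does not define a threshold.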