Description
This alert is triggered when the detection of an alert rule times out in OceanBase Cloud Platform (OCP), that is, when the detection time of an alert rule exceeds the default threshold of 2.5s.
Principle
| Parameter | Value |
|---|---|
| Metric | ocp_alarm_detect_duration_seconds |
| Source | OCP server tracking data |
| Collected metric | ocp_alarm_detect_duration_seconds |
| Metric expression | (sum(rate(ocp_alarm_detect_duration_ms_sum{@LABELS}[@INTERVAL])) by (@GBLABELS) / sum(rate(ocp_alarm_detect_duration_ms_count{@LABELS}[@INTERVAL])) by (@GBLABELS)) / 1000 |
| Collection cycle | 60 seconds |
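The metric expression divides the rate of the accumulated detection time (in milliseconds) by the rate of the detection count, then divides by 1000 to get the average detection time in seconds. The following is a minimal Python sketch of that arithmetic for two consecutive samples of the counters. The `CounterSample` class and the function name are hypothetical and used only for illustration. The sketch also ignores counter resets and the extrapolation that a real `rate()` evaluation performs, so it is an approximation, not the way OCP evaluates the expression.

```python
from dataclasses import dataclass
from typing import Optional

# Default threshold from this document, in seconds.
DEFAULT_THRESHOLD_S = 2.5


@dataclass
class CounterSample:
    """One scrape of the two cumulative counters (hypothetical container)."""
    timestamp_s: float      # scrape time, in seconds
    duration_ms_sum: float  # value of ocp_alarm_detect_duration_ms_sum
    detect_count: float     # value of ocp_alarm_detect_duration_ms_count


def avg_detect_duration_seconds(prev: CounterSample,
                                curr: CounterSample) -> Optional[float]:
    """Approximate (rate(sum) / rate(count)) / 1000 from two samples.

    Returns None when no detection ran between the two samples.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("curr must be sampled after prev")

    # Simplified per-second rates; assumes the counters did not reset.
    sum_rate = (curr.duration_ms_sum - prev.duration_ms_sum) / elapsed
    count_rate = (curr.detect_count - prev.detect_count) / elapsed
    if count_rate == 0:
        return None

    # Milliseconds per detection, converted to seconds.
    return sum_rate / count_rate / 1000


# Two samples taken 60 seconds apart: 10 detections took 30,000 ms in total.
prev = CounterSample(timestamp_s=0.0, duration_ms_sum=10_000.0, detect_count=10.0)
curr = CounterSample(timestamp_s=60.0, duration_ms_sum=40_000.0, detect_count=20.0)

value = avg_detect_duration_seconds(prev, curr)
if value is not None and value > DEFAULT_THRESHOLD_S:
    print(f"average detection time {value:.3f}s exceeds {DEFAULT_THRESHOLD_S}s")
```

In this example, each detection takes about 3 seconds on average, which is above the 2.5s default threshold, so the condition holds.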
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | OCP node |
Alert rule
| Metric | Default threshold | Source | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ocp_alarm_detect_duration_seconds | 2.5 | OCP server tracking data | 60 seconds | 5 minutes |
Alert template
- Alert overview
  - Template: ${alarm_target} ${alarm_name}
  - Sample: alarm_template_id=0:svr_ip=xxx.xxx.xxx.xxx:type=os_tsar_cpu_util_full ocp_alarm_detect_timeout
- Alert details
  - Template: [${alarm_name}] OCP server: ${svr_ip}, alert rule detection time ${value_shown} too long, type=${type}
  - Sample: [ocp_alarm_detect_timeout] OCP server: xxx.xxx.xxx.xxx, alert rule detection time 0.003 too long, type=os_tsar_cpu_util_full
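
To show how the `${...}` placeholders in the alert details template map to the sample message, the following Python sketch fills the template with the sample values. It relies on Python's standard `string.Template`, which happens to use the same `${name}` placeholder syntax. This is only an illustration of the substitution and is not how OCP renders alerts internally. The function name `render_details` is hypothetical.

```python
from string import Template

# Alert details template, copied from this document.
DETAILS_TEMPLATE = Template(
    "[${alarm_name}] OCP server: ${svr_ip}, alert rule detection time "
    "${value_shown} too long, type=${type}"
)


def render_details(fields: dict) -> str:
    """Fill every placeholder; raises KeyError if a field is missing."""
    return DETAILS_TEMPLATE.substitute(fields)


# Field values taken from the sample in this document.
sample_fields = {
    "alarm_name": "ocp_alarm_detect_timeout",
    "svr_ip": "xxx.xxx.xxx.xxx",
    "value_shown": "0.003",
    "type": "os_tsar_cpu_util_full",
}

print(render_details(sample_fields))
```

The printed line matches the alert details sample. Using `substitute` rather than `safe_substitute` makes a missing field fail loudly instead of leaving a raw placeholder in the message.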
Impact on the system
When the detection time of an alert rule exceeds 3s in OCP, the detection is invalid. As a result, alerts may not be triggered when exceptions occur.
Possible causes
- The system status is abnormal, such as insufficient memory or full CPU utilization.
- The system configuration is insufficient to support a large number of alert objects that involve a large amount of computation.
Solutions
- Check whether many alert objects are associated. You can try upgrading the OCP node to improve its performance.
- Adjust the alert threshold to 3s. If the problem persists, this detection is invalid and alerts may not be triggered as expected. In this case, obtain related alert information and contact the administrator for troubleshooting.