ocp_alarm_detect_timeout

2025-09-08 08:15:43  Updated

Description

When the detection of an alert rule times out in OceanBase Cloud Platform (OCP), this alert is triggered. When the detection time of an alert rule exceeds 2.5s, this alert is triggered.

Principle

Parameter Value
Metric ocp_alarm_detect_duration_seconds
Source OCP server tracking data
Collected metric ocp_alarm_detect_duration_seconds
Metric expression (sum(rate(ocp_alarm_detect_duration_ms_sum{@LABELS}[@INTERVAL])) by (@GBLABELS) / sum(rate(ocp_alarm_detect_duration_ms_count{@LABELS}[@INTERVAL])) by (@GBLABELS)) / 1000
Collection cycle 60 seconds

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Warning OCP node

Alert rule

Metric Default threshold Source Detection cycle Elimination cycle
ocp_alarm_detect_duration_seconds 2.5 OCP server tracking data 60 seconds 5 minutes

Alert template

  • Alert overview
    • Template: ${alarm_target} ${alarm_name}
    • Sample: alarm_template_id=0:svr_ip=xxx.xxx.xxx.xxx:type=os_tsar_cpu_util_full ocp_alarm_detect_timeout
  • Alert details
    • Template: [${alarm_name}] OCP alert rule: ${type}, detection time ${value_shown} too long.
    • Sample: [OCP alert rule detection timeout] OCP alert rule: ocp_meta_db_disconnected, detection time 4 seconds too long.
  • Alert recovery
    • Template: Alert: ${alarm_name}, OCP alert rule detection time: ${recover_value}
    • Sample: Alert: OCP alert rule detection timeout, OCP alert rule detection time: 2 seconds

Impact on the system

When the detection time of an alert rule exceeds 3s in OCP, the detection is invalid. As a result, alerts may not be triggered when exceptions occur.

Possible causes

  • The system status is abnormal, such as insufficient memory or full CPU utilization.
  • The system configurations are insufficient to support too many alert objects that involve a large amount of computation.

Solutions

  1. Check whether many alert objects are associated. You can try upgrading the OCP node to improve its performance.

  2. Adjust the alert threshold to 3s. If the problem persists, this detection is invalid and alerts may not be triggered as expected. In this case, obtain related alert information and contact the administrator for troubleshooting.

Contact Us