ocp_http_request_too_many_errors_occur

2024-08-23 10:14:19  Updated

Description

When the average number of errors per second of an OCP HTTP URI exceeds the threshold, this alert is triggered.

Principle

Parameter Value
Metric ocp_http_request_error_count
Source OCP server tracking data
Collected metric ocp_http_request_error_count
Metric expression sum(rate(http_server_requests_seconds_count{status=~"4..|5..",@LABELS}[@INTERVAL])) by (@GBLABELS)\
Collection cycle 60 seconds

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Warning OCP node

Alert rule

Metric Default threshold Source Detection cycle Elimination cycle
ocp_http_request_error_count 1 OCP server tracking data 60 seconds 5 minutes

Alert template

  • Alert overview
    • Template: ${alarm_target} ${alarm_name}
    • Sample: alarm_template_id=0:svr_ip=xxx.xxx.xxx.xxx:status=500:uri=root:method=GET ocp_http_request_too_many_errors_occur
  • Alert details
    • Template: [${alarm_name}] OCP server: ${svr_ip}, an average of ${value_shown} HTTP requests failed per second, uri=${uri}, method=${method}
    • Sample: [ocp_http_request_too_many_errors_occur] OCP server: xxx.xxx.xxx.xxx, an average of 1.2 HTTP requests failed per second, uri=root, method=GET

Impact on the system

If too many errors occur on an OCP URI, this URI may have been disabled.

Possible causes

  • The network is unstable.
  • The system status is abnormal, such as insufficient memory or full CPU utilization.
  • An external request attack is initiated.
  • A call error occurs in internal requests.

Solutions

  1. Based on the request URI in the alert details, check whether too many call errors have occurred due to configuration errors. If yes, correct the configuration errors.

  2. Manual operations will not frequently cause so many error requests. If this alert is triggered, provide related logs and alert records to the administrator for troubleshooting.

Contact Us