Description
The request latency of the OCP HTTP URI is too long. When the average request time exceeds 5 seconds, this alert is triggered.
Principle
| Parameter | Value |
|---|---|
| Metric | ocp_http_request_duration_seconds |
| Source | OCP server tracking data |
| Collected metric | ocp_http_request_duration_seconds |
| Metric expression | sum(rate(http_server_requests_seconds_sum{status="200",uri!~"/api/v2/software-package.|/api/v2/object-storage.",@LABELS}[@INTERVAL])) by (@GBLABELS) / sum(rate(http_server_requests_seconds_count{status="200",uri!~"/api/v2/software-package.|/api/v2/object-storage.",@LABELS}[@INTERVAL])) by (@GBLABELS)\ |
| Collection cycle | 60 seconds |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | OCP node |
Alert rule
| Metric | Default threshold | Source | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ocp_http_request_duration_seconds | 5 | OCP server tracking data | 60 seconds | 5 minutes |
Alert template
- Alert overview
- Template: ${alarm_target} ${alarm_name}
- Sample: alarm_template_id=0:svr_ip=xxx.xxx.xxx.xxx:uri=/api/v2/time:method=GET ocp_http_request_timeout
- Alert details
- Template: [${alarm_name}] OCP HTTP request for URI ${uri} and method ${method} took ${value_shown} seconds, which is excessively long.
- Sample: [OCP HTTP request timeout] OCP HTTP request for URI /api/v2/time and method GET took 11 seconds, which is excessively long.
- Alert recovery
- Template: Alert: ${alarm_name}, OCP HTTP request took: ${value_shown}
- Sample: Alert: OCP HTTP request timeout, OCP HTTP request took: 3 seconds
Impact on the system
Slow OCP requests will affect user experience or even affect system features.
Possible causes
- The network is unstable.
- The system status is abnormal, such as insufficient memory or full CPU utilization.
- The business code is inefficient.
Solutions
If the problem is caused by expected network jitters or network latency, increase the check threshold.
If the problem of long latency frequently occurs:
Check the status of the CPU server, such as CPU utilization and memory usage. You can adjust the CPU and memory resources as needed.
If the server status is normal but the request time is still long, contact the administrator to check the logs and confirm whether the URI has performance issues.