Description
The request latency of the OCP HTTP URI is too long. When the average request time exceeds 5 seconds, this alert is triggered.
Principle
| Parameter | Value |
|---|---|
| Metric | ocp_http_request_duration_seconds |
| Source | OCP server tracking data |
| Collected metric | ocp_http_request_duration_seconds |
| Metric expression | sum(rate(http_server_requests_seconds_sum{status="200",uri!~"/api/v2/software-package.|/api/v2/object-storage.",@LABELS}[@INTERVAL])) by (@GBLABELS) / sum(rate(http_server_requests_seconds_count{status="200",uri!~"/api/v2/software-package.|/api/v2/object-storage.",@LABELS}[@INTERVAL])) by (@GBLABELS)\ |
| Collection cycle | 60 seconds |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | OCP node |
Alert rule
| Metric | Default threshold | Source | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ocp_http_request_duration_seconds | 5 | OCP server tracking data | 60 seconds | 5 minutes |
Alert template
- Alert overview
- Template: ${alarm_target} ${alarm_name}
- Sample: alarm_template_id=0:svr_ip=xxx.xxx.xxx.xxx:uri=/api/v2/time:method=GET ocp_http_request_timeout
- Alert details
- Template: [${alarm_name}] OCP server: ${svr_ip} HTTP request time is ${value_shown}, which is too long. uri=${uri}, method=${method}
- Sample: [ocp_http_request_timeout] OCP server: xxx.xxx.xxx.xxx HTTP request time is 0, which is too long. uri=/api/v2/time, method=GET
Impact on the system
Slow OCP requests will affect user experience or even affect system features.
Possible causes
- The network is unstable.
- The system status is abnormal, such as insufficient memory or full CPU utilization.
- The business code is inefficient.
Solutions
If the problem is caused by expected network jitters or network latency, increase the check threshold.
If the problem of long latency frequently occurs:
Check the status of the CPU server, such as CPU utilization and memory usage. You can adjust the CPU and memory resources as needed.
If the server status is normal but the request time is still long, contact the administrator to check the logs and confirm whether the URI has performance issues.