Alert description
This alert indicates that the disk usage of an OCP node has exceeded the limit.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring Metrics | ocp_disk_percent: The disk usage of the OCP node. An alert is triggered when the disk usage exceeds the threshold. |
| Monitoring Expression | 100 * (sum(disk_total_bytes{@LABELS}) by (@GBLABELS) - sum(disk_free_bytes{@LABELS}) by (@GBLABELS)) / sum(disk_total_bytes{@LABELS}) by (@GBLABELS) |
| Metric Collection | disk_total_bytes{app="OCP", instance="xx.xx.xx.xx:8080", job="custom", path="/home/admin/.", svr_ip="xx.xx.xx.xx", svr_port="8080"} |
| Metric Source | The OCP process uses the spring-boot-starter-actuator component to collect disk data, reading the /proc/diskstats and /proc/mounts files to obtain disk usage. |
| Collection Cycle | 5 Seconds |
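As a worked illustration of the monitoring expression above, the following Python sketch reproduces the same arithmetic from raw byte counts: sum the total and free bytes of the grouped series, then compute 100 * (total - free) / total. The function name and the sample values are hypothetical and are not part of OCP; the sketch only shows how the percentage is derived.

```python
from typing import Iterable

GIB = 1024 ** 3  # bytes per GiB, used only for the sample values below


def disk_usage_percent(total_bytes: Iterable[float], free_bytes: Iterable[float]) -> float:
    """Return 100 * (sum(total) - sum(free)) / sum(total), as in the monitoring expression."""
    total = sum(total_bytes)
    free = sum(free_bytes)
    if total <= 0:
        raise ValueError("sum of total_bytes must be positive")
    return 100.0 * (total - free) / total


# Hypothetical sample: one series reporting a 500 GiB disk with 100 GiB free.
usage = disk_usage_percent([500 * GIB], [100 * GIB])
print(usage)  # 80.0
```

With these sample values, 400 GiB of 500 GiB is in use, so the computed usage is 80%.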
Rule information
| Monitoring Metrics | Default Threshold (Unit: %) | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| ocp_disk_percent | This metric has two default thresholds: | 60 Seconds | 10 Seconds | 5 Minutes |
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Based on monitoring metric expressions | | service |
Alert template
Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:svr_ip=xx.xx.xx.xx:svr_port=8080 OCP Node Disk Usage Exceeds Threshold
Alert Details
- Template: Alert: ${alarm_name}, disk usage ${value_shown} exceeds ${alarm_threshold} %.
- Example: Alert: OCP Node Disk Usage Exceeds Threshold, disk usage 90% exceeds 80 %.
Alert recovery
- Template: Alert: ${alarm_name}, OCP Node Disk Usage Exceeds Threshold: ${value_shown}
- Example: Alert: OCP Node Disk Usage Exceeds Threshold, OCP Node Disk Usage Exceeds Threshold: 10 %
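To illustrate how the ${...} placeholders in these templates are filled in, the following Python sketch renders the Alert Details template with sample values. It uses Python's standard string.Template, which happens to share the ${name} syntax; this is an assumption made for illustration only and does not describe how OCP renders templates internally. The helper name render_alert and the sample values are hypothetical.

```python
from string import Template

# The Alert Details template, copied from the section above.
DETAILS_TEMPLATE = "Alert: ${alarm_name}, disk usage ${value_shown} exceeds ${alarm_threshold} %."


def render_alert(template_text: str, **fields: str) -> str:
    """Substitute every ${placeholder}; raises KeyError if a field is missing."""
    return Template(template_text).substitute(**fields)


# Hypothetical sample values for the three placeholders.
message = render_alert(
    DETAILS_TEMPLATE,
    alarm_name="OCP Node Disk Usage Exceeds Threshold",
    value_shown="90%",
    alarm_threshold="80",
)
print(message)
```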
Impact on the system
When the disk usage of an OCP node exceeds the threshold, OCP may be unable to write its logs or to load configuration from disk.
Possible causes
- The OCP deployment directory /home/admin/ocp-server/log contains a large number of expired logs.
- Other tools are deployed in the OCP deployment directory.
Solution
- Log in to OCP, and choose System Management > Platform Monitoring from the left-side navigation pane to view the performance monitoring and HTTP request monitoring of the OCP platform. Observe whether related performance metrics such as memory, disk, and system load are normal.
- Clean up expired logs in the OCP deployment directory /home/admin/ocp-server/log.
- Clean up other tools in the OCP deployment directory.
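Before deleting anything, it can help to list which log files would count as expired. The following Python sketch is a dry run under stated assumptions: it treats a file as expired when its modification time is older than a retention period that you choose, and it only reports candidates without removing them. The function names and the retention period are hypothetical; this document does not define what "expired" means for OCP logs, so confirm the retention policy for your deployment first.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_expired_logs(log_dir: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Return files under log_dir whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = [
        path
        for path in Path(log_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    ]
    return sorted(expired)


def total_size_bytes(paths: List[Path]) -> int:
    """Sum the sizes of the given files, to estimate how much space cleanup would free."""
    return sum(path.stat().st_size for path in paths)
```

For example, calling find_expired_logs("/home/admin/ocp-server/log", max_age_days=30) lists files not modified in the last 30 days, where 30 is an assumed value rather than an OCP default. Review the list, and the estimate from total_size_bytes, before removing any files.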
