Description
This alert is triggered when the root directory usage exceeds 95%.
Principle
| Parameter | Value |
|---|---|
| Monitoring metric | root_disk_usage |
| Metric source | Host basic monitoring, collected by node_exporter. Run df -B1 to view the disk usage of each mounted file system. |
| Metric collection | node_filesystem_avail_bytes, node_filesystem_size_bytes |
| Monitoring expression | 100 *(1 - avg(node_filesystem_avail_bytes{mountpoint="/",@LABELS}) by (@GBLABELS) / avg(node_filesystem_size_bytes{mountpoint="/",@LABELS}) by (@GBLABELS)) |
| Collection interval | 1 second |
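The expression above can be reproduced by hand on a host. The following is a minimal sketch, not the collector's implementation: the function names `usage_percent` and `root_usage_percent` are hypothetical, and it assumes GNU df, where the `Available` and size columns correspond to node_filesystem_avail_bytes and node_filesystem_size_bytes.

```shell
#!/bin/sh
# usage_percent AVAIL_BYTES SIZE_BYTES
# Mirrors 100 * (1 - avail / size) for a single file system.
# Fails with exit status 1 if SIZE_BYTES is not positive.
usage_percent() {
    awk -v avail="$1" -v size="$2" 'BEGIN {
        if (size + 0 <= 0) { exit 1 }
        printf "%.3f\n", 100 * (1 - avail / size)
    }'
}

# root_usage_percent (hypothetical helper)
# Reads the size and available bytes of "/" from df.
# With -P, column 2 is the total size and column 4 is the available space.
root_usage_percent() {
    df -P -B1 / | awk 'NR == 2 { print $4, $2 }' | {
        read -r avail size
        usage_percent "$avail" "$size"
    }
}
```

Calling `root_usage_percent` prints the current value for the root file system. The result can differ slightly from the Use% column of df, because df computes that column as used / (used + available) and therefore excludes reserved blocks from the denominator.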
Rule information
| Monitoring expression | Description | Default threshold | Detection interval | Elimination interval |
|---|---|---|---|---|
| root_disk_usage > 95 | Root directory disk usage | 95% | 10 seconds | 5 minutes |
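To check by hand whether a measured value would satisfy the rule, the comparison can be sketched as below. This is only an illustration of the `root_disk_usage > 95` condition; the function name `exceeds_threshold` is hypothetical, and the sketch does not model the detection or elimination intervals.

```shell
#!/bin/sh
# exceeds_threshold VALUE [THRESHOLD]
# Succeeds (exit status 0) when VALUE is strictly greater than THRESHOLD,
# matching the rule root_disk_usage > 95. THRESHOLD defaults to 95.
exceeds_threshold() {
    awk -v value="$1" -v threshold="${2:-95}" \
        'BEGIN { exit !(value + 0 > threshold + 0) }'
}
```

For example, `exceeds_threshold 96.844 && echo "alert"` prints `alert`, while a value of exactly 95 does not satisfy the rule because the comparison is strict.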
Alert information
| Alert triggering method | Alert level | Scope |
|---|---|---|
| Monitoring expression | Critical | Server |
Alert template
- Alert summary
  - Template: ${alarm_target} ${alarm_name}
  - Example: svr_ip=xxx.xxx.xxx.xxx High root directory usage on the server
- Alert details
  - Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. Root directory usage is ${value_shown}%, which exceeds ${alarm_threshold} %.
  - Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: High root directory usage on the server. Root directory usage is 96.844 %, which exceeds 95 %.
- Alert recovery
  - Template: Alert: ${alarm_name}, Root directory disk usage: ${value_shown}
  - Example: Alert: High root directory usage on the server, Root directory disk usage: 80 %
Impact on the system
The root directory generally holds only system directories such as /bin, /etc, /tmp, and /var. If it fills up, processes can no longer write logs, temporary files, or other data there, which can disrupt the normal operation of the system.
Possible causes
- The root directory partition is too small.
- Log files occupy too much space.
- Zombie processes (processes that still hold deleted files open) occupy space that is not released.
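To narrow down which of these causes applies, it can help to rank the top-level directories of the root file system by size. The sketch below assumes GNU du and sort; the function name `largest_dirs` is hypothetical.

```shell
#!/bin/sh
# largest_dirs [DIR] [COUNT]
# Prints the COUNT largest immediate children of DIR (size in KiB, then path),
# largest first. -x keeps du on the same file system as DIR, so other
# mount points are not counted. Defaults: DIR=/ and COUNT=10.
largest_dirs() {
    dir="${1:-/}"
    count="${2:-10}"
    du -x -k --max-depth=1 "$dir" 2>/dev/null |
        awk -F '\t' -v top="$dir" '$2 != top' |   # drop the total line for DIR itself
        sort -k1,1nr |
        head -n "$count"
}
```

Running `largest_dirs / 5` as root lists the five largest directories directly under the root directory. A large /var/log points to log growth, whereas evenly small directories on a nearly full partition suggest that the partition itself is too small.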
Procedure
Check the usage of the root directory to confirm whether the cause is a small partition.

    df -h

Check the file usage and delete invalid files.

    ## Go to the target directory.
    cd /target-path
    ## View the directory size.
    du -h -x --max-depth=1
    ## Delete invalid files.
    rm -rf invalid-files

Check whether zombie processes occupy space.

    ## Check whether zombie processes exist (processes that hold deleted files open).
    lsof | grep deleted
    ## Terminate the zombie processes. Review the list first: this command
    ## kills every process that still holds a deleted file open.
    lsof | grep deleted | awk '{print $2}' | xargs kill -9
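Before terminating anything, it may be worth seeing how much space each process actually holds in deleted files. The following sketch summarizes the output of `lsof +L1`, which lists open files whose link count is below 1 (that is, unlinked files). The function name `summarize_deleted` is hypothetical, and the column positions are an assumption based on the default lsof layout.

```shell
#!/bin/sh
# summarize_deleted
# Reads `lsof +L1` output on stdin and prints one line per process:
#   PID COMMAND TOTAL_BYTES
# sorted by TOTAL_BYTES, largest first.
# Assumed column layout:
#   COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
# A file opened through several descriptors is counted once per descriptor,
# so the totals are an upper bound.
summarize_deleted() {
    awk 'NR > 1 && $5 == "REG" && $7 ~ /^[0-9]+$/ {
             bytes[$2] += $7
             name[$2] = $1
         }
         END {
             for (pid in bytes) printf "%s %s %.0f\n", pid, name[pid], bytes[pid]
         }' | sort -k3,3nr
}
```

A typical invocation is `lsof +L1 2>/dev/null | summarize_deleted`. Restarting or stopping only the processes at the top of this list is usually less disruptive than killing every process that matches `grep deleted`.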