Description
This alert is triggered when the disk usage of the root directory on the host exceeds 95%.
Principle
| Parameter | Value |
|---|---|
| Metric | root_disk_usage |
| Source | This metric is a basic host monitoring metric collected by node_exporter. You can run df -B1 to view the disk usage of each directory. |
| Collected metric | node_filesystem_avail_bytes, node_filesystem_size_bytes |
| Metric expression | 100 *(1 - avg(node_filesystem_avail_bytes{mountpoint="/",@LABELS}) by (@GBLABELS) / avg(node_filesystem_size_bytes{mountpoint="/",@LABELS}) by (@GBLABELS)) |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold (unit: percentage) | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| root_disk_usage > 95 | The disk usage of the root directory. | 95% | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx node_file_root_usage
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: ${alarm_name}, root directory usage ${value_shown} exceeds ${alarm_threshold}%.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: node_file_root_usage, root directory usage 96.844% exceeds 95%.
Impact on the system
Generally, no files but only built-in system directories such as /bin, /etc, /tmp, and /var are placed under the root directory. If the root directory is full, the system may fail to properly run.
Possible causes
- The root partition is too small.
- Log files occupy excessive space.
- Space occupied by zombie processes is not released.
Suggested solutions
View the usage of the root directory and check whether the alert is triggered because the root partition is too small.
df -hView the space occupied by files and delete invalid files.
## Access the target directory. cd /target-path ## View the directory size. du -h -x --max-depth=1 ## Delete invalid files. rm -rf invalid-filesCheck whether any zombie processes occupy space.
## Check for zombie processes. lsof | grep deleted ## Delete zombie processes. lsof | grep deleted | awk '{print $2}' | xargs kill -9