node_file_root_usage

2023-08-15 02:30:56  Updated

Description

This alert is triggered when the disk usage of the root directory on the host exceeds 95%.

Principle

Parameter Value
Metric root_disk_usage
Source This metric is a basic host monitoring metric collected by node_exporter.
You can run df -B1 to view the disk usage of each directory.
Collected metric node_filesystem_avail_bytes, node_filesystem_size_bytes
Metric expression 100 *(1 - avg(node_filesystem_avail_bytes{mountpoint="/",@LABELS}) by (@GBLABELS) / avg(node_filesystem_size_bytes{mountpoint="/",@LABELS}) by (@GBLABELS))
Collection cycle 1 second

Alert rule

Metric expression Metric description Default threshold (unit: percentage) Detection cycle Elimination cycle
root_disk_usage > 95 The disk usage of the root directory. 95% 10 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Critical Server

Alert templates

  • Overview
    • Template: ${alarm_target} ${alarm_name}
    • Example: svr_ip=192.168.0.1 node_file_root_usage
  • Details
    • Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, alert: ${alarm_name}, root directory usage ${value_shown} exceeds ${alarm_threshold}%.
    • Example: Cluster: obcluster, host: 192.168.0.1, alert: node_file_root_usage, root directory usage 96.844% exceeds 95%.

Impact on the system

Generally, no files but only built-in system directories such as /bin, /etc, /tmp, and /var are placed under the root directory. If the root directory is full, the system may fail to properly run.

Possible causes

  • The root partition is too small.
  • Log files occupy excessive space.
  • Space occupied by zombie processes is not released.

Suggested solutions

  1. View the usage of the root directory and check whether the alert is triggered because the root partition is too small.

    df -h
    
  2. View the space occupied by files and delete invalid files.

    ## Access the target directory.
    cd /target-path
    ## View the directory size.
    du -h -x --max-depth=1
    ## Delete invalid files.
    rm -rf invalid-files
    
  3. Check whether any zombie processes occupy space.

    ## Check for zombie processes.
    lsof | grep deleted
    ## Delete zombie processes.
    lsof | grep deleted | awk '{print $2}' | xargs kill -9
    

Contact Us