ob_host_data_disk_percent_over_threshold

2024-11-06 03:13:28  Updated

Description

This alert is triggered when the usage of the disk where the data directory of the OBServer locates exceeds the threshold.

Note

The default data directory for the OBServer is /data/1.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter Value
Metric ob_host_data_path_disk_percent
Note
This alert is triggered when the usage of the disk where the data directory of the OBServer locates exceeds the threshold, this alert is triggered.
Source df -B1
Note
The metric source of this alert is special. OCP-Agent uses the preceding command to check the disk usage of OBServer and adds labels to disks. The labels vary with the disks:
  • Disk of the installation directory: mount_lable="install_path"
  • Disk of the data directory: mount_lable="data_path"
  • Disk of the log directory: mount_lable="log_path"
Collected metrics host_partition_volume_free、host_partition_volume_total
Metric expression 100 * (1 - avg(host_partition_volume_free{@LABELS}) by (@GBLABELS) / avg(host_partition_volume_total{@LABELS}) by (@GBLABELS))
Collection cycle 1 second

Alert rule

Metric Default threshold (unit: %) Source Duration Detection cycle Elimination cycle
ob_host_data_path_disk_percent 97 Host 0s 10s 5 min

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Critical Server

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The disk usage of the mount point ${mount_point} of the data directory ${data_disk_path} is ${value}, exceeding the ${alarm_threshold} threshold.

  • Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The usage of the disk where the data directory of the OBServer locates exceeds the threshold.

  • Details example: Cluster: ob_cluster=C1-1000, Host:xxx.xxx.xxx.xxx. Alert: The usage of the disk where the data directory of the OBServer locates exceeds the threshold. The disk usage of the mount point /data/1 of the data directory /data/1 is 98.0%, exceeding the threshold of 97.0%.

Impact on the system

Certain space on the OBServer data disk is reserved for core dump. If the remaining space on the disk is insufficient, the core dump process of the OBServer will be affected.

Possible cause

The directory stores excessive files generated by other programs.

Solution

  1. Run the df command to check whether the usage of the disk where /data/1 locates exceeds the threshold.

  2. Check whether the following alerts or alerts related to other data disks are reported on the Alerts page of the OCP console.

    ob_server_sstable_percent_over_threshold

    • If such alerts exist, handle these alerts by referring to solutions for other alerts and check whether these alerts are automatically cleared.

    • If not, go to the next step.

  3. Run the following command to locate the directories and files that occupy the most space:

    # Find the five directories that occupy the most space.
    du -a /data/1 | sort -n -r | head -n 5
    
    # Find the five largest files.
    cd /data/1 && find -type f -exec du -Sh {} + | sort -rh | head -n 5
    
  4. Delete unwanted files.

Contact Us