Description
This alert is triggered when the usage of the disk where the OBServer data directory is located exceeds the threshold.
Note
The default data directory for the OBServer is /data/1.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_host_data_path_disk_percent |
| Source | df -B1 |
| Collected metrics | host_partition_volume_free, host_partition_volume_total |
| Metric expression | 100 * (1 - avg(host_partition_volume_free{@LABELS}) by (@GBLABELS) / avg(host_partition_volume_total{@LABELS}) by (@GBLABELS)) |
| Collection cycle | 1 second |
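As a rough illustration of the metric expression in the table above, the following shell sketch computes the usage percentage for a single partition from a free and a total byte count, and optionally reads both values from df -B1 -P. The function names usage_percent and path_usage_percent are hypothetical, and treating the Available column of df as host_partition_volume_free is an assumption; the actual collector may derive these values differently.

```shell
#!/bin/sh
# usage_percent FREE TOTAL
# Mirrors 100 * (1 - free / total) for a single partition.
usage_percent() {
    awk -v free="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { exit 1 }          # avoid division by zero
        printf "%.1f\n", 100 * (1 - free / total)
    }'
}

# path_usage_percent PATH
# Reads total (column 2) and available (column 4) bytes from POSIX-format
# df output. Assumption: "Available" stands in for host_partition_volume_free.
path_usage_percent() {
    df -B1 -P "$1" | awk 'NR == 2 { print $4, $2 }' | {
        read -r free total
        usage_percent "$free" "$total"
    }
}
```

For example, usage_percent 2 100 prints 98.0, and path_usage_percent /data/1 would report the current usage of the disk that holds the default data directory.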
Alert rule
| Metric | Default threshold (unit: %) | Source | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|---|
| ob_host_data_path_disk_percent | 97 | Host | 0s | 10s | 5 min |
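The threshold check in the alert rule can be sketched as a small shell helper. This is only an illustration: the function name exceeds_threshold is hypothetical, and the strict "greater than" comparison is an assumption based on the wording "exceeds the threshold". The duration, detection cycle, and elimination cycle are handled by OCP and are not modeled here.

```shell
#!/bin/sh
# exceeds_threshold VALUE [THRESHOLD]
# Succeeds (exit status 0) when VALUE is strictly greater than THRESHOLD.
# THRESHOLD defaults to 97, the default threshold in the alert rule table.
exceeds_threshold() {
    value=${1%\%}              # tolerate a trailing "%" such as "98.0%"
    threshold=${2:-97}
    threshold=${threshold%\%}
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

A call such as exceeds_threshold 98.0 succeeds, whereas exceeds_threshold 97 97 fails because the value does not exceed the threshold.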
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The disk usage of the mount point ${mount_point} of the data directory ${data_disk_path} is ${value}, exceeding the ${alarm_threshold} threshold.
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The usage of the disk where the data directory of the OBServer locates exceeds the threshold.
Details example: Cluster: ob_cluster=C1-1000, Host:xxx.xxx.xxx.xxx. Alert: The usage of the disk where the data directory of the OBServer locates exceeds the threshold. The disk usage of the mount point /data/1 of the data directory /data/1 is 98.0%, exceeding the threshold of 97.0%.
Impact on the system
Some space on the OBServer data disk is reserved for core dumps. If the remaining space on the disk is insufficient, the OBServer core dump process is affected.
Possible cause
The directory stores excessive files generated by other programs.
Solution
Run the df command to check whether the usage of the disk where /data/1 is located exceeds the threshold.
Check whether the following alerts or alerts related to other data disks are reported on the Alerts page of the OCP console.
ob_server_sstable_percent_over_threshold
ob_zone_sstable_percent_over_threshold
If such alerts exist, handle them by following the solutions for those alerts, and then check whether they are automatically cleared.
If not, go to the next step.
Run the following command to locate the directories and files that occupy the most space:
    # Find the five directories that occupy the most space.
    du -a /data/1 | sort -n -r | head -n 5
    # Find the five largest files.
    cd /data/1 && find -type f -exec du -Sh {} + | sort -rh | head -n 5
Delete unwanted files.
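As a possible variant of the file search above, the following sketch lists the largest regular files with exact sizes in bytes and stays on the file system of the starting directory, which may help when other disks are mounted below the data directory. The function name largest_files is hypothetical, and the sketch assumes GNU find, which provides -printf.

```shell
#!/bin/sh
# largest_files DIR [COUNT]
# Prints "SIZE_IN_BYTES<TAB>PATH" for the COUNT largest regular files under
# DIR, largest first (COUNT defaults to 5).
# -xdev keeps find on the file system that DIR belongs to.
# Requires GNU find for -printf.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' | sort -n -r | head -n "${2:-5}"
}
```

For example, largest_files /data/1 10 lists the ten largest files on the data disk. Review the listed paths before deleting anything, because the listing does not distinguish unwanted files from files that the OBServer needs.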