Description
This alert is triggered when the usage of the disk mounted to the OBServer data directory exceeds the threshold.
Note: The default data directory of OBServer is /data/1.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_host_data_path_disk_percent. Note: This metric indicates the usage of the OBServer data disk. The alert is triggered when the usage is greater than the threshold. The default threshold is 97%. |
| Source | unknow df -B1. Note: The metric source of this alert is special. OCP-Agent runs the df -B1 command to check the disk usage of OBServer and adds a label to each disk. The following labels are used: disk of the installation directory: mount_lable="install_path"; disk of the data directory: mount_lable="data_path"; disk of the log directory: mount_lable="log_path". |
| Collected metrics | host_partition_volume_free and host_partition_volume_total |
| Metric expression | 100 * (1 - avg(host_partition_volume_free{@LABELS}) by (@GBLABELS) / avg(host_partition_volume_total{@LABELS}) by (@GBLABELS)) |
| Collection cycle | 1 second |
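The metric expression above reduces to 100 * (1 - free / total). The following is a minimal sketch of that arithmetic, not the OCP-Agent implementation. The function names usage_percent and data_disk_percent are hypothetical, and treating the "Available" column of df as host_partition_volume_free is an assumption.

```shell
# usage_percent FREE TOTAL: print 100 * (1 - FREE / TOTAL) with one decimal place.
usage_percent() {
    awk -v free="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - free / total) }'
}

# data_disk_percent PATH: read byte counts for the file system that holds PATH
# from `df -B1` and pass them to usage_percent.
# Assumption: the "Available" column (field 4) stands in for
# host_partition_volume_free, and the size column (field 2) for
# host_partition_volume_total. -P keeps each file system on one output line.
data_disk_percent() {
    df -B1 -P "$1" | awk 'NR == 2 { print $4, $2 }' | {
        read -r free total
        usage_percent "$free" "$total"
    }
}
```

For example, `usage_percent 3 100` prints 97.0, and `data_disk_percent /data/1` prints the current usage of the disk that holds the default data directory.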
Alert rule
| Metric | Default threshold (unit: %) | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_host_data_path_disk_percent | 97 | 0 seconds | 10 seconds | 5 minutes |
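The rule in this table can be approximated by a simple comparison that is repeated once per detection cycle. The sketch below only illustrates the "greater than the threshold" check. The helper names exceeds_threshold and check_once are hypothetical, and the duration and clearance timing are not modeled.

```shell
THRESHOLD=97   # default threshold from the alert rule table, in percent

# exceeds_threshold VALUE THRESHOLD: succeed (exit 0) only when VALUE > THRESHOLD.
# awk performs the decimal comparison that `[ ... -gt ... ]` cannot.
exceeds_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# check_once VALUE: report whether a single usage sample would trip the rule.
check_once() {
    if exceeds_threshold "$1" "$THRESHOLD"; then
        echo "ALERT: $1% > ${THRESHOLD}%"
    else
        echo "OK: $1% <= ${THRESHOLD}%"
    fi
}

# To mimic the 10-second detection cycle by hand (runs until interrupted):
# while true; do
#     check_once "$(df -B1 -P /data/1 | awk 'NR == 2 { printf "%.1f", 100 * (1 - $4 / $2) }')"
#     sleep 10
# done
```

A sample that equals the threshold exactly, such as `check_once 97.0`, is reported as OK because the comparison is strict.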
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The disk usage in the ${data_disk_path} data directory at the mount point ${mount_point} is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:svr_ip=192.168.1.1. The disk usage of the OBServer data directory exceeds the threshold.
Details example: ob_cluster=C1-1000:svr_ip=192.168.1.1. The disk usage in the data directory /data/1 of the mount point /data/1 is 98.0%, exceeding the threshold of 97.0%.
Impact on the system
The OBServer data disk reserves some space for core dumps. If the remaining space is insufficient, OBServer core dumps may be affected.
Possible cause
Too many files are generated by other applications.
Suggested solutions
Run the df command to check whether the disk usage of /data/1 exceeds the threshold.
Go to the Alerts page of the OCP console, and check whether the following alerts or other data disk-related alerts are triggered.
ob_server_sstable_percent_over_threshold
ob_zone_sstable_percent_over_threshold
If yes, solve the problems that caused those alerts first and check whether the ob_host_data_disk_percent_over_threshold alert is automatically cleared.
Otherwise, go to the next step.
Run the following commands to find the directories and files that occupy the most space:
# Find the five directories that occupy the most space.
du -a /data/1 | sort -n -r | head -n 5
# Find the five largest files.
cd /data/1 && find -type f -exec du -Sh {} + | sort -rh | head -n 5
Delete unwanted files.
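Because du -a counts files as well as directories, the first command above can mix both in its output. The following sketch separates the two cases. It assumes GNU find and GNU coreutils, and the function names largest_files and largest_dirs are hypothetical.

```shell
# largest_files DIR [N]: print the N largest regular files under DIR
# as "bytes<TAB>path", largest first. -xdev keeps find on the file system
# that DIR is mounted on. -printf is a GNU find extension.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' | sort -rn | head -n "${2:-5}"
}

# largest_dirs DIR [N]: print the N directories with the largest cumulative
# size in KiB. Without -a, du reports directories only; -x stays on one
# file system.
largest_dirs() {
    du -x -k "$1" | sort -rn | head -n "${2:-5}"
}
```

For example, `largest_files /data/1` and `largest_dirs /data/1` list the top five entries of each kind. Review the output before you delete anything, because files that belong to OBServer are also stored in this directory.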