Alert description
This alert is triggered when the disk space usage of the log directory mount point on the OBServer node exceeds the threshold.
Note
The default log directory for the OBServer node is /data/log1.
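The alert evaluates the mount point that contains the log directory, not the directory itself, so it helps to confirm which mount point that is. The following is a minimal sketch that assumes GNU coreutils df, which supports the --output option. The function name log_mount_point is hypothetical and is not part of OCP.

```shell
# Hypothetical helper: print the mount point of the file system that holds a directory.
# Assumes GNU coreutils df, which supports --output=target.
log_mount_point() {
  local dir="${1:-/data/log1}"   # default OBServer log directory
  # NR==2 skips the "Mounted on" header line; trim the column padding.
  df --output=target "$dir" | awk 'NR==2 { sub(/^[[:space:]]+/, ""); sub(/[[:space:]]+$/, ""); print }'
}
```

For example, log_mount_point /data/log1 prints /data/log1 when the log directory is on its own file system, or a parent mount point such as / when it is not.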
Alerting principle
The following table describes the key parameters involved in the alert monitoring logic.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_host_log_disk_percent_over_threshold |
| Metric source | df -B1. This metric source is special: OCP-Agent runs the df -B1 command to check the disk usage of the OBServer host and adds related labels to each disk. The labels are as follows: |
| Collected metric | host_partition_volume_free, host_partition_volume_total |
| Monitoring expression | 100 * (1 - avg(host_partition_volume_free{@LABELS}) by (@GBLABELS) / avg(host_partition_volume_total{@LABELS}) by (@GBLABELS)) |
| Collection interval | 1 second |
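For a single host and mount point, the avg() aggregation in the monitoring expression reduces to the value itself, so the expression becomes 100 * (1 - free / total). The following sketch computes that value from df -B1 output. It assumes that host_partition_volume_free corresponds to the df "avail" column and host_partition_volume_total to the "size" column; OCP-Agent may map these fields differently. The function name is hypothetical.

```shell
# Hypothetical sketch of the alert expression for one mount point:
#   100 * (1 - free / total)
# Assumption: free = df "avail" column, total = df "size" column.
log_disk_usage_percent() {
  local dir="${1:-/data/log1}"
  # -B1 reports sizes in bytes, as in the metric source (df -B1).
  # --output (GNU coreutils) selects the columns; NR==2 skips the header line.
  # The $2 > 0 guard avoids dividing by zero on pseudo file systems.
  df -B1 --output=avail,size "$dir" |
    awk 'NR==2 && $2 > 0 { printf "%.1f\n", 100 * (1 - $1 / $2) }'
}
```

Because blocks reserved for the root user count as used in this formula, the result can differ slightly from the Use% column that plain df prints.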
Rule information
| Monitoring metric | Default threshold (unit: %) | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ob_host_log_disk_percent_over_threshold | 85 | 0 seconds | 60 seconds | 5 minutes |
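The rule parameters can be read as a small state machine. The sketch below simulates it on a list of usage samples. It is an illustration, not the OCP alerting engine: the function name is hypothetical, and it assumes that a 0-second duration means the alert fires on the first detection above the threshold, and that the 5-minute elimination cycle means the alert clears only after five consecutive 60-second detections at or below the threshold.

```shell
# Hypothetical simulation of the alert rule. Assumptions:
#   - one sample is evaluated per 60-second detection cycle;
#   - duration 0 s: the alert fires on the first sample above the threshold;
#   - elimination cycle 5 min: the alert clears after 5 consecutive
#     samples (5 x 60 s) at or below the threshold.
simulate_log_disk_alert() {
  local threshold="$1"; shift
  local firing=0 ok_streak=0 sample
  for sample in "$@"; do
    # awk performs the floating-point comparison: exit status 0 when sample > threshold.
    if awk -v v="$sample" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
      firing=1; ok_streak=0
    elif [ "$firing" -eq 1 ]; then
      ok_streak=$((ok_streak + 1))
      if [ "$ok_streak" -ge 5 ]; then firing=0; ok_streak=0; fi
    fi
    if [ "$firing" -eq 1 ]; then echo "$sample FIRING"; else echo "$sample OK"; fi
  done
}
```

For example, simulate_log_disk_alert 85 80 98 80 80 80 80 80 reports the alert as firing from the 98.0% sample until the fifth consecutive 80.0% sample.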
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Expression based on monitoring metrics | Severe | Server |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx OceanBase server log disk usage exceeds the limit
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The disk usage of the log directory ${log_disk_path} at the mount point ${mount_point} is ${value_shown}, which exceeds ${alarm_threshold} %.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: OceanBase server log disk usage exceeds the limit. The disk usage of the log directory /data/log1 at the mount point /data/log1 is 98.0%, which exceeds 85.0%.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer log disk usage: ${value_shown}
- Example: Alert: OceanBase server log disk usage exceeds the limit, OBServer log disk usage: 80.0%
Impact on the system
If the remaining space on the OBServer node's log disk is insufficient, the OBServer node may not work properly.
Possible causes
Files generated by other programs occupy too much space on the log disk.
Solution
Run the df command to check whether the disk usage of /data/log1 exceeds the limit. If it does, run the following commands to find the directories and files that occupy the most space.

```shell
# Find the five directories that occupy the most space.
du /data/log1 | sort -n -r | head -n 5
# Find the five largest files.
cd /data/log1 && find . -type f -exec du -Sh {} + | sort -rh | head -n 5
```

Delete the unnecessary files.
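The checks in the solution can also be wrapped into a single report. The following is a minimal sketch, not an OCP tool. The function name report_log_disk and its threshold argument (defaulting to 85, the default rule threshold) are hypothetical, and it assumes GNU coreutils df for --output=pcent. It only lists candidates; deciding which files are unnecessary and deleting them remains a manual step. The df Use% column is computed slightly differently from the alert expression, so the two values may not match exactly.

```shell
# Hypothetical helper: report-only version of the troubleshooting steps.
# It never deletes files; review the listed paths before removing anything.
report_log_disk() {
  local dir="${1:-/data/log1}" threshold="${2:-85}" used
  # Use% of the file system that holds $dir (GNU df --output=pcent),
  # with the column padding and the trailing "%" stripped.
  used=$(df --output=pcent "$dir" | awk 'NR==2 { gsub(/[ %]/, ""); print }')
  echo "Disk usage of the mount point for $dir: ${used}%"
  if [ "$used" -le "$threshold" ]; then
    echo "Usage is within the ${threshold}% threshold."
    return 0
  fi
  echo "Five largest directories:"
  du "$dir" 2>/dev/null | sort -n -r | head -n 5
  echo "Five largest files:"
  find "$dir" -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 5
}
```

For example, report_log_disk /data/log1 checks the default log directory against the default 85% threshold.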