Alert description
This alert detects when an OceanBase cluster managed in OCP has gone a long time without starting a major compaction. The alert is triggered when more than 108000 seconds (30 hours, the default threshold) have passed since the last major compaction started.
Alerting principle
The following table describes the key parameters involved in the alert monitoring logic.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_cluster_no_merge_seconds |
| Metric source | SQL: select zone, name, value, time_to_usec(now()) from __all_zone; |
| Sampled metric (unit: microseconds) | current_timestamp, zone_value |
| Monitoring expression | (max(current_timestamp{metric_group="all_zone",name="merge_start_time",@LABELS}) by (@GBLABELS) - max(zone_value{metric_group="all_zone",name="merge_start_time",@LABELS}) by (@GBLABELS)) / 1000000 |
| Sampling interval | 1 second |
The value of the monitoring metric ob_cluster_no_merge_seconds indicates how long the OceanBase cluster has not performed a major compaction. An alert is triggered when this value exceeds the threshold (which is 108000 seconds by default).
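The arithmetic behind the monitoring expression can be sketched in Python. This is only an illustration, not OCP's implementation: the function names `no_merge_seconds` and `should_alert` are hypothetical, and the input is assumed to be rows shaped like the result of the metric source SQL, that is `(zone, name, value, now_usec)` with both timestamps in microseconds.

```python
US_PER_SECOND = 1_000_000
DEFAULT_THRESHOLD_SECONDS = 108_000  # default threshold from the rule (30 hours)


def no_merge_seconds(rows):
    """Return seconds elapsed since the most recent merge_start_time.

    rows: iterable of (zone, name, value, now_usec) tuples, assumed to mirror
    the columns returned by the metric source SQL (hypothetical shape).
    """
    merge_rows = [r for r in rows if r[1] == "merge_start_time"]
    if not merge_rows:
        raise ValueError("no merge_start_time rows in the sample")
    # Mirrors max(current_timestamp) - max(zone_value) in the expression.
    current_timestamp = max(r[3] for r in merge_rows)
    zone_value = max(r[2] for r in merge_rows)
    # Both values are in microseconds; convert the difference to seconds.
    return (current_timestamp - zone_value) / US_PER_SECOND


def should_alert(rows, threshold=DEFAULT_THRESHOLD_SECONDS):
    """True when the elapsed time exceeds the threshold."""
    return no_merge_seconds(rows) > threshold


# Made-up sample data for illustration only.
now_usec = 1_700_000_000_000_000
sample = [
    ("zone1", "merge_start_time", now_usec - 108_001 * US_PER_SECOND, now_usec),
    ("zone2", "merge_start_time", now_usec - 120_000 * US_PER_SECOND, now_usec),
    ("zone1", "frozen_version", 5, now_usec),
]
print(no_merge_seconds(sample), should_alert(sample))
```

Because the expression takes the maximum `merge_start_time` across zones, the result reflects the zone that started a major compaction most recently.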
Rule information
| Monitoring metric | Default threshold (unit: seconds) | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| ob_cluster_no_merge_seconds | 108000 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the monitoring metric | Critical | Cluster |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1 OceanBase cluster merge detection failed
Alert details
- Template: Cluster: ${ob_cluster_name}, Alert: ${alarm_name}. The time since the last major compaction is ${value_shown} seconds, which exceeds ${alarm_threshold} seconds.
- Example: Cluster: obcluster-1, Alert: OceanBase cluster merge detection failed. The time since the last major compaction is 108001.0 seconds, which exceeds 108000 seconds.
Alert recovery
- Template: Alert: ${alarm_name}, OceanBase cluster time since the last major compaction: ${value_shown}
- Example: Alert: OceanBase cluster merge detection failed, OceanBase cluster time since the last major compaction: 70000.0 seconds
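To show how the placeholders map to the example text, the alert details template can be filled in with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. This is a sketch for illustration only; it does not describe how OCP renders alerts internally, and the helper name `render_details` is hypothetical.

```python
from string import Template

# The alert details template, copied from the section above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Alert: ${alarm_name}. "
    "The time since the last major compaction is ${value_shown} seconds, "
    "which exceeds ${alarm_threshold} seconds."
)


def render_details(ob_cluster_name, alarm_name, value_shown, alarm_threshold):
    """Substitute the four placeholders used by the details template."""
    return DETAILS_TEMPLATE.substitute(
        ob_cluster_name=ob_cluster_name,
        alarm_name=alarm_name,
        value_shown=value_shown,
        alarm_threshold=alarm_threshold,
    )


message = render_details(
    "obcluster-1",
    "OceanBase cluster merge detection failed",
    108001.0,
    108000,
)
print(message)
```

With these inputs, the rendered message matches the alert details example above.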
Impact on the system
The Root Service of the OceanBase cluster periodically initiates major compactions. If no major compaction is initiated for a long time, disk usage keeps growing, which can eventually exhaust disk space and affect business write operations.
Possible causes
A common cause is that the Root Service is paused, for example because it has no leader or its process is abnormal.
Solution
For troubleshooting steps, see the topic "ob_cluster_no_frozen OB cluster freeze detection failed".