Description
This alert is triggered when an error occurs while compacting OceanBase clusters.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_cluster_merge_error_flag |
| Source | SQL: javascript select zone, name, value, time_to_usec(now()) from __all_zone; |
| Collected metric | zone_value |
| Metric expression | max(zone_value{metric_group="all_zone",name="is_merge_error",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
If a major compaction request, initiated automatically or manually, fails, the corresponding flag bit is set and the major compaction status is updated in the __all_zone table. The value of the value field in the __all_zone table is used as the value of the collected metric, and the values of other fields are used as labels.
The value of the metric indicates the cluster compaction status. When the value is 0, the cluster compaction is normal. When the value is 1, an error occurred while compacting the cluster, and the alert is triggered.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_cluster_merge_error_flag | 1 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Cluster |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}
Overview example: ob_cluster=C1-1000. OceanBase cluster compaction failed.
Details example: ob_cluster=C1-1000. OceanBase cluster compaction failed.
Impact on the system
Replicas of the OceanBase cluster are inconsistent. For example, data between replicas is inconsistent, or the index table is different from the primary table.
Switchover cannot be performed between the primary and standby OceanBase clusters to avoid spreading the error.
Possible causes
Technically, only some extreme cases may cause this error.
Suggested solution
For more information about how to troubleshoot this error, see Exception handling for OceanBase cluster compaction.
Improper handling may cause data inconsistency. If you do not have professional knowledge, contact OCP Technical Support for help.