Description
This alert monitors whether OceanBase cluster merges complete normally in OCP. It is triggered when an error occurs during a merge.
Principle
The following table describes the key parameters involved in the monitoring logic.
| Parameter | Value |
|---|---|
| Monitoring metric | merge_error_flag |
| Metric source | SQL: select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ zone, name, value, time_to_usec(now()) as current from __all_zone |
| Metric collection | zone_value |
| Monitoring expression | max(ob_zone_stat{name="is_merge_error",@LABELS}) by (@GBLABELS) |
| Metric collection interval | 1 second |
Regardless of whether a merge is initiated automatically or manually, if the merge fails, the system sets an error flag and writes the merge status to the __all_zone table. The metric value is collected from the value field of the __all_zone table, and the other fields serve as labels.
The value of the monitoring metric indicates whether an error occurred during the cluster merge. A value of 0 indicates that the cluster merge was successful, while a value of 1 indicates an error.
An alert is triggered when the monitoring metric value is 1.
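The evaluation described above can be sketched in Python. This is a minimal illustration, not OCP's implementation: the sample rows, the `ob_cluster` grouping label, and the function names are assumptions made for the example. It keeps only rows whose name is `is_merge_error`, takes the maximum value per group (mirroring `max(...) by (...)`), and reports the groups whose flag equals the threshold of 1.

```python
# Hypothetical rows shaped like the result of the __all_zone query.
# The ob_cluster label and the "some_other_item" row are illustrative only.
SAMPLE_ROWS = [
    {"ob_cluster": "obcluster-1", "zone": "zone1", "name": "is_merge_error", "value": 0},
    {"ob_cluster": "obcluster-1", "zone": "zone2", "name": "is_merge_error", "value": 1},
    {"ob_cluster": "obcluster-1", "zone": "zone1", "name": "some_other_item", "value": 7},
    {"ob_cluster": "obcluster-2", "zone": "zone1", "name": "is_merge_error", "value": 0},
]

THRESHOLD = 1  # default threshold from the rule table


def merge_error_flag(rows, group_by=("ob_cluster",)):
    """Return the maximum is_merge_error value per group of labels."""
    flags = {}
    for row in rows:
        if row["name"] != "is_merge_error":
            continue  # only the merge error item feeds this metric
        key = tuple(row[label] for label in group_by)
        flags[key] = max(flags.get(key, 0), int(row["value"]))
    return flags


def firing_groups(rows, threshold=THRESHOLD):
    """Return the groups whose merge error flag equals the threshold."""
    flags = merge_error_flag(rows)
    return sorted(key for key, flag in flags.items() if flag == threshold)


if __name__ == "__main__":
    print(merge_error_flag(SAMPLE_ROWS))
    print(firing_groups(SAMPLE_ROWS))
```

Because the maximum is taken across zones, a single zone reporting 1 is enough to raise the flag for the whole group.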
Rule information
| Monitoring metric | Default threshold | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| merge_error_flag | 1 | 0 seconds | 10 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Monitoring expression | Critical | Cluster |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1 OceanBase cluster merge error
Alert details
- Template: Cluster: ${ob_cluster_name}, Alert: ${alarm_name}.
- Example: Cluster: obcluster-1, Alert: OceanBase cluster merge error.
Alert recovery
- Template: Alert: ${alarm_name}, OceanBase cluster merge error flag: ${recover_value}
- Example: Alert: OceanBase cluster merge error, OceanBase cluster merge error flag: 0
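The way the templates above expand into the example messages can be reproduced with a short Python sketch. It uses the standard library's `string.Template`, which happens to share the `${name}` placeholder syntax. This is only a stand-in for illustration, since the source does not describe how OCP renders templates. The variable values are taken from the examples above, and the `render` helper is hypothetical.

```python
from string import Template

# Templates copied from the alert template section.
TEMPLATES = {
    "summary": "${alarm_target} ${alarm_name}",
    "details": "Cluster: ${ob_cluster_name}, Alert: ${alarm_name}.",
    "recovery": "Alert: ${alarm_name}, OceanBase cluster merge error flag: ${recover_value}",
}

# Variable values taken from the documented examples.
VALUES = {
    "alarm_target": "ob_cluster=obcluster-1",
    "alarm_name": "OceanBase cluster merge error",
    "ob_cluster_name": "obcluster-1",
    "recover_value": 0,
}


def render(kind, values=VALUES):
    """Fill one of the templates; raises KeyError if a variable is missing."""
    return Template(TEMPLATES[kind]).substitute(values)


if __name__ == "__main__":
    for kind in TEMPLATES:
        print(render(kind))
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a raw placeholder in the message.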
Impact on the system
Data in some replicas of the OceanBase cluster is inconsistent, either between replicas or between the primary table and index table.
Do not perform a switchover between the primary and standby OceanBase clusters. Otherwise, the fault is propagated to the associated cluster.
Possible causes
There are no common causes. In theory, this error occurs only in extreme scenarios.
Resolution
To troubleshoot the issue, see OceanBase cluster merge error handling.
Improper handling may cause data inconsistencies. If you are not experienced with this procedure, contact technical support.