Description
Note
This alert applies only to clusters running OceanBase Database V4.2.0.0 or later.
When an OceanBase cluster cannot connect to its originally associated arbitration service and the automatic switching strategy of the arbitration service group is triggered, the cluster automatically switches to another arbitration service in the group. This alert is triggered when that switching task fails. The alert includes the ID of the failed task.
Principle
| Parameter | Value |
|---|---|
| Metric | arbitration_group_auto_replace_task_status |
| Data source | OCP-Server. OCP-Server periodically checks for failed automatic arbitration service switching tasks. When it detects a failed task, it parses the arbitration service group information from the task and triggers an alert. |
| Collected metric | arbitration_group_auto_replace_task_status |
| Metric expression | sum(arbitration_group_auto_replace_task_status{@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |
Valid values of arbitration_group_auto_replace_task_status are 0 and 1. The value 1 indicates that the arbitration service group has no failed switching task, and the value 0 indicates that the arbitration service group has a failed switching task.
Alert rule
| Metric expression | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| arbitration_group_auto_replace_task_status | 0 | 120 seconds | 60 seconds | 5 minutes |
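The rule above can be read as follows: the metric is checked once per 60-second detection cycle, and the alert is raised once the failing condition has held for 120 seconds. The following Python code is a minimal sketch of that evaluation, not OCP's implementation. It assumes that a sample breaches the rule when it equals the default threshold of 0 (the table does not state the comparison operator), and `AlertRule` and `evaluate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class AlertRule:
    threshold: int = 0      # default threshold from the alert rule table
    duration_s: int = 120   # how long the condition must hold before firing

    def breaches(self, value: int) -> bool:
        # Assumption: a sample breaches the rule when it equals the threshold.
        return value == self.threshold


def evaluate(rule: AlertRule, samples: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the timestamp at which the alert would fire, or None.

    samples: (timestamp_seconds, metric_value) pairs in time order,
    for example one pair per 60-second detection cycle.
    """
    breach_start: Optional[int] = None
    for ts, value in samples:
        if rule.breaches(value):
            if breach_start is None:
                breach_start = ts  # first breaching sample starts the timer
            if ts - breach_start >= rule.duration_s:
                return ts
        else:
            breach_start = None  # condition cleared; restart the timer
    return None
```

Under these assumptions, a single breaching sample followed by a recovered sample does not fire the alert, because the condition did not hold for the full duration.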
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | OceanBase cluster |
Alert templates
Overview
- Template: ${group_name} ${alarm_name}
- Example: An automatic arbitration service switching task of the arbitration_group_A arbitration service group failed.
Details
- Template: An automatic arbitration service switching task of the ${group_name} arbitration service group failed. Task ID: ${task_id}.
- Example: An automatic arbitration service switching task of the arbitration_group_A arbitration service group failed. Task ID: 12521.
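The `${...}` placeholders in the details template happen to match the syntax of Python's standard `string.Template`, so the substitution can be illustrated with a short sketch. This shows only how the placeholders map to the example text; it is not how OCP renders alerts, and `render_details` is a hypothetical helper.

```python
from string import Template

# Details template copied from the section above.
DETAILS_TEMPLATE = Template(
    "An automatic arbitration service switching task of the ${group_name} "
    "arbitration service group failed. Task ID: ${task_id}."
)


def render_details(group_name: str, task_id: int) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled.
    return DETAILS_TEMPLATE.substitute(group_name=group_name, task_id=task_id)


print(render_details("arbitration_group_A", 12521))
```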
Impact on the system
If an automatic arbitration service switching task fails for an OceanBase cluster, the cluster may be left without an arbitration service for a prolonged period. This reduces the high availability of tenants that have two or four full-featured replicas in the cluster. Consequently, a replica failure in such a tenant directly affects system services.
Possible causes
- The server hosting the target arbitration service goes down during the task execution.
- The target arbitration service process stops running during the task execution.
- OCP cannot connect to the OceanBase cluster.
Suggested solutions
- Make sure that the server hosting the target arbitration service is running.
- Make sure that the target arbitration service process is running and can serve external requests.
- Make sure that OCP can connect to the OceanBase cluster, and the OceanBase cluster is running.
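As a first step for the checks above, basic network reachability can be tested from the OCP host. The following Python code is a minimal sketch that only tests whether a TCP connection can be opened; it does not confirm that the arbitration service or the OceanBase cluster is healthy. `can_connect` is a hypothetical helper, and the host and port you pass in depend on your deployment.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, call `can_connect(host, port)` with the address and port of the server hosting the target arbitration service, and again with the address and port that OCP uses to reach the OceanBase cluster. A `False` result points to a host, process, or network problem to investigate further.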