arbitration_group_auto_replace_failed

Updated: 2025-03-26 07:47:21

Description

Note

This alert takes effect only in a cluster of OceanBase Database V4.2.0.0 or later.

When an OceanBase cluster cannot connect to its originally associated arbitration service and the automatic switching strategy of the arbitration service group is triggered, the OceanBase cluster automatically switches to another arbitration service in the group. This alert is triggered when the task of switching the arbitration service for an OceanBase cluster fails. The ID of the failed task is provided in the alert.

Principle

| Parameter | Value |
| --- | --- |
| Metric | arbitration_group_auto_replace_task_status |
| Data source | OCP-Server. The OCP-Server service regularly checks for failed automatic arbitration service switching tasks. If a failed task is detected, OCP-Server parses information of the arbitration service group from the task and triggers an alert. |
| Collected metric | arbitration_group_auto_replace_task_status |
| Metric expression | sum(arbitration_group_auto_replace_task_status{@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |

Valid values of arbitration_group_auto_replace_task_status are 0 and 1. A value of 1 indicates that the arbitration service group has no failed switching task, and a value of 0 indicates that the arbitration service group has a failed switching task.
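To illustrate how an expression of the form sum(...) by (...) groups samples, the following is a minimal Python sketch. It is not OCP-Server code. The label names (ob_cluster_name, group_name) and the sample values are assumptions made up for the example, since @LABELS and @GBLABELS are placeholders that OCP fills in.

```python
from collections import defaultdict


def sum_by(samples, group_by):
    """Sum sample values per distinct combination of the group_by labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Build the grouping key from the selected label values only.
        key = tuple(labels.get(name, "") for name in group_by)
        totals[key] += value
    return dict(totals)


# Hypothetical samples: label names and values are illustrative only.
samples = [
    ({"ob_cluster_name": "cluster_a", "group_name": "arbitration_group_A"}, 0),
    ({"ob_cluster_name": "cluster_b", "group_name": "arbitration_group_B"}, 1),
]

result = sum_by(samples, ["group_name"])
```

In this sketch, arbitration_group_A aggregates to 0 (a failed switching task exists) and arbitration_group_B aggregates to 1 (no failed switching task).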

Alert rule

| Metric expression | Default threshold | Duration | Detection cycle | Time before clearance |
| --- | --- | --- | --- | --- |
| arbitration_group_auto_replace_task_status | 0 | 120 seconds | 60 seconds | 5 minutes |
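The interaction between the threshold and the duration can be sketched in Python as follows. This is only an illustration under stated assumptions, not OCP's rule engine: it assumes the condition is "value equals the threshold" (the document does not state the comparison operator) and that the alert fires once the condition has held continuously for the full duration. The time before clearance is not modeled.

```python
def should_alert(history, threshold=0, duration_s=120):
    """Return True if the condition has held for at least duration_s.

    history: list of (timestamp_seconds, value) pairs, oldest first,
             for example one pair per 60-second detection cycle.
    """
    streak_start = None
    for ts, value in history:
        if value == threshold:  # ASSUMPTION: equality comparison
            if streak_start is None:
                streak_start = ts  # condition starts holding here
        else:
            streak_start = None  # condition broken, reset the streak
    if streak_start is None:
        return False
    latest_ts = history[-1][0]
    return latest_ts - streak_start >= duration_s


# Checks every 60 seconds; the value stays at 0 from t=0 through t=120.
fires = should_alert([(0, 0), (60, 0), (120, 0)])
# The value dropped to 0 only at t=60, so it has held for just 60 seconds.
too_early = should_alert([(0, 1), (60, 0), (120, 0)])
```

Under these assumptions, the first call returns True and the second returns False, because the condition must hold across the whole 120-second duration.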

Alert information

| Trigger method | Alert level | Scope |
| --- | --- | --- |
| Metric expression | Critical | OceanBase cluster |

Alert templates

  • Overview

    • Template: ${group_name} ${alarm_name}
    • Example: An automatic arbitration service switching task of the arbitration_group_A arbitration service group failed.
  • Details

    • Template: An automatic arbitration service switching task of the ${group_name} arbitration service group failed. Task ID: ${task_id}.
    • Example: An automatic arbitration service switching task of the arbitration_group_A arbitration service group failed. Task ID: 12521.
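The ${...} placeholders in the Details template happen to match the syntax of Python's string.Template, so the substitution can be reproduced with a short sketch. This does not claim to be how OCP renders alerts. It only shows how the example message follows from the template and the two variables.

```python
from string import Template

# Details template copied from the list above.
DETAILS_TEMPLATE = Template(
    "An automatic arbitration service switching task of the ${group_name} "
    "arbitration service group failed. Task ID: ${task_id}."
)

# Values taken from the documented example.
message = DETAILS_TEMPLATE.substitute(
    group_name="arbitration_group_A",
    task_id=12521,
)
```

With these values, message is identical to the Details example shown above.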

Impact on the system

If an automatic arbitration service switching task fails, the OceanBase cluster may be left without an arbitration service for a prolonged period. This degrades high availability for tenants in the cluster that have two or four full-featured replicas. As a result, a tenant replica failure directly affects system services.

Possible causes

  1. The server hosting the target arbitration service goes down during the task execution.
  2. The target arbitration service process stops running during the task execution.
  3. OCP cannot connect to the OceanBase cluster.

Suggested solutions

  1. Make sure that the server hosting the target arbitration service is running.
  2. Make sure that the target arbitration service process is running and can serve external requests.
  3. Make sure that OCP can connect to the OceanBase cluster and that the OceanBase cluster is running.
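As a first pass over the checks above, basic TCP reachability can be probed with a short Python sketch run from the OCP host. The target names, hosts, and ports passed in are placeholders that you must supply for your environment. A successful TCP connection only shows that something is listening on the port, not that the arbitration service or the OceanBase cluster is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def check_targets(targets, timeout=3.0):
    """Probe each named (host, port) pair and return a name -> reachable map."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in targets.items()
    }
```

For example, check_targets({"arbitration service": (host, port), "OceanBase cluster": (host, port)}) returns a dictionary that maps each name to True or False, where each host and port is the address you configured for that component.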
