Description
OBServer nodes in the OceanBase cluster fail to communicate with the arbitration service. This alert is valid only in a cluster of OceanBase Database V4.2.0 or later that is associated with the arbitration service.
Principle
| Parameter | Value |
|---|---|
| Metric | arbitration_status_inactive_server_count |
| Source | Connect to OceanBase Database and perform this query: select /*+ MONITOR_AGENT */ count(1) as inactive_server_count from GV$OB_ARBITRATION_SERVICE_STATUS where status = 'INACTIVE' |
| Collected metric | arbitration_status_inactive_server_count |
| Metric expression | max(arbitration_status_inactive_server_count{@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| arbitration_status_inactive_server_count > 0 | The OceanBase cluster contains OBServer nodes that cannot communicate with the arbitration service. | 0 | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| The network between an OBServer node and the arbitration service fails. | Critical | Cluster |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=TEST. The OceanBase cluster contains OBServer nodes that cannot communicate with the arbitration service.
Details
- Template: Cluster: ${ob_cluster_name}. Alert: ${alarm_name}. The ${ob_cluster_name} cluster contains ${value_shown} OBServer nodes that cannot communicate with the arbitration service.
- Example: Cluster: TEST. Alert: The OceanBase cluster contains OBServer nodes that cannot communicate with the arbitration service. The TEST cluster contains 2 OBServer nodes that cannot communicate with the arbitration service.
Impact on the system
If an OBServer node fails to communicate with the arbitration service, the arbitration feature may be abnormal and cannot provide the high availability service for tenants with two or four full-featured replicas. When half of the replicas are unavailable in a tenant with the arbitration service enabled, service exceptions occur in the tenant.
Possible causes
- The network between an OBServer node and the arbitration service fails.
- The arbitration service process is abnormal.
Suggested solutions
Connect to the cluster as the root@sys user. Execute the following SQL statement to check for OBServer nodes in the INACTIVE state.
SELECT svr_ip, svr_port, arbitration_service_address, status FROM GV$OB_ARBITRATION_SERVICE_STATUS;Check the network between these OBServer nodes and the arbitration service.
Check whether the arbitration service process runs properly.