Alert description
Monitors the availability of the arbitration service process.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring metric | arbitration_service_available |
| Source | The OCP Agent checks whether the local OBServer process has been started and whether the arbitration service port is occupied (that is, a process is listening on it). |
| Collection metric | arbitration_service_available |
| Monitoring expression | sum(arbitration_service_available{@LABELS}) by (@GBLABELS) |
| Sampling cycle | 30s |
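The following minimal Python sketch makes the collection logic concrete. It shows a check that produces a 0/1 value for `arbitration_service_available`. It is not the OCP Agent's implementation, and several details are assumptions:

- the process name `observer`;
- the `/proc` scan, which is Linux only;
- the default port 2882, taken from the alert samples on this page;
- all function names.

```python
import os
import socket

# Assumption: the arbitration service runs as an "observer" process.
ARB_PROCESS_NAME = "observer"
# Assumption: port taken from the alert samples; adjust to your deployment.
ARB_PORT = 2882


def process_running(name: str, proc_root: str = "/proc") -> bool:
    """Return True if any process under proc_root has `name` in its comm file (Linux only)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for pid in entries:
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # the process may have exited during the scan
    return False


def port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, i.e. something listens there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def arbitration_service_available(proc_root: str = "/proc", port: int = ARB_PORT) -> int:
    """Hypothetical metric: 1 when the process runs and the port listens, otherwise 0."""
    ok = process_running(ARB_PROCESS_NAME, proc_root) and port_listening(port)
    return 1 if ok else 0


if __name__ == "__main__":
    print(f"arbitration_service_available {arbitration_service_available()}")
```

Both conditions must hold for the value to be 1. A running process that has not bound its port still reports 0.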
Rule information
| Monitoring Expression | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| arbitration_service_available | 0 | 0 seconds | 30 seconds | 5 minutes |
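The rule parameters can be read as a small evaluation loop. This page does not confirm the details, so the sketch below assumes the following:

- the alert fires when the summed value is less than or equal to the default threshold of 0;
- a duration of 0 seconds means the alert fires on the first matching 30-second detection cycle;
- the 5-minute elimination cycle clears the alert once the condition has not held for 300 seconds.

`sum_by`, `RuleState`, and `evaluate` are hypothetical names. `sum_by` only mimics the grouping of PromQL's `sum(...) by (...)`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional, Sequence, Tuple

GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(samples: Iterable[Tuple[Mapping[str, str], float]],
           by: Sequence[str]) -> Dict[GroupKey, float]:
    """Mimic PromQL `sum(...) by (labels)`: add up sample values per label group."""
    groups: Dict[GroupKey, float] = {}
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] = groups.get(key, 0.0) + value
    return groups


@dataclass
class RuleState:
    firing: bool = False
    last_triggered: Optional[float] = None  # time of the last cycle that met the condition


def evaluate(state: RuleState, value: float, now: float,
             threshold: float = 0.0, clear_after: float = 300.0) -> bool:
    """Run one detection cycle (assumed semantics, see the lead-in)."""
    if value <= threshold:
        # Duration is 0 s, so there is no pending period: fire immediately.
        state.firing = True
        state.last_triggered = now
    elif (state.firing and state.last_triggered is not None
          and now - state.last_triggered >= clear_after):
        # The condition has not held for a full elimination cycle: clear the alert.
        state.firing = False
    return state.firing
```

In this reading, a group that recovers at 30 s stays in the firing state until 300 s have passed since the last failing cycle.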
Alert information
| Alert Trigger Method | Alert Severity | Scope |
|---|---|---|
| Expression based on monitoring metrics | Critical | Arbitration service |
Alert template
- Alert Overview
  - Template: ${alarm_target} ${alarm_name}
  - Sample: svr_ip=xxx.xxx.xxx.xxx:svr_port=2882 the arbitration service is unavailable
- Alert Details
  - Template: Cluster: The arbitration service ${svr_ip}:${svr_port} is unavailable.
  - Sample: Cluster: The arbitration service xxx.xxx.xxx.xxx:2882 is unavailable.
- Alert Recovery
  - Template: Alert: ${alarm_name}, Recovery value: ${recover_value}
  - Sample: Alert: Arbitration Service Unavailable, Arbitration Service Available: 0
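The `${...}` placeholders in these templates use the same syntax as Python's `string.Template`, so Python can illustrate the substitution. This is only an illustration, because this page does not describe OCP's own rendering engine. The variable values below are the placeholders from the samples, and the details string follows the sample wording.

```python
from string import Template

OVERVIEW = Template("${alarm_target} ${alarm_name}")
DETAILS = Template("Cluster: The arbitration service ${svr_ip}:${svr_port} is unavailable.")

# Placeholder values from the samples; unused keys are ignored by substitute().
values = {
    "alarm_target": "svr_ip=xxx.xxx.xxx.xxx:svr_port=2882",
    "alarm_name": "the arbitration service is unavailable",
    "svr_ip": "xxx.xxx.xxx.xxx",
    "svr_port": 2882,  # non-string values are converted with str()
}

overview = OVERVIEW.substitute(values)
details = DETAILS.substitute(values)
print(overview)
print(details)
```

`substitute` raises `KeyError` when a variable is missing. `safe_substitute` leaves the `${...}` placeholder in the output instead.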
Impact on the system
When the arbitration service is unavailable, tenants that use arbitration lose the high-availability protection it provides. If a replica of such a tenant fails, the tenant cannot be guaranteed to continue providing service.
Possible causes
- The host where the arbitration service is located fails.
- The arbitration service port is inaccessible from other hosts because a firewall is enabled.
Solution
If the observer process of the arbitration service exits unexpectedly, the system attempts to restart it. To respond to the alert quickly, follow the alert handling procedure. For more information, see Alert handling procedure.
The system restarts the process only once within 30 minutes after the event occurs. If the alert persists, check whether the host where the arbitration service is located is working properly. If the host is down, restart the host, and then restart the arbitration service in the OCP console.
If the host is running normally, check whether the arbitration service process exists, whether the arbitration service port is listening normally, and whether the firewall blocks access from other IP addresses.
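For the last check, a TCP connection attempt from another host helps separate a stopped process from a firewall problem. The sketch below is a heuristic under common assumptions, not an OCP tool:

- A refused connection usually means the host is up but nothing is listening on the port, or a firewall rejects the connection.
- A timeout usually means packets are being dropped, often by a firewall, or that the host is unreachable.

The `probe` function name is hypothetical.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (heuristic only)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # handshake completed: something is listening and reachable
    except socket.timeout:
        return "timeout"  # no reply: often a firewall dropping packets, or the host is down
    except ConnectionRefusedError:
        return "refused"  # RST received: nothing listening, or a firewall REJECT rule
    except OSError as exc:
        return f"error: {exc}"  # for example, no route to host or a name resolution failure
```

Run it from a host that should be able to reach the arbitration service, for example `print(probe("xxx.xxx.xxx.xxx", 2882))`, then run the same check on the arbitration host itself. If the local check reports `open` but the remote check reports `timeout`, a firewall is the likely cause.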