Alert description
This alert indicates that the arbitration service node cannot connect to its corresponding observer.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring Metrics | N/A |
| Monitoring Expression | {LOG_LEVEL=EDIAG,KWORD=arb active keepalive probe_connectivity,DURATION=0 SECONDS,ALERT_LEVEL=DOWN} |
| Metric Collection | arbitration_log |
| Metric Source | Arbitration service logs, for example, /home/admin/oceanbase/log/observer.log. |
| Detection Cycle (ms) | 500 |
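To make the fields of the monitoring expression easier to inspect, the following minimal Python sketch splits it into key/value pairs. The function name `parse_monitoring_expression` is hypothetical, and the sketch assumes that values never contain commas or equals signs. It only illustrates the layout of the expression and does not reproduce how OCP evaluates it.

```python
def parse_monitoring_expression(expr):
    """Split a '{KEY=value,KEY=value}' expression into a dictionary.

    Assumption: values contain no ',' characters (true for the expression
    shown in the table above, not verified for other alerts).
    """
    body = expr.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    fields = {}
    for part in body.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value pair: {part!r}")
        fields[key.strip()] = value.strip()
    return fields


expr = (
    "{LOG_LEVEL=EDIAG,KWORD=arb active keepalive probe_connectivity,"
    "DURATION=0 SECONDS,ALERT_LEVEL=DOWN}"
)
rule = parse_monitoring_expression(expr)
print(rule["LOG_LEVEL"], "|", rule["KWORD"], "|", rule["ALERT_LEVEL"])
```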
Rule information
| Monitoring Metrics | Default Threshold | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| N/A | N/A | 0 Seconds | The agent reads the file every 500 ms, and OCP generates an alert within one minute. | 5 Minutes |
Alert information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Log Matching Rules | Downtime | All Arbitration Services |
Alert template
Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:address=ip:port:host=ip:server_type:arbitration_observer:error_code=-1:keyword=arb active keepalive probe_connectivity The Arbitration node cannot connect to the observer.
Alert details
- Template: Alert:[${alarm_name}] Arbitration service instance: ${address}, host: ${host}, log type: ${server_type}, log file: ${filename}, log level: ${log_level}, keyword=${keyword}, error code=${error_code}, log details=${error_message}.
- Example: Alert: [Arbitration Node Cannot Connect Observer] Arbitration Service Instance: xx.xx.xx.xx:2889, Host: xx.xx.xx.xx, Log Type: arbitration_observer, Log File: /home/admin/oceanbase/log/observer.log, Log Level: EDIAG, Keywords=arb active keepalive probe_connectivity, Error Code=4124, Log Details=[2026-04-24 11:05:15.681142] EDIAG [CLOG] probe_once_ (ob_log_active_keep_alive.cpp:276) [117128][TimerWK0_ArbServerTimer][C0][T0][Y0-0000000000000000-0-0] [lt=12][errcode=-4124] active keepalive probe_connectivity blacklisted for new connections(tmp_ret=-4124, tmp_ret="OB_CONNECT_ERROR", dst="xx.xx.xx.xx:2889", in_blacklist=true) BACKTRACE:0x346b97db 0x3448c8a2 0x106b51b7 0x106b4c19 0x106b49f5 0x106b47e6 0x13717054 0x13714007 0x32d3517e 0x33d4110c 0x1379a3f0 0x3469f36f 0x346b37a6 0x150907d9d3fb 0x150907bf3e83
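The fields that matter most during troubleshooting (errcode, dst, and in_blacklist) can be extracted from a log details line such as the one in the example. The following is a minimal Python sketch. The function name `parse_probe_log` and the regular expressions are illustrative assumptions derived only from the sample line in this topic, not an official log format specification, and the dst pattern assumes an IPv4-style ip:port value.

```python
import re

# Patterns inferred from the sample log line in this topic (assumption).
LOG_HEAD = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+)\s+.*?"
    r"\[errcode=(?P<errcode>-?\d+)\]"
)
DST = re.compile(r'dst="(?P<ip>[^":]+):(?P<port>\d+)"')
BLACKLIST = re.compile(r"in_blacklist=(true|false)")


def parse_probe_log(line):
    """Return the key fields of a probe_connectivity log line, or None."""
    head = LOG_HEAD.search(line)
    dst = DST.search(line)
    if head is None or dst is None:
        return None
    blacklist = BLACKLIST.search(line)
    return {
        "timestamp": head.group("timestamp"),
        "level": head.group("level"),
        "errcode": int(head.group("errcode")),
        "dst_ip": dst.group("ip"),
        "dst_port": int(dst.group("port")),
        "in_blacklist": blacklist is not None and blacklist.group(1) == "true",
    }


sample = (
    "[2026-04-24 11:05:15.681142] EDIAG [CLOG] probe_once_ "
    "(ob_log_active_keep_alive.cpp:276) [117128][TimerWK0_ArbServerTimer]"
    "[C0][T0][Y0-0000000000000000-0-0] [lt=12][errcode=-4124] "
    "active keepalive probe_connectivity blacklisted for new connections"
    '(tmp_ret=-4124, tmp_ret="OB_CONNECT_ERROR", '
    'dst="xx.xx.xx.xx:2889", in_blacklist=true)'
)
parsed = parse_probe_log(sample)
print(parsed)
```

Lines that do not carry both an errcode and a dst value return None, so the function can be applied to every line of a log file without pre-filtering.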
Alert recovery
- Template: Alert:[${alarm_name}] Arbitration service instance: ${address}, host: ${host}, log type: ${server_type}, log file: ${filename}, log level: ${log_level}, keyword=${keyword}, error code=${error_code}, log details=${error_message}.
- Example: Alert: [Arbitration Node Cannot Connect Observer] Arbitration Service Instance: xx.xx.xx.xx:2889, Host: xx.xx.xx.xx, Log Type: arbitration_observer, Log File: /home/admin/oceanbase/log/observer.log, Log Level: EDIAG, Keywords=arb active keepalive probe_connectivity, Error Code=4124, Log Details=[2026-04-24 11:05:15.681142] EDIAG [CLOG] probe_once_ (ob_log_active_keep_alive.cpp:276) [117128][TimerWK0_ArbServerTimer][C0][T0][Y0-0000000000000000-0-0] [lt=12][errcode=-4124] active keepalive probe_connectivity blacklisted for new connections(tmp_ret=-4124, tmp_ret="OB_CONNECT_ERROR", dst="xx.xx.xx.xx:2889", in_blacklist=true) BACKTRACE:0x346b97db 0x3448c8a2 0x106b51b7 0x106b4c19 0x106b49f5 0x106b47e6 0x13717054 0x13714007 0x32d3517e 0x33d4110c 0x1379a3f0 0x3469f36f 0x346b37a6 0x150907d9d3fb 0x150907bf3e83
Impact on the system
If the arbitration replica is network-isolated from the followers, a new leader might not be elected when the current leader fails, which compromises the high availability of 2F1A and 4F1A configurations. In that situation, a failure directly impacts system services.
Possible causes
- Hardware failure.
- Misconfigured firewalls, iptables rules, or similar packet filters.
- Incorrect router or gateway configuration.
Solution
Identify the faulty object and the failed target. Based on the address, host, and error_code fields in the alert details and the dst="ip:port" value in the logs, identify the faulty arbitration service instance and the observer node that it cannot connect to. If in_blacklist=true appears in the logs, the arbitration side has marked this observer as a target for "new connection failure". Focus your troubleshooting on this target node and its link.

    grep -n "arb active keepalive probe_connectivity" /home/admin/oceanbase/log/observer.log*

Check the target observer process and listening port. Log in to the target observer node specified in the alert, confirm that the observer process is running normally, and verify that the target port shown in the alert log is in a listening state. If the observer has exited, is stuck, or the port is not listening, restore the observer service first.

    ps -ef | grep observer
    ss -lntp | grep <target port in the alert log>

Verify the network connectivity between the arbitration node and the observer node. On the arbitration node, run a connectivity test against the dst address and port specified in the log to identify the cause of the connection failure, such as an unreachable host, an unreachable port, or link jitter. If ping succeeds but the TCP connection fails, the cause is usually the firewall, security group, port listening, or routing policy.

    ping <observer_ip>
    nc -vz <observer_ip> <observer_port>
    traceroute <observer_ip>

Check the host and network configurations. Verify the firewall, iptables/nftables, switch ACL, security group, routing, and gateway configurations between the arbitration node and the observer node to confirm whether anything intercepts traffic from the arbitration node to the observer's RPC port. Also check whether the bound address or RPC port configuration of the target observer node has changed, which can leave the service running but not listening on the address or port indicated in the alert.
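Where nc is not installed on the arbitration node, the TCP part of the connectivity test can be approximated with a short Python script. This is a minimal sketch: the function name `tcp_probe` and the 3-second default timeout are assumptions, and the hints attached to each failure type are common interpretations rather than a definitive diagnosis.

```python
import socket
import sys


def tcp_probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # Often seen when packets are silently dropped on the path.
        return False, "timed out (possibly dropped by a firewall or ACL)"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection.
        return False, "refused (host reachable, port probably not listening)"
    except OSError as exc:
        # Covers unreachable networks, name resolution errors, and so on.
        return False, f"failed: {exc}"


# Usage: python tcp_probe.py <observer_ip> <observer_port>
if __name__ == "__main__" and len(sys.argv) == 3:
    ok, detail = tcp_probe(sys.argv[1], int(sys.argv[2]))
    print(f"{sys.argv[1]}:{sys.argv[2]} -> {detail}")
    sys.exit(0 if ok else 1)
```

Run the script on the arbitration node with the ip and port taken from the dst value in the log, so that the probe follows the same path as the failing connection.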
Verify that the alert is resolved after recovery. After recovery, re-execute the TCP connectivity test on the arbitration node and continuously monitor the arbitration service logs to confirm that the keywords "arb active keepalive probe_connectivity failed" or "blacklisted for new connections" no longer appear. If the log level permits, successful logs such as "probe connect ok" should be visible after the alert resolves.
If the error persists after 1 to 2 minutes, continue collecting the observer.log files from both nodes, network policy configurations, and link probing results for further analysis.

    grep -E "probe connect ok|probe_connectivity" /home/admin/oceanbase/log/observer.log*

Seek further assistance if necessary. If the observer service is normal, the port is listening, and TCP probing from the arbitration node to the observer node still fails, collect the observer.log files, system logs, network policies, and link probing results from both ends at the time of the failure. Then contact the network or system administrator to investigate potential issues such as hardware faults, switch or gateway abnormalities, or cross-subnet policy blocking.
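To confirm that no failure lines were written after the recovery time, the log check can also be scripted. The following minimal Python sketch makes several assumptions: the function name `failures_after` is hypothetical, the timestamp format is inferred from the sample log line in this topic, and the first marker is shortened to "probe_connectivity failed" so that it matches any line containing the longer keyword. The demo lines are abbreviated, and the second one is a hypothetical success line.

```python
from datetime import datetime

# Substrings taken from the failure keywords named in this topic.
FAILURE_MARKERS = ("probe_connectivity failed", "blacklisted for new connections")


def failures_after(lines, recovered_at):
    """Return failure log lines whose timestamp is at or after recovered_at."""
    hits = []
    for line in lines:
        if not any(marker in line for marker in FAILURE_MARKERS):
            continue
        if not line.startswith("["):
            continue
        stamp = line[1:line.find("]")]
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # Skip lines without a parsable leading timestamp.
        if when >= recovered_at:
            hits.append(line)
    return hits


demo = [
    "[2026-04-24 11:05:15.681142] EDIAG ... active keepalive "
    "probe_connectivity blacklisted for new connections(...)",
    "[2026-04-24 11:09:40.000000] INFO ... probe connect ok",  # hypothetical
]
remaining = failures_after(demo, datetime(2026, 4, 24, 11, 8, 0))
print(f"failure lines after recovery: {len(remaining)}")
```

An empty result for a recovery time a few minutes in the past is consistent with the alert having cleared. A non-empty result means the failure is still being logged.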
