Alert description
This alert monitors the TCP retransmission rate of an OBServer node and is triggered when the rate exceeds the threshold.
Alert principle
The following table describes the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | ob_host_tcp_retrans_percent |
| Metric source | `tsar --check --tcp -s retran \| awk -F '=' '{print $2}'` |
| Collected metric | tcp_retrans |
| Monitoring expression | max(tcp_retrans{@LABELS}) by (@GBLABELS) |
| Collection interval | 1 second |
Note
The metric source of this alert differs from that of other expression-based alerts: OCP-Agent runs the preceding command once every collection interval to collect the metric.
The value of the ob_host_tcp_retrans_percent metric indicates the TCP retransmission rate of the OBServer node. If the rate exceeds the threshold (which is 10% by default), an alert is triggered.
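The collection and threshold check described above can be sketched in shell. This is a minimal illustration, not the OCP-Agent implementation: the function name `check_retrans` and the sample input line are hypothetical, and the exact `tsar --check` output format is assumed to be a single line ending in `retran=<value>`.

```shell
# Hypothetical helper: takes one line of `tsar --check --tcp -s retran` output
# and an optional threshold in percent (defaults to 10, as in the rule table).
check_retrans() {
  line="$1"
  threshold="${2:-10}"

  # Same field split as the documented metric source: text after the first "=".
  value=$(printf '%s\n' "$line" | awk -F '=' '{print $2}')

  # awk does the floating-point comparison; exit status 0 means value > threshold.
  if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
    echo "ALERT $value"
  else
    echo "OK $value"
  fi
}

# Assumed sample line; the host name and layout are illustrative only.
check_retrans 'host-1 tsar tcp:retran=11.0'
```

On a host where tsar is installed, the same function could be fed the live output, for example `check_retrans "$(tsar --check --tcp -s retran)"`.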
Rule Information
| Monitoring metric | Default threshold (unit: %) | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| ob_host_tcp_retrans_percent | 10 | 0 seconds | 60 seconds | 5 minutes |
Alert Information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the monitoring metric expression | Severe | Server |
Alert Template
Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx Server TCP Retransmission Rate Exceeded
Alert Details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. TCP Retransmission Rate ${value_shown} exceeds ${alarm_threshold} %.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: Server TCP Retransmission Rate Exceeded. TCP Retransmission Rate 11.0 % exceeds 10.0 %.
Alert Recovery
- Template: Alert: ${alarm_name}, Server TCP Retransmission Rate: ${value_shown}
- Example: Alert: Server TCP Retransmission Rate Exceeded, Server TCP Retransmission Rate: 5 %
Impact on the system
Unstable network connections may cause OBServer nodes to malfunction. For example, SQL statements may fail to execute.
Possible causes
Common causes include unstable network interfaces on OBServer nodes (such as frequent switching of primary and backup network interfaces) and network communication failures.
Solution
Contact a network engineer to troubleshoot network interface errors and network failures. If no network engineer is available, follow the steps in Network Troubleshooting.
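As a quick cross-check before escalating, the kernel's own TCP counters can be inspected on a Linux host. The sketch below is an assumption-laden illustration and not part of OCP: the function name `tcp_retrans_since_boot` is hypothetical, and it reads the cumulative `OutSegs` and `RetransSegs` counters from `/proc/net/snmp`, so the result is an average since boot rather than the per-interval rate that tsar reports.

```shell
# Hypothetical helper: reads /proc/net/snmp-style text on stdin and prints
# RetransSegs as a percentage of OutSegs (cumulative since boot).
tcp_retrans_since_boot() {
  awk '
    $1 == "Tcp:" && !seen {          # first Tcp: line holds the column names
      for (i = 2; i <= NF; i++) col[$i] = i
      seen = 1
      next
    }
    $1 == "Tcp:" {                   # second Tcp: line holds the counters
      out = $(col["OutSegs"])
      re  = $(col["RetransSegs"])
      if (out > 0) printf "%.2f\n", re * 100 / out
    }
  '
}

# Usage on a Linux host:
#   tcp_retrans_since_boot < /proc/net/snmp
```

A value that stays low here while the alert keeps firing may point to short bursts of retransmission rather than a persistent fault, which is worth mentioning to the network engineer.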