Description
This alert is triggered when the TCP retransmission rate of the OBServer is too high.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_host_tcp_retrans_percent |
| Source | tsar --check --tcp -s retran | awk -F '=' '{print $2}' |
| Collected metric | tcp_retrans |
| Metric expression | max(tcp_retrans{@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
Note
The metric source of this alert is special. In each collection cycle , the OCP-Agent collected runs the preceding command to collect the metric data.
The value of the metric ob_host_tcp_retrans_percent indicates the TCP retransmission rate of the OBServer. When this value is greater than the threshold, this alert is triggered. The default threshold is 10%.
Alert rule
| Metric | Default threshold (unit: %) | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_host_tcp_retrans_percent | 10 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The TCP retransmission rate is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The TCP retransmission rate of the OBServer exceeds the threshold.
Details example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: The TCP retransmission rate of the OBServer exceeds the threshold. The TCP retransmission rate is 11.0%, exceeding the threshold of 10.0%.
Impact on the system
An unstable network may cause OBServer features to be unavailable. For example, SQL statements cannot be executed.
Possible causes
This problem is commonly found when the network interface cards (NICs) of the OBServer are unstable or a network communication error occurred. In the former case, unstable NICs may be caused by frequent switchover between primary and secondary NICs.
Suggested solutions
Contact the network engineer to troubleshoot NIC and network problems.
If no network engineer is available, you can troubleshoot and resolve the issue on your own. For more information, see Network troubleshooting.