os_tsar_traffic_overload|V4.3.5| docs|Distributed Database

os_tsar_traffic_overload

Last Updated: 2025-03-26 07:47:21

Description

This alert is triggered when the bandwidth usage of the NIC eth.* exceeds 80%.

Parameter	Value
Metric	eth_net_traffic_usage, eth_net_bandwidth_mbps
Source	These metrics are basic host monitoring metrics collected by node_exporter.
Collected metric	node_network_receive_bytes_total, node_network_transmit_bytes_total, node_net_bandwidth_bps
Metric expression (eth_net_traffic_usage)	800 * (sum(rate(node_network_receive_bytes_total{device=~"eth.*",@LABELS}[@INTERVAL])) by (@GBLABELS) + sum(rate(node_network_transmit_bytes_total{device=~"eth.*",@LABELS}[@INTERVAL])) by (@GBLABELS)) / sum(node_net_bandwidth_bps{device=~"eth.*",@LABELS}) by (@GBLABELS)
Metric expression (eth_net_bandwidth_mbps)	sum(node_net_bandwidth_bps{device=~"eth.*",@LABELS}) by (@GBLABELS) / 1000000
Collection cycle	1 second
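
The arithmetic behind the two metric expressions can be illustrated with a minimal Python sketch. It assumes that node_net_bandwidth_bps is expressed in bits per second, as its name suggests, so the factor 800 is 8 bits per byte multiplied by 100 to get a percentage. The function names and the counter_rate helper are hypothetical. The helper only approximates PromQL rate() and ignores counter resets and extrapolation.

```python
def counter_rate(first_sample: float, last_sample: float, seconds: float) -> float:
    """Simplified stand-in for PromQL rate(): average per-second increase
    of a counter between two samples (no counter-reset handling)."""
    return (last_sample - first_sample) / seconds


def eth_net_traffic_usage(rx_bytes_per_s: float, tx_bytes_per_s: float,
                          bandwidth_bps: float) -> float:
    """Percentage of the NIC bandwidth used by received plus transmitted traffic."""
    # 800 = 8 (bits per byte) * 100 (percent)
    return 800 * (rx_bytes_per_s + tx_bytes_per_s) / bandwidth_bps


def eth_net_bandwidth_mbps(bandwidth_bps: float) -> float:
    """Total NIC bandwidth converted from bit/s to Mbit/s."""
    return bandwidth_bps / 1_000_000


# Hypothetical sample: a gigabit NIC receiving 60 MB/s and sending 40 MB/s.
usage = eth_net_traffic_usage(60e6, 40e6, 1e9)   # 80.0 (percent)
bandwidth = eth_net_bandwidth_mbps(1e9)          # 1000.0 (Mbit/s)
```

Because receive and transmit traffic are added together and divided by a single bandwidth value, the result can exceed 100 on a full-duplex link.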

Metric expression	Metric description	Default threshold (unit: percentage)	Detection cycle	Elimination cycle
eth_net_traffic_usage > 80 and eth_net_bandwidth_mbps > 0	eth_net_traffic_usage: the percentage of the total bandwidth of the NIC eth.* that is used for receiving and transmitting data. eth_net_bandwidth_mbps: the total bandwidth of the NIC eth.*, in Mbit/s.	80%	10 seconds	5 minutes
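
The alert condition in the table can be sketched as a small predicate. This is only an illustration of the expression itself. The function name is hypothetical, and the sketch does not model the detection and elimination cycles. The second clause presumably filters out NICs that report no bandwidth.

```python
def is_traffic_overloaded(usage_pct: float, bandwidth_mbps: float,
                          threshold_pct: float = 80.0) -> bool:
    """Mirror of: eth_net_traffic_usage > 80 and eth_net_bandwidth_mbps > 0."""
    # The comparison is strict: a usage of exactly 80 does not trigger the alert.
    return usage_pct > threshold_pct and bandwidth_mbps > 0


# Values taken from the alert example in this topic: 91.15% usage on a 300 Mbit/s NIC.
triggered = is_traffic_overloaded(91.15, 300)
```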

Trigger method	Alert level	Scope
Based on the expression of the metric	Critical	Server

Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_traffic_overload
Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, NIC:${device}, alert: ${alarm_name}, NIC bandwidth usage ${eth_net_traffic_usage_value_zh_cn} exceeds ${eth_net_traffic_usage_alarm_threshold}%, NIC bandwidth is ${eth_net_bandwidth_mbps_value_zh_cn}.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, NIC:eno1, alert: os_tsar_traffic_overload, NIC bandwidth usage 91.15% exceeds 80%, NIC bandwidth is 300 Mbit/s.
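
The ${...} placeholders in the templates above happen to match the syntax of Python's string.Template, so the substitution can be sketched as follows. This only illustrates how the example text is produced from the template. It is not how the alerting system renders messages internally, and the variable values are taken from the examples in this topic.

```python
from string import Template

overview_template = Template("${alarm_target} ${alarm_name}")
details_template = Template(
    "Cluster: ${ob_cluster_name}, host: ${svr_ip}, NIC:${device}, "
    "alert: ${alarm_name}, NIC bandwidth usage "
    "${eth_net_traffic_usage_value_zh_cn} exceeds "
    "${eth_net_traffic_usage_alarm_threshold}%, NIC bandwidth is "
    "${eth_net_bandwidth_mbps_value_zh_cn}."
)

# Values copied from the examples; the *_value_zh_cn fields are assumed
# to arrive already formatted with their units.
values = {
    "alarm_target": "svr_ip=xxx.xxx.xxx.xxx",
    "alarm_name": "os_tsar_traffic_overload",
    "ob_cluster_name": "obcluster",
    "svr_ip": "xxx.xxx.xxx.xxx",
    "device": "eno1",
    "eth_net_traffic_usage_value_zh_cn": "91.15%",
    "eth_net_traffic_usage_alarm_threshold": "80",
    "eth_net_bandwidth_mbps_value_zh_cn": "300 Mbit/s",
}

overview = overview_template.substitute(values)
details = details_template.substitute(values)
```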

When the NIC bandwidth is exhausted, system resources such as CPU and I/O may also become fully occupied, which affects system stability.

Throttling is not enabled for OBServers in the reconfirm phase, which causes gigabit NICs to run at full capacity.

During a leader switchover, the leader must pull logs from the followers for reconfirmation. If throttling is not enabled and the followers use gigabit NICs, two followers send logs to the leader concurrently, and the leader's NIC runs at full capacity.

Other bandwidth-intensive scenarios generate a large number of network requests.
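
A rough back-of-the-envelope sketch shows why the leader switchover scenario saturates the leader's NIC. It assumes that each follower sends at the full rate of its gigabit NIC and that the leader also uses a gigabit NIC. Neither rate is a measured value, and the function name is hypothetical.

```python
def offered_inbound_load_pct(follower_count: int, follower_send_mbps: float,
                             leader_bandwidth_mbps: float) -> float:
    """Traffic that the followers try to send, as a percentage of the leader's NIC bandwidth."""
    offered_mbps = follower_count * follower_send_mbps
    return 100 * offered_mbps / leader_bandwidth_mbps


# Two followers, each sending at an assumed 1000 Mbit/s, to a leader with a 1000 Mbit/s NIC.
offered = offered_inbound_load_pct(2, 1000, 1000)   # 200.0
# Actual throughput cannot exceed the NIC capacity, so the leader's NIC stays saturated.
saturated = offered >= 100
```

Under these assumptions the offered load is twice what the leader's NIC can accept, which is well above the 80% alert threshold.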