Description
This alert is triggered when the bandwidth usage of the NIC eth.* exceeds 80%.
Principle
| Parameter | Value |
|---|---|
| Metric | eth_net_traffic_usage, eth_net_bandwidth_mbps |
| Source | These metrics are basic host monitoring metrics collected by node_exporter. |
| Collected metric | node_network_receive_bytes_total, node_network_transmit_bytes_total, node_net_bandwidth_bps |
| Metric expression | |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold (unit: percentage) | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| eth_net_traffic_usage > 80 and eth_net_bandwidth_mbps > 0 | 80% | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_traffic_overload
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${svr_ip}, NIC:${device}, alert: ${alarm_name}, NIC bandwidth usage ${eth_net_traffic_usage_value_zh_cn} exceeds ${eth_net_traffic_usage_alarm_threshold}%, NIC bandwidth is ${eth_net_bandwidth_mbps_value_zh_cn}.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, NIC:eno1, alert: os_tsar_traffic_overload, NIC bandwidth usage 91.15% exceeds 80%, NIC bandwidth is 300 Mbit/s.
Impact on the system
When the NIC is exhausted, system resources such as CPU and I/O resources may be fully occupied, affecting system stability.
Possible causes
Throttling is not enabled for OBServers in the reconfirm phase, and this causes the gigabit NICs to work at full capacity.
During a leader switchover, the leader needs to pull logs from the followers for reconfirmation. If throttling is not enabled and the followers use gigabit NICs, two followers will send logs to the leader concurrently and the NIC of the leader will work at full capacity.
A large number of network requests exist in other network bandwidth-consuming scenarios.
Suggested solutions
View the network traffic statistics.
## Refresh the statistics every 5 seconds. sar -n DEV 5Find the device where the NIC shows the most frequent traffic changes. Frequent traffic changes indicate a high level of traffic.
Run
iftopon this device to find processes with high network traffic.Check whether the processes with high network traffic are as expected. If the observer process accounts for a high proportion of network traffic, contact Technical Support.