Description
This alert is triggered when the average number of network packet losses per second exceeds 1 for 30 consecutive seconds.
Principle
| Parameter | Value |
|---|---|
| Metric | net_receive_drop_count |
| Source | This metric is a basic host monitoring metric collected by node_exporter. |
| Collected metric | node_network_receive_drop_total |
| Metric expression | sum(rate(node_network_receive_drop_total{@LABELS}[@INTERVAL])) by (@GBLABELS) |
| Collection cycle | 1 second |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| net_receive_drop_count > 1 | The number of network packet losses per unit time. | 1 | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert templates
- Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_traffic_drop
- Details
- Template: Cluster: ${ob_cluster_name}, host: ${host}, alert: ${alarm_name}, average packet losses ${value_shown} exceeds ${alarm_threshold}.
- Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_tsar_traffic_drop, average packet losses 5.2 exceeds 1.
Impact on the system
A few network packet losses are normal. However, frequent packet losses will affect system stability. When TCP packets are lost due to timeout, the packets will be retransmitted. As the network packet loss rate increases, the transmission rate decreases exponentially.
Possible causes
Network packet loss may be caused by many factors. The possible causes are as follows:
Packets are blocked by the firewall.
The ring buffer overflows.
The system load is high.
Suggested solutions
View other monitoring metrics of the host, such as the network interface card (NIC) traffic, load, memory usage, and CPU utilization. The network packet loss may be related to the system performance stress if it is high.
View the system load. High CPU utilization, high memory usage, and high I/O load may all lead to network packet loss. For example, CPU utilization is high and there are insufficient CPU resources for processing packets, thereby causing packet loss on the NIC or socket buffer; memory usage is high and the application cannot process packets in a timely manner; I/O load is high, which makes the CPU very busy and unable to process network packets. You need to check the causes of the high system load and resolve the problem of insufficient resources.
Run
ifconfigto view the cause.- drop: Network packets are dropped due to insufficient memory.
- overrun: The CPU cannot process packets in a timely manner and the ring buffer overflows, thereby causing packet loss. You can add buffers to resolve the problem.
View the iptables rules. Check whether iptables blocking rules exist in /etc/sysconfig/iptables.