os_tsar_traffic_drop

2023-08-22 02:46:05  Updated

Description

This alert is triggered when the average number of network packet losses per second exceeds 1 for 30 consecutive seconds.

Principle

Parameter Value
Metric net_receive_drop_count
Source This metric is a basic host monitoring metric collected by node_exporter.
Collected metric node_network_receive_drop_total
Metric expression sum(rate(node_network_receive_drop_total{@LABELS}[@INTERVAL])) by (@GBLABELS)
Collection cycle 1 second

Alert rule

Metric expression Metric description Default threshold Detection cycle Elimination cycle
net_receive_drop_count > 1 The number of network packet losses per unit time. 1 10 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the expression of the metric Critical Server

Alert templates

  • Overview
    • Template: ${alarm_target} ${alarm_name}
    • Example: svr_ip=xxx.xxx.xxx.xxx os_tsar_traffic_drop
  • Details
    • Template: Cluster: ${ob_cluster_name}, host: ${host}, alert: ${alarm_name}, average packet losses ${value_shown} exceeds ${alarm_threshold}.
    • Example: Cluster: obcluster, host: xxx.xxx.xxx.xxx, alert: os_tsar_traffic_drop, average packet losses 5.2 exceeds 1.

Impact on the system

A few network packet losses are normal. However, frequent packet losses will affect system stability. When TCP packets are lost due to timeout, the packets will be retransmitted. As the network packet loss rate increases, the transmission rate decreases exponentially.

Possible causes

Network packet loss may be caused by many factors. The possible causes are as follows:

  • Packets are blocked by the firewall.

  • The ring buffer overflows.

  • The system load is high.

Suggested solutions

View other monitoring metrics of the host, such as the network interface card (NIC) traffic, load, memory usage, and CPU utilization. The network packet loss may be related to the system performance stress if it is high.

  1. View the system load. High CPU utilization, high memory usage, and high I/O load may all lead to network packet loss. For example, CPU utilization is high and there are insufficient CPU resources for processing packets, thereby causing packet loss on the NIC or socket buffer; memory usage is high and the application cannot process packets in a timely manner; I/O load is high, which makes the CPU very busy and unable to process network packets. You need to check the causes of the high system load and resolve the problem of insufficient resources.

  2. Run ifconfig to view the cause.

    • drop: Network packets are dropped due to insufficient memory.
    • overrun: The CPU cannot process packets in a timely manner and the ring buffer overflows, thereby causing packet loss. You can add buffers to resolve the problem.
  3. View the iptables rules. Check whether iptables blocking rules exist in /etc/sysconfig/iptables.

Contact Us