ob_host_net_recv_percent_over_threshold|V4.3.1| docs|Distributed Database

ob_host_net_recv_percent_over_threshold

Last Updated: 2024-08-23 10:14:19

Description

This alert is triggered when the inbound bandwidth usage of the OBServer exceeds the threshold.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	ob_host_net_recv_percent
Source	bandwidth: The network bandwidth of the host, collected by a Python script that calls the psutil.net_if_stats() function. The collected value is converted to bytes, where 1 Gbit/s = 1024 Mbit/s, 1 Mbit/s = 1024 Kbit/s, 1 Kbit/s = 1024 bit/s, and 1 byte = 8 bits. node_network_receive_bytes_total: The total amount of inbound data, collected by the node_exporter process. You can view this data on the local host at `http://localhost:63000/metrics`.
Collected metrics	node_network_receive_bytes_total and bandwidth
Metric expression	100 * max(sum(rate(node_network_receive_bytes_total{@LABELS}[@INTERVAL]) by (device,@GBLABELS)) by (device,@GBLABELS) / sum(bandwidth{@LABELS}) by (device,@GBLABELS)) by (@GBLABELS)
Collection cycle	1 second
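
The unit conversion and the per-device ratio described in the table can be sketched in Python as follows. This is a minimal illustration, not the OCP-Agent implementation: the function names and the dictionary layout are hypothetical, and the sketch assumes that the NIC speed is given in Mbit/s, which is the unit that psutil.net_if_stats() reports in its `speed` field.

```python
# Sketch only: function names and the input layout are hypothetical.

BITS_PER_BYTE = 8
KBIT = 1024          # 1 Kbit/s = 1024 bit/s (convention from the Source row)
MBIT = 1024 * KBIT   # 1 Mbit/s = 1024 Kbit/s


def bandwidth_bytes_per_sec(speed_mbit: float) -> float:
    """Convert a NIC speed in Mbit/s to bytes per second."""
    return speed_mbit * MBIT / BITS_PER_BYTE


def recv_percent(rx_bytes_per_sec: dict, speed_mbit: dict) -> float:
    """Return the highest inbound usage, in percent, across all devices."""
    percents = []
    for device, rate in rx_bytes_per_sec.items():
        speed = speed_mbit.get(device, 0)
        if speed <= 0:
            # psutil reports a speed of 0 when it cannot be determined.
            continue
        percents.append(100 * rate / bandwidth_bytes_per_sec(speed))
    return max(percents, default=0.0)
```

For example, `recv_percent({"eth0": 65536000.0}, {"eth0": 1000})` evaluates to 50.0, because a 1000 Mbit/s NIC corresponds to 131,072,000 bytes per second under this convention. Taking the maximum over devices mirrors the outer max(...) by (@GBLABELS) in the metric expression.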

Note

The metric for this alert combines two data sources. OCP-Agent monitors the network bandwidth usage of the local host: the bandwidth is collected by the Python script, and the inbound traffic is collected by the exporter process. For more information, see the Source row of the preceding table.
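
To illustrate the exporter side of this data source, the following sketch parses node_network_receive_bytes_total counters from text in the Prometheus exposition format and derives a per-second rate from two readings. The helper names and the sample text are hypothetical, and the parser assumes simple label values without embedded commas.

```python
# Sketch only: helper names and SAMPLE are hypothetical illustrations.

SAMPLE = """\
# HELP node_network_receive_bytes_total Network device statistic receive_bytes.
# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="eth0"} 1.5e+09
node_network_receive_bytes_total{device="lo"} 2048
"""


def parse_receive_bytes(metrics_text: str) -> dict:
    """Map each device to its node_network_receive_bytes_total value."""
    prefix = "node_network_receive_bytes_total{"
    counters = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skips comment lines and unrelated metrics
        labels, _, value = line[len(prefix):].rpartition("} ")
        for label in labels.split(","):
            name, _, raw = label.partition("=")
            if name.strip() == "device":
                # The first token is the sample value; a timestamp may follow.
                counters[raw.strip('"')] = float(value.split()[0])
    return counters


def bytes_per_sec(previous: float, current: float, seconds: float) -> float:
    """Average receive rate between two counter readings."""
    return (current - previous) / seconds
```

In practice, the text would come from the metrics endpoint named in the Source row, for example by reading `http://localhost:63000/metrics` with urllib.request.urlopen.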

The value of the metric ob_host_net_recv_percent indicates the inbound bandwidth usage of the OBServer. This alert is triggered when the value exceeds the threshold. The default threshold is 80%.

Alert rule

Metric	Default threshold (unit: %)	Duration	Detection cycle	Time before clearance
ob_host_net_recv_percent	80	120 seconds	60 seconds	5 minutes
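
One way to read the Duration and Detection cycle columns is sketched below. It assumes that the alert fires only when every sample in the trailing 120-second window exceeds the threshold and that samples arrive once per detection cycle. Both assumptions, and the function name, are illustrative; the actual OCP evaluation logic may differ.

```python
# Sketch only: should_fire and its semantics are assumptions, not OCP's logic.

def should_fire(samples, threshold=80.0, duration_s=120):
    """Decide whether the alert condition has held for the full duration.

    samples: list of (timestamp_seconds, percent) tuples, oldest first.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    window_start = latest_ts - duration_s
    if samples[0][0] > window_start:
        return False  # not enough history to cover the full duration
    window = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in window)
```

With samples taken every 60 seconds, `should_fire([(0, 85.0), (60, 90.0), (120, 81.0)])` returns True, while a single dip to 70.0 inside the window returns False.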

Alert information

Trigger method	Alert level	Scope
Metric expression	Critical	Server

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The inbound bandwidth usage of the OBServer is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The inbound bandwidth usage of the OBServer exceeds the threshold.
Details example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: The inbound bandwidth usage of the OBServer exceeds the threshold. The inbound bandwidth usage of the OBServer is 81.0%, exceeding the threshold of 80.0%.
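
The placeholder substitution in the Details template can be reproduced with Python's string.Template, which uses the same ${name} syntax. This is only a sketch of how the template expands; it does not claim that OCP renders alerts this way, and render_details is a hypothetical helper.

```python
from string import Template

# The Details template from this section, split across lines for readability.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. "
    "The inbound bandwidth usage of the OBServer is ${value}%, "
    "exceeding the threshold of ${alarm_threshold}%."
)


def render_details(values: dict) -> str:
    """Fill in the template; raises KeyError if a placeholder is missing."""
    return DETAILS_TEMPLATE.substitute(values)


message = render_details({
    "ob_cluster_name": "obcluster-1",
    "host": "xxx.xxx.xxx.xxx",
    "alarm_name": "The inbound bandwidth usage of the OBServer exceeds the threshold",
    "value": "81.0",
    "alarm_threshold": "80.0",
})
```

With these values, `message` matches the Details example above.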

Impact on the system

When the network bandwidth is fully utilized, it becomes a performance bottleneck for the OceanBase cluster. If business traffic continues to increase, the cluster cannot meet the business requirements.

Possible causes

This alert typically occurs when the business traffic of the OceanBase cluster increases, for example, when the cluster receives a growing number of SQL queries.