ob_host_cpu_percent_over_threshold CPU usage on the server exceeds the threshold (V4.3.6)

Last Updated: 2025-09-08 08:15:43

Alert description

This alert is triggered when the CPU usage of an OBServer node exceeds the threshold.

Alerting principle

The following table describes the key parameters of the monitoring logic behind this alert.

Parameter	Value
Monitoring metric	ob_host_cpu_percent
Data source	Collected by the node_exporter process
Collected metric	node_cpu_seconds_total
Monitoring expression	100 * (1 - sum(rate(node_cpu_seconds_total{mode="idle", @LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(node_cpu_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))
Collection interval	1 second

The collected metric uses the mode label to distinguish CPU states. The label takes the following values: user, idle, system, iowait, irq, nice, softirq, and steal. Here, idle is the time the CPU spends in an idle state, and the sum of the other states is the time the CPU is in use.

The value of the monitoring metric ob_host_cpu_percent indicates the CPU usage of the server where the OBServer node is located. An alert is triggered when the usage exceeds the threshold (100% by default).
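
To make the arithmetic in the monitoring expression concrete, the following minimal Python sketch computes CPU usage from two samples of per-mode CPU-time counters, in the same spirit as taking the rate of the idle time over the rate of the total time. The function name cpu_usage_percent and the sample numbers are illustrative assumptions and are not part of OCP or node_exporter.

```python
def cpu_usage_percent(before, after):
    """Return 100 * (1 - idle_delta / total_delta) for two counter samples.

    `before` and `after` map a CPU mode (user, idle, system, ...) to the
    cumulative seconds spent in that mode, summed over all cores.
    """
    deltas = {mode: after[mode] - before[mode] for mode in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must cover a positive time span")
    return 100.0 * (1.0 - deltas["idle"] / total)


# Hypothetical samples taken 10 seconds apart on a 4-core host (40 CPU-seconds).
before = {"user": 100.0, "system": 50.0, "iowait": 5.0, "idle": 845.0}
after = {"user": 122.0, "system": 56.0, "iowait": 7.0, "idle": 855.0}

print(cpu_usage_percent(before, after))  # 75.0
```

In this example the CPU was idle for 10 of the 40 CPU-seconds in the window, so the usage is 75%.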

Rule Information

Monitoring Metric	Default Threshold (Unit: %)	Duration	Detection Cycle	Elimination Cycle
ob_host_cpu_percent	100	60 seconds	60 seconds	5 minutes

Alert Information

Alert Trigger Method	Alert Level	Scope
Expression based on monitoring metrics	Critical	Server

Alert Template

Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx Server CPU usage exceeds the limit
Alert Details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: CPU usage ${value_shown} exceeds ${alarm_threshold} %.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: Server CPU usage exceeds the limit, CPU usage 101.0 % exceeds 100.0 %.
Alert Recovery
- Template: Alert: ${alarm_name}, Server CPU usage: ${value_shown}
- Example: Alert: Server CPU usage exceeds the limit, Server CPU usage: 85 %
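
As an illustration of how the ${...} placeholders are filled in, the following Python sketch renders the Alert Details template with sample values. It relies on Python's string.Template, which happens to use the same ${name} syntax. How OCP performs the substitution internally is not described here, and the values below are hypothetical.

```python
from string import Template

# Alert Details template, copied from the list above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, "
    "Alert: CPU usage ${value_shown} exceeds ${alarm_threshold} %."
)

# Hypothetical values for one alert; the real values are supplied by the alerting system.
values = {
    "ob_cluster_name": "obcluster-1",
    "host": "xxx.xxx.xxx.xxx",
    "value_shown": "101.0 %",
    "alarm_threshold": "100.0",
}

message = DETAILS_TEMPLATE.substitute(values)
print(message)
```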

Impact on the system

A brief spike in CPU usage has minimal impact on the system, but sustained high CPU usage can reduce system throughput and increase request latency.

Possible causes

This issue commonly occurs in the following scenarios:

- The OBServer node is executing complex SQL queries.
- Other programs running on the host are consuming excessive CPU resources.

Solution

1. Verify whether the high CPU usage is caused by the observer process.

   Run the top command on the OBServer node that triggered the alert to identify the process that is consuming excessive CPU resources.
   - If it is the observer process, the following alerts may also be triggered:
     - ob_cpu_percent_over_threshold CPU usage exceeds the threshold in OB statistics
     - tenant_cpu_percent_over_threshold CPU usage exceeds the threshold in OB tenants

     High CPU usage in the observer process can be caused by complex SQL queries executed on the OBServer node, which can trigger this alert together with the alerts listed above.

     First, refer to the documentation for the alerts listed above to resolve them, and then check whether this alert is still triggered.
     - If it is still triggered, proceed to the next step.
     - If it is no longer triggered, the issue has been resolved.
   - If it is another process, proceed to the next step.

2. The high CPU usage may be caused by another process.

   Contact the DBA or an O&M engineer. Processes that are not essential for business operations can be shut down.
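
The check in the first step (is observer the process using the most CPU?) can also be scripted. The following Python sketch is one possible approach, assuming a Linux host with the procps ps command. The function names and the sample process backup_agent are hypothetical. Note that ps reports CPU usage averaged over each process's lifetime, so it is a rougher signal than the live view in top.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,comm,%cpu` output into (pid, command, cpu) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        pid, rest = line.split(None, 1)
        command, cpu = rest.rsplit(None, 1)
        rows.append((int(pid), command, float(cpu)))
    return rows


def top_cpu_processes(limit=5):
    """Return the `limit` processes with the highest CPU usage (Linux procps ps)."""
    output = subprocess.run(
        ["ps", "-eo", "pid,comm,%cpu", "--sort=-%cpu"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(output)[:limit]


def observer_is_top(rows):
    """Return True if the busiest process in `rows` is named observer."""
    return bool(rows) and max(rows, key=lambda r: r[2])[1] == "observer"


# Hypothetical ps output, used here instead of a live call to top_cpu_processes().
sample = """\
    PID COMMAND         %CPU
  12345 observer        350.2
   2211 backup_agent     12.5
"""

print(observer_is_top(parse_ps(sample)))  # True
```

On a real host, pass the result of top_cpu_processes() to observer_is_top() instead of the sample text.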