ob_cpu_percent_over_threshold|V4.3.1| docs|Distributed Database

ob_cpu_percent_over_threshold

Last Updated: 2024-08-23 10:14:19

Description

This alert is triggered when the CPU utilization of observer processes exceeds the threshold.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	ob_cpu_percent
Source	`select /*+ read_consistency(weak) */ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (140005, 140006) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id;` Note: The value field is used as the value of the collected metric, and the other fields are used as labels.
Collected metric	sysstat_value
Metric expression	100 * sum(sysstat_value{metric_group="sysstat",stat_id="140006",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="140005",@LABELS}) by (@GBLABELS)
Collection cycle	1 second

The value of the metric ob_cpu_percent indicates the CPU utilization of the observer processes in the cluster. When this value exceeds the threshold, this alert is triggered. The default threshold is 90%.

Note

When the statistical event ID (stat_id) is 140005, the maximum number of available CPU cores of observer processes in the cluster is collected.
When the stat_id is 140006, the number of CPU cores used by observer processes in the cluster is collected.

The CPU utilization of the observer processes in the cluster is the sum of the value field for stat_id 140006 divided by the sum of the value field for stat_id 140005, expressed as a percentage.
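The calculation described above can be sketched in Python. This is a minimal illustration, not the collector's actual implementation: the function name `ob_cpu_percent`, the `svr_ip` grouping label (standing in for `@GBLABELS`), and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

CPU_TOTAL_STAT_ID = 140005  # maximum number of available CPU cores
CPU_USED_STAT_ID = 140006   # number of CPU cores in use


def ob_cpu_percent(rows, group_by=("svr_ip",)):
    """Return {label tuple: CPU utilization in percent}.

    rows: iterable of dicts holding "stat_id", "value", and label fields.
    group_by: label names to aggregate by (assumed here to be svr_ip).
    """
    used = defaultdict(float)
    total = defaultdict(float)
    for row in rows:
        key = tuple(row[label] for label in group_by)
        if row["stat_id"] == CPU_USED_STAT_ID:
            used[key] += row["value"]
        elif row["stat_id"] == CPU_TOTAL_STAT_ID:
            total[key] += row["value"]
    # Skip groups with no capacity reported to avoid division by zero.
    return {k: 100.0 * used[k] / total[k] for k in total if total[k] > 0}


# Hypothetical sample: two tenants on one server.
sample_rows = [
    {"svr_ip": "10.0.0.1", "tenant_id": 1, "stat_id": 140005, "value": 400},
    {"svr_ip": "10.0.0.1", "tenant_id": 1, "stat_id": 140006, "value": 300},
    {"svr_ip": "10.0.0.1", "tenant_id": 1002, "stat_id": 140005, "value": 600},
    {"svr_ip": "10.0.0.1", "tenant_id": 1002, "stat_id": 140006, "value": 150},
]
result = ob_cpu_percent(sample_rows)
print(result)
```

Summing before dividing, as the metric expression does, weights each tenant by its CPU capacity instead of averaging per-tenant percentages.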

Alert rule

Metric	Default threshold (unit: %)	Duration	Detection cycle	Time before clearance
ob_cpu_percent	90	60 seconds	60 seconds	5 minutes
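One possible reading of the rule parameters can be sketched as a small state machine. The semantics here are assumptions, since this page does not define them: the alert is assumed to fire once the metric has stayed above the threshold for the full duration, and to clear once it has stayed at or below the threshold for the time before clearance. The names `AlertRule` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    threshold: float = 90.0       # default threshold, in percent
    duration_s: int = 60          # assumed: time above threshold before firing
    detection_cycle_s: int = 60   # spacing between rule evaluations
    clearance_s: int = 300        # assumed: time at or below threshold before clearing


def evaluate(rule, samples):
    """samples: list of (timestamp_s, value), one per detection cycle.

    Returns a list of (timestamp_s, firing) pairs.
    """
    firing = False
    over_since = None   # when the current above-threshold run started
    under_since = None  # when the current at-or-below-threshold run started
    states = []
    for ts, value in samples:
        if value > rule.threshold:
            under_since = None
            if over_since is None:
                over_since = ts
            if not firing and ts - over_since >= rule.duration_s:
                firing = True
        else:
            over_since = None
            if under_since is None:
                under_since = ts
            if firing and ts - under_since >= rule.clearance_s:
                firing = False
        states.append((ts, firing))
    return states


rule = AlertRule()
values = [95, 95, 50, 50, 50, 50, 50, 50]
samples = [(i * rule.detection_cycle_s, v) for i, v in enumerate(values)]
states = evaluate(rule, samples)
print(states)
```

Under these assumptions, a single sample above 90% does not raise the alert, and a brief dip below the threshold does not clear an active one.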

Alert information

Trigger method	Alert level	Scope
Metric expression	Critical	Server

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The CPU utilization of observer processes is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of observer processes exceeds the threshold.
Details example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of observer processes is 91.0%, exceeding the threshold of 90.0%.
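To illustrate how the placeholders in the details template are filled in, the sketch below uses Python's `string.Template`, which happens to share the `${name}` syntax. The actual rendering engine is not described on this page, and the `alarm_name` value used here is a hypothetical stand-in.

```python
from string import Template

# Details template copied from the alert templates above.
details_template = Template(
    "${alarm_target} ${alarm_name}. The CPU utilization of observer "
    "processes is ${value}%, exceeding the threshold of ${alarm_threshold}%."
)

details = details_template.substitute(
    alarm_target="ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx",
    alarm_name="ob_cpu_percent_over_threshold",  # hypothetical value
    value="91.0",
    alarm_threshold="90.0",
)
print(details)
```

`substitute` raises `KeyError` if a placeholder has no value, which makes a missing field visible instead of leaving a raw `${...}` in the message.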

Impact on the system

High CPU utilization reduces the system throughput and increases the request latency.

A transient increase in CPU utilization has little impact. If CPU utilization remains high for a long time, the issue must be handled.

Possible cause

This alert is commonly triggered while complex SQL statements are being executed.