ob_cpu_percent_over_threshold

2025-03-28 08:05:55  Updated

Description

This alert is triggered when the CPU utilization of observer processes exceeds the threshold.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter | Value
Metric | ob_cpu_percent
Source | select /*+read_consistency(weak)*/ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (140005, 140006) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id;
Note: The value field supplies the value of the collected metric, and the other fields are used as labels.
Collected metric | sysstat_value
Metric expression | 100 * sum(sysstat_value{metric_group="sysstat",stat_id="140006",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="140005",@LABELS}) by (@GBLABELS)
Collection cycle | 1 second

The value of the metric ob_cpu_percent indicates the CPU utilization of the observer processes in the cluster. When this value exceeds the threshold, this alert is triggered. The default threshold is 90%.

Note

  • When the statistical event ID (stat_id) is 140005, the maximum number of available CPU cores of observer processes in the cluster is collected.
  • When the stat_id is 140006, the number of CPU cores used by observer processes in the cluster is collected.

The CPU utilization of the observer processes in the cluster is the sum of the value field for stat_id 140006 divided by the sum of the value field for stat_id 140005, multiplied by 100 to express it as a percentage.
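The calculation described above can be sketched in Python as follows. This is only an illustration of the arithmetic in the metric expression, not the collector's actual code. The row layout, the grouping label svr_ip (standing in for @GBLABELS), and the sample numbers are assumptions made for the example.

```python
from collections import defaultdict

STAT_CPU_MAX = 140005   # maximum available CPU cores (per the note above)
STAT_CPU_USED = 140006  # CPU cores in use (per the note above)


def ob_cpu_percent(rows, group_key="svr_ip"):
    """Return {group: CPU utilization in percent}.

    rows is an iterable of dicts with 'stat_id', 'value', and the
    grouping label. The label name 'svr_ip' is a hypothetical choice.
    """
    used = defaultdict(float)
    total = defaultdict(float)
    for row in rows:
        key = row[group_key]
        if row["stat_id"] == STAT_CPU_USED:
            used[key] += row["value"]
        elif row["stat_id"] == STAT_CPU_MAX:
            total[key] += row["value"]
    # Skip groups without capacity data to avoid dividing by zero.
    return {k: 100.0 * used[k] / total[k] for k in total if total[k] > 0}


# Hypothetical sample: two tenants on one server, values in the same unit.
SAMPLE_ROWS = [
    {"svr_ip": "10.0.0.1", "tenant_id": 1, "stat_id": 140005, "value": 400},
    {"svr_ip": "10.0.0.1", "tenant_id": 1, "stat_id": 140006, "value": 300},
    {"svr_ip": "10.0.0.1", "tenant_id": 1001, "stat_id": 140005, "value": 400},
    {"svr_ip": "10.0.0.1", "tenant_id": 1001, "stat_id": 140006, "value": 428},
]

if __name__ == "__main__":
    print(ob_cpu_percent(SAMPLE_ROWS))
```

With these sample rows, the used values sum to 728 and the maximum values sum to 800, so the result for the server is 91.0, which is above the default threshold of 90.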

Alert rule

Metric | Default threshold (unit: %) | Duration | Detection cycle | Time before clearance
ob_cpu_percent | 90 | 60 seconds | 60 seconds | 5 minutes
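The sketch below shows one possible reading of the rule, under the assumption that "Duration" means the metric must stay above the threshold continuously for that long before the alert fires. This page does not define the exact semantics, so treat the logic, the function name first_fire_time, and the sample data as hypothetical.

```python
THRESHOLD = 90.0   # default threshold in percent, from the alert rule
DURATION_S = 60    # duration in seconds, from the alert rule


def first_fire_time(samples, threshold=THRESHOLD, duration_s=DURATION_S):
    """Return the timestamp at which the alert would first fire, or None.

    samples is a list of (timestamp_seconds, cpu_percent) pairs sorted by
    time. Assumption: the alert fires once the value has been above the
    threshold without interruption for at least duration_s seconds.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # start of a new breach streak
            if ts - breach_start >= duration_s:
                return ts
        else:
            breach_start = None  # streak broken, start over
    return None


if __name__ == "__main__":
    # A dip below the threshold at t=30 resets the streak.
    print(first_fire_time([(0, 95.0), (30, 80.0), (60, 95.0), (120, 95.0)]))
```

Under this reading, a brief dip below the threshold restarts the 60-second window, which matches the statement that transient increases have little impact.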

Alert information

Trigger method | Alert level | Scope
Metric expression | Critical | Server

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: ${alarm_target} ${alarm_name}. The CPU utilization of observer processes is ${value}%, exceeding the threshold of ${alarm_threshold}%.

  • Overview example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of observer processes exceeds the threshold.

  • Details example: ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of observer processes is 91.0%, exceeding the threshold of 90.0%.
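To illustrate how the placeholders in the Details template are filled in, here is a minimal Python sketch. The template engine that the alerting system uses is not specified on this page. Python's string.Template is used only because it accepts the same ${name} syntax. The alarm name passed in and the one-decimal formatting are assumptions chosen to resemble the example above.

```python
from string import Template

# Details template copied from the list above.
DETAILS_TEMPLATE = Template(
    "${alarm_target} ${alarm_name}. The CPU utilization of observer processes "
    "is ${value}%, exceeding the threshold of ${alarm_threshold}%."
)


def render_details(alarm_target, alarm_name, value, alarm_threshold):
    """Fill the Details template; numbers are shown with one decimal place."""
    return DETAILS_TEMPLATE.substitute(
        alarm_target=alarm_target,
        alarm_name=alarm_name,
        value=f"{value:.1f}",
        alarm_threshold=f"{alarm_threshold:.1f}",
    )


if __name__ == "__main__":
    print(
        render_details(
            "ob_cluster=C1-1000:svr_ip=xxx.xxx.xxx.xxx",
            "ob_cpu_percent_over_threshold",  # hypothetical alarm name value
            91,
            90,
        )
    )
```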

Impact on the system

High CPU utilization reduces the system throughput and increases the request latency.

A transient spike in CPU utilization has little impact. If CPU utilization stays high for a long time, the cause must be investigated and resolved.

Possible cause

This alert is commonly triggered while complex SQL statements are being executed.

Suggested solutions

  1. Check whether the tenant_cpu_percent_over_threshold alert is triggered.

  2. Check whether the load of observer processes increases in multiple tenants at the same time. The loads of tenants running on the same OBServer node are cumulative, so the alert is triggered when the total load exceeds the threshold.

    To reduce high CPU utilization caused by simultaneous load increase in multiple tenants, perform the following steps:
