tenant_cpu_percent_over_threshold |V3.1.1|OceanBase Cloud Platform| docs|Distributed Database

tenant_cpu_percent_over_threshold

Last Updated: 2023-08-22 02:51:01

Description

This alert is triggered when the CPU utilization of an OceanBase tenant exceeds the threshold.

CPU utilization of an OceanBase tenant = CPU resources used by the tenant / Total CPU resources of all units assigned to the tenant in the current primary zone

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	ob_cpu_percent Note: The metric indicates the percentage of CPU usage by a tenant in the cluster. By default, when the percentage is greater than 95%, this alert is triggered.
Source	SQL: `select /*+read_consistency(weak)*/ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (140006, 140005) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id;` Note: The value column is used as the value of sysstat_value, and the other columns are used as labels.
Collected metric	sysstat_value
Metric expression	100 * sum(sysstat_value{metric_group="sysstat",stat_id="140006",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="140005",@LABELS}) by (@GBLABELS) Note: The statistical event ID (stat_id) 140005 reports the maximum number of CPU cores available to the observer processes in the tenant, and stat_id 140006 reports the number of CPU cores used by those processes. The CPU utilization of a tenant is the sum of the values for stat_id 140006 divided by the sum of the values for stat_id 140005, multiplied by 100.
Collection cycle	1 second
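
The metric expression above can be illustrated outside the monitoring system. The following is a minimal Python sketch, not the platform's implementation: it assumes the rows returned by the source SQL have already been loaded into dictionaries, and the function name `tenant_cpu_percent`, the grouping labels, and the sample values are hypothetical.

```python
from collections import defaultdict

USED_STAT_ID = "140006"  # CPU cores used by the tenant's observer processes
MAX_STAT_ID = "140005"   # maximum CPU cores available to those processes


def tenant_cpu_percent(samples, group_by=("ob_cluster", "tenant_name")):
    """Return {label tuple: CPU utilization in percent}.

    samples: iterable of dicts holding the label values plus 'stat_id'
    and 'value' (the sysstat_value). The group_by labels stand in for
    @GBLABELS and are an assumption of this sketch.
    """
    used = defaultdict(float)
    total = defaultdict(float)
    for sample in samples:
        key = tuple(sample[label] for label in group_by)
        if sample["stat_id"] == USED_STAT_ID:
            used[key] += sample["value"]
        elif sample["stat_id"] == MAX_STAT_ID:
            total[key] += sample["value"]
    # Skip groups with no capacity reported to avoid dividing by zero.
    return {key: 100 * used[key] / total[key] for key in total if total[key] > 0}


# Illustrative values only: one tenant with two units.
samples = [
    {"ob_cluster": "C1", "tenant_name": "tenant-1", "stat_id": "140006", "value": 3.0},
    {"ob_cluster": "C1", "tenant_name": "tenant-1", "stat_id": "140006", "value": 4.5},
    {"ob_cluster": "C1", "tenant_name": "tenant-1", "stat_id": "140005", "value": 4.0},
    {"ob_cluster": "C1", "tenant_name": "tenant-1", "stat_id": "140005", "value": 4.0},
]
print(tenant_cpu_percent(samples))
```

With these sample values the used total is 7.5 and the available total is 8.0, so the sketch reports 93.75% for the tenant, which is below the default threshold of 95%.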

Alert information

Trigger method	Alert level	Scope
Metric expression	Warning	Tenant

Alert rule

Metric	Default threshold (unit: %)	Duration	Detection cycle	Time before clearance
ob_cpu_percent{app="OB"}	95	0 seconds	60 seconds	5 minutes
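
The interaction of the rule parameters can be sketched as a small state machine. This is a hedged illustration, not the platform's logic: it assumes that the rule is evaluated once per detection cycle, that the alert fires once the value has exceeded the threshold for at least the duration, and that it clears once the time before clearance has passed since the last breach. The class `ThresholdAlert` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlert:
    threshold: float = 95.0   # default threshold, in percent
    duration_s: int = 0       # how long a breach must persist before firing
    clearance_s: int = 300    # time before clearance (5 minutes)
    active: bool = False
    breach_since: Optional[int] = None  # start of the current breach
    last_breach: Optional[int] = None   # most recent sample above threshold

    def evaluate(self, now_s: int, value: float) -> bool:
        """Process one sample (taken every detection cycle) and return the alert state."""
        if value > self.threshold:
            if self.breach_since is None:
                self.breach_since = now_s
            self.last_breach = now_s
            if now_s - self.breach_since >= self.duration_s:
                self.active = True
        else:
            self.breach_since = None
            # Assumed semantics: clear only after clearance_s without a breach.
            if self.active and now_s - self.last_breach >= self.clearance_s:
                self.active = False
        return self.active


alert = ThresholdAlert()
# Timestamps are on the 60-second detection cycle; the values are illustrative.
for now_s, value in [(0, 90.0), (60, 96.0), (120, 80.0), (360, 80.0)]:
    print(now_s, value, alert.evaluate(now_s, value))
```

Under these assumptions, a duration of 0 seconds means that a single sample above 95% is enough to raise the alert, while the alert persists for 5 minutes after the last breach before it clears.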

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The CPU utilization is ${value}%, exceeding the threshold of ${alarm_threshold}%.
Overview example: ob_cluster=C1-1000:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of the OceanBase tenant exceeds the threshold.
Details example: ob_cluster=C1-1000:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx. The CPU utilization of the OceanBase tenant reaches 96.0%, exceeding the threshold of 95.0%.
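
The `${...}` placeholders in the templates are filled in by the platform when the alert is sent. As a rough illustration only, Python's `string.Template` happens to use the same placeholder syntax, so the substitution can be sketched as follows. The values passed in are hypothetical, and the platform's own rendering (including how `${alarm_name}` is worded) may differ.

```python
from string import Template

# The details template as documented above.
DETAILS_TEMPLATE = Template(
    "${alarm_target} ${alarm_name}. The CPU utilization is ${value}%, "
    "exceeding the threshold of ${alarm_threshold}%."
)


def render_details(alarm_target: str, alarm_name: str, value: float, threshold: float) -> str:
    """Fill the details template; numbers are formatted with one decimal place."""
    return DETAILS_TEMPLATE.substitute(
        alarm_target=alarm_target,
        alarm_name=alarm_name,
        value=f"{value:.1f}",
        alarm_threshold=f"{threshold:.1f}",
    )


print(
    render_details(
        alarm_target="ob_cluster=C1-1000:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx",
        alarm_name="tenant_cpu_percent_over_threshold",  # hypothetical display name
        value=96.0,
        threshold=95.0,
    )
)
```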

Impact on the system

No impact

Possible causes

This problem is commonly found in the following scenarios:

The application queries a large amount of data or generates hotspot data.
The resource plan of the tenant cannot meet business requirements, or hotspot data is generated.
