ob_tenant_cpu_usage_over_threshold

2025-09-08 08:15:43  Updated

Description

This alert is triggered when the CPU utilization of an OceanBase Database tenant exceeds the threshold. The utilization is calculated as the number of CPU cores used by the tenant as a percentage of the number of CPU cores allocated to the tenant. This alert is therefore more accurate than the tenant_cpu_percent_over_threshold and ob_cpu_percent_over_threshold alerts, which are based on CPU utilization derived from the thread usage of a tenant.

OBServer and OceanBase Cloud Platform (OCP) must meet the following version requirements for this alert to work:

  1. OBServer: V3.2.3 BP10, V3.2.4 BP5, V4.1 BP3, V4.2.0, and later.
  2. OCP: V4.2.2 and later.

Principle

Metric: ob_tenant_cpu_percent

Source: Query virtual tables.
  • Query statement for OceanBase Database tenants of versions earlier than V4.0: select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ con_id tenant_id, stat_id, value from v$sysstat where stat_id IN (140005, 140013) and (con_id > 1000 or con_id = 1) and class < 1000
  • Query statement for OceanBase Database tenants of V4.0 and later: select /* MONITOR_AGENT */ con_id tenant_id, stat_id, value from v$sysstat where stat_id IN (140005, 140013) and (con_id > 1000 or con_id = 1) and class < 1000
If the cgroup feature is enabled on a tenant of OceanBase Database V4.x or later, the CPU time is read from cpuacct.usage of the cgroup. Otherwise, the CPU time is read from the /proc/<pid>/task/<tid>/stat files. In V4.x versions earlier than V4.1 BP3, the CPU time is available only when the cgroup feature is enabled.

Collected metric: ob_sysstat
  • 140005: max cpus, which is the number of CPU cores allocated to each tenant.
  • 140013: cpu time, which is the cumulative CPU time of all threads of each tenant, in microseconds.

Metric expression: sum(rate(ob_sysstat{stat_id="140013"}[$__rate_interval])) by (ob_cluster_name,ob_cluster_id,obzone,tenant_name,ob_tenant_id,svr_ip)/sum(ob_sysstat{stat_id="140005"}) by (ob_cluster_name,ob_cluster_id,obzone,tenant_name,ob_tenant_id,svr_ip)/100

Collection cycle: 60 seconds
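
The arithmetic behind the metric expression can be sketched in Python. This is only an illustration, not OCP code: the function name, the sample layout, the tenant ID, and all sample values are hypothetical, and the raw value of stat 140005 is used exactly as the expression uses it. A plain delta over the interval stands in for the rate() function, which handles counter resets and extrapolation differently.

```python
from typing import Dict, Tuple

CPU_TIME_STAT_ID = 140013  # cumulative CPU time of the tenant, in microseconds
MAX_CPUS_STAT_ID = 140005  # max cpus of the tenant

# One collection sample: (tenant_id, stat_id) -> value, as returned by the v$sysstat query.
Sample = Dict[Tuple[int, int], float]


def tenant_cpu_percent(prev: Sample, curr: Sample, tenant_id: int,
                       interval_seconds: float) -> float:
    """Mirror the metric expression: rate(cpu time) / max cpus / 100."""
    cpu_time_delta = (curr[(tenant_id, CPU_TIME_STAT_ID)]
                      - prev[(tenant_id, CPU_TIME_STAT_ID)])
    cpu_time_rate = cpu_time_delta / interval_seconds  # microseconds per second
    max_cpus = curr[(tenant_id, MAX_CPUS_STAT_ID)]
    return cpu_time_rate / max_cpus / 100


# Hypothetical samples taken one 60-second collection cycle apart.
prev_sample = {(1002, 140013): 500_000_000, (1002, 140005): 200}
curr_sample = {(1002, 140013): 614_000_000, (1002, 140005): 200}

percent = tenant_cpu_percent(prev_sample, curr_sample,
                             tenant_id=1002, interval_seconds=60)
print(percent)
```

The result is the value that the alert rule compares against the threshold.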

Alert rule

  • Metric expression: ob_tenant_cpu_percent > 95
  • Metric description: The CPU utilization of the tenant.
  • Default threshold: 95
  • Detection cycle: 10 seconds
  • Time before clearance: 5 minutes
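
How the threshold and the time before clearance might interact can be sketched as a small state machine. This is a minimal sketch under an assumption that the source does not confirm: the alert clears once no value above the threshold has been seen for the clearance period. The class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlert:
    threshold: float = 95.0           # default threshold from the alert rule
    clearance_seconds: float = 300.0  # time before clearance: 5 minutes
    active: bool = False
    last_violation: Optional[float] = None

    def observe(self, timestamp: float, value: float) -> bool:
        """Feed one detection result and return whether the alert is active."""
        if value > self.threshold:
            # A violation raises the alert (or keeps it raised).
            self.active = True
            self.last_violation = timestamp
        elif self.active and timestamp - self.last_violation >= self.clearance_seconds:
            # Assumption: clear only after a full clearance period without violations.
            self.active = False
        return self.active


alert = ThresholdAlert()
# Hypothetical (timestamp in seconds, CPU utilization in percent) observations.
observations = [(0, 90.0), (10, 96.0), (20, 94.0), (310, 94.0)]
states = [alert.observe(t, v) for t, v in observations]
print(states)
```

In this sketch a single value of 96 raises the alert, and a later value of 94 does not clear it until 5 minutes have passed since the last violation.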

Alert information

  • Trigger method: Metric expression
  • Alert level: Critical
  • Scope: Tenant

Alert templates

  • Alert overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: alarm_template_id=0:ob_cluster=TEST-1:tenant_name=t1:host=xxx.xxx.xxx.xxx. The CPU utilization of the OceanBase Database tenant exceeds the threshold.
  • Alert details

    • Template: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}. The CPU utilization of the tenant is ${value_shown}, which exceeds ${alarm_threshold} %.
    • Example: Cluster: TEST. Tenant: t1. Alert: The CPU utilization of the OceanBase Database tenant exceeds the threshold. The CPU utilization of the tenant is 96%, which exceeds 95%.
  • Alert recovery

    • Template: Alert: ${alarm_name}. OceanBase tenant CPU utilization: ${value_shown}
    • Example: Alert: OceanBase tenant CPU utilization exceeds the threshold. OceanBase tenant CPU utilization: 90%
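
The ${...} placeholders in the templates above are filled in with values from the alert. Python's string.Template happens to use the same placeholder syntax, so it can illustrate the substitution. This is only a sketch: OCP's own rendering logic is not shown in this document, and the variable values below are taken from the alert details example.

```python
from string import Template

# The alert details template, copied from the list above.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}. "
    "The CPU utilization of the tenant is ${value_shown}, "
    "which exceeds ${alarm_threshold} %."
)

# substitute() raises KeyError if any placeholder is left without a value.
details = DETAILS_TEMPLATE.substitute(
    ob_cluster_name="TEST",
    tenant_name="t1",
    alarm_name="The CPU utilization of the OceanBase Database tenant exceeds the threshold",
    value_shown="96%",
    alarm_threshold="95",
)
print(details)
```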

Impact on the system

High CPU utilization of a tenant may have the following impacts:

  1. SQL response time increases, which slows business responses.
  2. SQL requests may time out, affecting the business.

Possible causes

  • The tenant may have slow SQL statements. You can analyze the top SQL statements that consume the most CPU resources to identify the statements that drive up the tenant's CPU utilization.

  • The SQL performance degrades or the execution plan deteriorates. In this case, OCP generates the following alerts:

    • oas_anomaly_sql_from_sql_inspection_plan_changed
    • oas_anomaly_sql_from_sql_inspection_perf_degradation
    • oas_anomaly_sql_from_anomaly_event_analysis_plan_changed
    • oas_anomaly_sql_from_anomaly_event_analysis_perf_degradation
    • oas_anomaly_sql_from_anomaly_event_analysis_cpu_percent_high
  • The tenant has large transactions or large queries. You can view the performance monitoring information of the tenant to locate large transactions and large queries.

  • The business concurrency is high.

Suggested solutions

  1. Optimize SQL statements based on the optimization suggestions provided in the SQL diagnostics feature.
  2. Analyze CPU-consuming queries, such as large queries and large transactions in the business.
  3. Scale out the tenant if the CPU utilization is high due to high business concurrency.
