Description
This alert is triggered when the CPU utilization of an OceanBase Database tenant exceeds the threshold. The threshold for this alert equals the percentage of the number of CPU cores occupied by the tenant to the number of CPU cores allocated to the tenant. Therefore, this alert is more accurate than the tenant_cpu_percent_over_threshold and ob_cpu_percent_over_threshold alerts, which are triggered by the CPU utilization based on the thread usage of a tenant.
OBServer and OceanBase Cloud Platform (OCP) must meet the following version requirements for this alert to work:
- OBServer: V3.2.3 BP10, V3.2.4 BP5, V4.1 BP3, V4.2.0, and later.
- OCP: V4.2.2 and later.
Principle
| Parameter | Value |
|---|---|
| Metric | ob_tenant_cpu_percent |
| Source | Query virtual tables:
/proc/<pid>/task/<tid>/stat directory. In OceanBase Database V4.x and later but earlier than V4.1 BP3, the CPU time can be obtained only when the cgroup feature is enabled. |
| Collected metric | ob_sysstat:
|
| Metric expression | sum(rate(ob_sysstat{stat_id="140013"}[$__rate_interval])) by (ob_cluster_name,ob_cluster_id,obzone,tenant_name,ob_tenant_id,svr_ip)/sum(ob_sysstat{stat_id="140005"}) by (ob_cluster_name,ob_cluster_id,obzone,tenant_name,ob_tenant_id,svr_ip)/100 |
| Collection cycle | 60 seconds |
Alert rule
| Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_tenant_cpu_percent > 95 | The CPU utilization of the tenant. | 95 | 10 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Critical | Tenant |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=TEST-1:tenant_name=t1:host=xxx.xxx.xxx.xxx. The CPU utilization of the OceanBase Database tenant exceeeds the threshold.
Details
- Template: Cluster: ${ob_cluster_name}. Tenant: ${tenant_name}. Alert: ${alarm_name}. The CPU utilization of the tenant is ${value_shown}, which exceeds ${alarm_threshold} %.
- Example: Cluster: TEST. Tenant: t1. Alert: The CPU utilization of the OceanBase Database tenant exceeeds the threshold. The CPU utilization of the tenant is 96%, which exceeds 95%.
Impact on the system
High CPU utilization of a tenant may have the following impacts:
- Business responses are affected due to long SQL response time.
- SQL requests may time out, affecting the business.
Possible causes
The tenant may have slow SQL statements. You can analyze the top SQL statements with high percentage of CPU resources to locate specific SQL statements that cause high tenant CPU utilization.
The SQL performance degrades or the execution plan deteriorates. In this case, OCP generates the following alerts:
- oas_anomaly_sql_from_sql_inspection_plan_changed
- oas_anomaly_sql_from_sql_inspection_perf_degradation
- oas_anomaly_sql_from_anomaly_event_analysis_plan_changed
- oas_anomaly_sql_from_anomaly_event_analysis_perf_degradation
- oas_anomaly_sql_from_anomaly_event_analysis_cpu_percent_high
The tenant has large transactions or large queries. You can view the performance monitoring information of the tenant to locate large transactions and large queries.
The business concurrency is high.
Suggested solutions
- Optimize SQL statements based on the optimization suggestions provided in the SQL diagnostics feature.
- Analyze CPU-consuming queries, such as large queries and large transactions in the business.
- Scale out the tenant if the CPU utilization is high due to high business concurrency.