oas_anomaly_sql_from_anomaly_event_analysis_perf_degradation|V4.3.3| docs|Distributed Database

oas_anomaly_sql_from_anomaly_event_analysis_perf_degradation

Last Updated: 2025-01-10 06:15:54

Description

By default, if the CPU utilization of an OBServer node remains higher than 90% for more than 5 minutes and the CPU utilization of an SQL statement on the OBServer node exceeds 15%, the system triggers an oas_anomaly_sql_from_anomaly_event_analysis_cpu_percent_high alert. Then, OceanBase Cloud Platform (OCP) analyzes the CPU time trends of the SQL statement in the last 24 hours. If the CPU time of the SQL statement is higher than the lowest value in the last 24 hours, the system determines that the execution performance of the SQL statement has degraded and triggers the oas_anomaly_sql_from_anomaly_event_analysis_perf_degradation alert.

Principle

Parameter	Value
Metric	None
Source	None
Collected metric	None
Metric expression	None
Collection cycle	None

Alert rule

By default, this alert is triggered if the CPU utilization of an OBServer node remains higher than 90% for more than 5 minutes, the CPU utilization of an SQL statement on the OBServer node exceeds 15%, and the CPU time of the SQL statement is higher than the lowest value in the last 24 hours. The alerting criteria are the same as the diagnostic criteria for the diagnostic result Deteriorated performance on the Suspicious SQL tab.
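The three default conditions above can be sketched as a single predicate. This is only an illustration of the documented rule, not OCP's implementation: the function name, parameter names, and the choice to pass in the duration of high host CPU as a number of seconds are all assumptions made for the example.

```python
from typing import Sequence

# Default thresholds taken from the alert rule text.
HOST_CPU_MIN_DURATION_S = 5 * 60   # host CPU must stay above 90% for more than 5 minutes
SQL_CPU_THRESHOLD_PCT = 15.0       # CPU utilization of the SQL statement must exceed 15%


def is_perf_degradation(
    seconds_host_cpu_above_90: float,
    sql_cpu_percent: float,
    sql_cpu_time_ms: float,
    cpu_time_last_24h_ms: Sequence[float],
) -> bool:
    """Return True when all three documented conditions hold (hypothetical helper)."""
    if not cpu_time_last_24h_ms:
        # Without 24-hour history there is no baseline to compare against.
        return False
    host_sustained = seconds_host_cpu_above_90 > HOST_CPU_MIN_DURATION_S
    sql_is_heavy = sql_cpu_percent > SQL_CPU_THRESHOLD_PCT
    # "Degraded" means the current CPU time is above the 24-hour minimum.
    degraded = sql_cpu_time_ms > min(cpu_time_last_24h_ms)
    return host_sustained and sql_is_heavy and degraded
```

For instance, with the figures from the example alert on this page (SQL CPU utilization 15.01%, average CPU time 77.29 ms, baseline CPU time 0.19 ms), the predicate holds once the host has been above the threshold for more than 300 seconds.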

Alert information

Trigger method	Alert level	Scope
Based on monitored conditions	Critical	Cluster

Alert templates

Details

Template:

The CPU utilization of the host ${svr_ip} in the ${ob_cluster_name} cluster reaches ${server_cpu_percent}% during the period from ${start_time} to ${end_time}. The execution performance of the SQL statement ${sql_id} in the ${db_name} database of the ${tenant_name} tenant has degraded. The CPU utilization of the SQL statement is ${sql_cpu_percent}%. The average CPU time is ${sql_cpu_time_ms} ms. The average response time is ${sql_elapsed_time_ms} ms. The baseline period is from ${baseline_start_time} to ${baseline_end_time}. During the baseline period, the average CPU time is ${sql_baseline_cpu_time_ms} ms, and the average response time is ${sql_baseline_elapsed_time_ms} ms. Recommended action: [Refresh Plan Cache] [Throttle].
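To preview how the placeholders in this template expand, the following sketch renders an excerpt of it with Python's standard string.Template class, which happens to use the same ${name} syntax. How OCP itself renders templates is not described here, so treat this purely as an offline preview; the render_alert helper and the sample values are assumptions for illustration.

```python
from string import Template

# Excerpt of the template above (first sentence only).
ALERT_TEMPLATE = Template(
    "The CPU utilization of the host ${svr_ip} in the ${ob_cluster_name} cluster "
    "reaches ${server_cpu_percent}% during the period from ${start_time} to ${end_time}."
)


def render_alert(values: dict) -> str:
    """Fill in known placeholders; unknown ones are left as-is (hypothetical helper)."""
    # safe_substitute never raises on a missing key, so gaps stay visible in the output.
    return ALERT_TEMPLATE.safe_substitute(values)


message = render_alert(
    {
        "svr_ip": "xxx.xxx.xxx.xxx",
        "ob_cluster_name": "admin-4",
        "server_cpu_percent": "22.09",
        "start_time": "2023-09-05T23:51:17",
        # end_time is deliberately omitted to show that the placeholder survives.
    }
)
print(message)
```

Using safe_substitute rather than substitute is a deliberate choice in this sketch: a missing variable shows up as a literal ${end_time} in the preview instead of raising an error.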

Example:

Overview: alarm_template_id=0:ob_cluster=admin-4:tenant_name=a****:db_name=test:sql_id=6CE3107B0430F8CA5AAE8B5094EC****:recommend_operations=FLUSHING_PLAN_CACHE,RATE_LIMIT. The CPU utilization of the host exceeds the threshold because the execution performance of the SQL statement has degraded.
Period: 2023-09-05T23:51:17 to 2023-09-05T23:56:47
Symptom: The CPU utilization of the OBServer node xxx.xxx.xxx.xxx reaches 22.09%.
Associated SQL statement: SQL ID: 6CE3107B0430F8CA5AAE8B5094EC****. Database: test. Tenant: a****.
Hit rule: The execution performance degrades.
Basic information of the SQL statement: The CPU utilization of the SQL statement is 15.01%. The average CPU time is 77.29 ms. The average response time is 77.36 ms.
Baseline period: 2023-09-04T23:55:47 to 2023-09-04T23:56:17
Information about the baseline SQL statements: The average CPU time is 0.19 ms. The average response time is 0.26 ms.
Recommended action: [Refresh Plan Cache] [Throttle].
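If the alert text is consumed by a script, the Overview line of the example above can be split into fields. The sketch below assumes the layout seen in that example: key=value pairs separated by colons, followed by a period and a free-text summary. The parse_overview function is hypothetical, and the approach would break if a value itself contained a colon.

```python
def parse_overview(overview: str) -> dict:
    """Extract key=value pairs from an Overview line (hypothetical helper)."""
    # Everything before the first ". " is the colon-separated field list;
    # the rest is the human-readable summary sentence.
    head, _, _ = overview.partition(". ")
    fields = {}
    for part in head.split(":"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments such as the "Overview" label that have no "="
            fields[key.strip()] = value.strip()
    return fields


overview_line = (
    "Overview: alarm_template_id=0:ob_cluster=admin-4:tenant_name=a****:db_name=test:"
    "sql_id=6CE3107B0430F8CA5AAE8B5094EC****:"
    "recommend_operations=FLUSHING_PLAN_CACHE,RATE_LIMIT. "
    "The CPU utilization of the host exceeds the threshold because the execution "
    "performance of the SQL statement has degraded."
)
fields = parse_overview(overview_line)
operations = fields["recommend_operations"].split(",")
```

Here fields maps each key to its string value, and operations holds the recommended actions as a list.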

Impact on the system

The degraded SQL statement consumes more CPU time, which raises the CPU utilization of the OBServer node.

Possible causes

The SQL statement performs poorly or is executed frequently.
The volume of business data has increased, or a leader switchover has occurred.
The SQL execution plan has changed.

Solutions

Refresh the plan cache. To do so, you can execute the alert clearance plan. For more information, see Execute the alert clearance plan.
Optimize the SQL statement.
Perform throttling based on the business assessment.