Description
By default, if the CPU utilization of an OBServer node remains higher than 90% for more than 5 minutes and the CPU utilization of an SQL statement on the OBServer node exceeds 15%, the system triggers an oas_anomaly_sql_from_anomaly_event_analysis_cpu_percent_high alert. Then, if the CPU time of an execution plan of the SQL statement in the last 24 hours is shorter than that of the current execution plan, the system determines that the SQL execution plan has deteriorated and triggers the oas_anomaly_sql_from_anomaly_event_analysis_plan_changed alert. The alerting criteria are the same as the diagnostic criteria for historical execution plan recommendation in the Optimization Suggestions section.
Principle
| Parameter | Value |
|---|---|
| Metric | None |
| Source | None |
| Collected metric | None |
| Metric expression | None |
| Collection cycle | None |
Alert rule
By default, this alert is triggered if the CPU utilization of an OBServer node remains higher than 90% for more than 5 minutes, the CPU utilization of an SQL statement on the OBServer node exceeds 15%, and a historical execution plan is way better than the current execution plan.
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on monitored conditions | Critical | Cluster |
Alert templates
Details
Template:
The CPU utilization of the host ${svr_ip} in the ${ob_cluster_name} cluster reaches ${server_cpu_percent}% during the period from ${start_time} to ${end_time}. The execution plan of the SQL statement ${sql_id} in the ${db_name} database of the ${tenant_name} tenant has deteriorated. The CPU utilization of the SQL plan is ${sql_cpu_percent}%. The average CPU time is ${sql_cpu_time_ms} ms. The average response time is ${sql_elapsed_time_ms} ms. The hash value of the current execution plan is ${plan_hash_of_current_plan}. The hash value of the historical execution plan is ${plan_hash_of_recommend_plan}. The average CPU time of the historical execution plan is ${sql_baseline_cpu_time_ms} ms. Recommended action: [Refresh Plan Cache] [Bind Execution Plan] [Throttle].Example:
Overview: alarm_template_id=0:ob_cluster=admin-4:tenant_name=a****:db_name=test:sql_id=6CE3107B0430F8CA5AAE8B5094EC****:recommend_operations=FLUSHING_PLAN_CACHE,PLAN_BINDING,RATE_LIMIT. The CPU utilization of the host exceeds the threshold because the execution plan of the SQL statement has deteriorated. Period: 2023-09-05T01:57:17 to 2023-09-05T02:02:47 Symptom: The CPU utilization of the OBServer node xxx.xxx.xxx.xxx reaches 29.81%. Associated SQL statement: SQL ID: 6CE3107B0430F8CA5AAE8B5094EC****. Database: test. Tenant: a****. Hit rule: The execution plan deteriorates. Basic information of the SQL statement: The hash value of the current execution plan is 278667027174996****. The hash value of the historical execution plan is -510958339363226****. The average CPU time of the current execution plan is 139.42 ms. The average CPU time of the historical execution plan is 0.28 ms. Recommended action: [Refresh Plan Cache] [Bind Execution Plan] [Throttle].
Impact on the system
The CPU utilization of the OBServer node is affected.
Possible causes
Unrepresentative parameters are used when the SQL execution plan is generated. As a result, the SQL execution plan is changed and the overall SQL execution performance degrades, causing an increase of the CPU utilization.
Solutions
- Refresh the plan cache of the SQL statement to regenerate the SQL execution plan. To do so, you can execute the alert clearance plan. For more information, see Execute the alert clearance plan.
- If the first solution is ineffective, you can analyze the execution plan of the SQL statement and bind a historical execution plan or an index to the SQL statement.
- If none of the preceding solutions are effective, you can optimize the SQL statement or perform throttling based on the business assessment.