Description
A slow SQL statement is an SQL statement whose execution time exceeds 100 ms. The threshold for triggering this alert is specified by the OceanBase Cloud Platform (OCP) system parameter ocp.ob.slowsql.threshold. When a slow SQL statement exists in a tenant, an alert is triggered and displayed by SQL ID. This alert is disabled by default. You can enable it as needed.
You can set the system parameter ocp.alarm.datasource.slow-sql-exclude-obclusters to exclude clusters from slow SQL alerting. Separate the cluster names with commas (,).
Principle
| Parameter | Value |
|---|---|
| Metric | max_elapsed_time_ms |
| Source | SQL_AUDIT data (ob_hist_sql_audit_sample) collected by OCP-Agent: select max(elapsed_time) / 1000 as max_elapsed_time_ms from ob_hist_sql_audit_sample |
| Collected metric | max_elapsed_time_ms: The maximum time consumed for executing the SQL statement. |
| Metric expression | max(max_elapsed_time_ms) |
| Collection cycle | 60 seconds |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Warning | Tenant |
Alert rule
| Metric | Default threshold (unit: ms) | Source | Detection cycle | Time before clearance |
|---|---|---|---|---|
| max_elapsed_time_ms | 100 | SQL collection | 60 seconds | 5 minutes |
Alert templates
Alert overview
Template: ${alarm_target} ${alarm_name}
Example: ob_cluster=obcluster-1645511453:tenant_name=sys:sql_id=2A6D6C59E6D86C57519267B5EC2CBEC3 A slow SQL statement exists in the OceanBase Database tenant.
Alert details
Template: Cluster: ${ob_cluster_name}; Tenant: A slow SQL statement exists in the ${tenant_name} tenant; Database: ${database}; Execution Account: ${user_name}; SQL ID: ${sql_id}; SQL Text: ${query_sql}; First Detected At: ${start_time}; Last Detected At: ${end_time}; Number of Slow Executions: ${execution_count}; Average Number of Affected Rows: ${avg_affected_rows}; Maximum Number of Affected Rows: ${max_affected_rows}; Average Number of Returned Rows: ${return_rows}; Maximum Number of Returned Rows: ${max_return_rows}; Number of Waiting Events: ${total_waits}; Average Waiting Time: ${avg_total_wait_time_ms} ms; Maximum Waiting Time: ${max_total_wait_time_ms} ms; Number of Remote Executions: ${remote_plan_count}; Number of Distributed Executions: ${dist_plan_count}; Average Execution Time: ${avg_elapsed_time_ms} ms; Maximum Execution Time: ${max_elapsed_time_ms} ms
Example: Cluster: obcluster; Tenant: A slow SQL statement exists in the sys tenant. Database: oceanbase; Execution Account: ocp_monitor; SQL ID: 2A6D6C59E6D86C57519267B5EC2CBEC3; SQL Text: select /*+ query_timeout(10000000) */ sid as session_id, event_id, con_id as tenant_id, total_waits, time_waited_micro as time_waited_us from v$session_event where total_waits > ? and (sid> ? or (sid = ? and event_id > ?)) order by sid, event_id limit 40000; First Detected At: 2022-04-06T14:18:44.579; Last Detected At: 2022-04-06T14:18:44.579; Number of Slow Executions: 1; Average Number of Affected Rows: 0; Maximum Number of Affected Rows: 0; Average Number of Returned Rows: 681; Maximum Number of Returned Rows: 681; Number of Waiting Events: 1; Average Waiting Time: 0.09 ms; Maximum Waiting Time: 0.09 ms; Number of Remote executions: 0; Number of Distributed Executions: 0; Average Execution Time: 4056.5 ms; Maximum Execution Time: 4056.5 ms
Impact on the system
Execution of SQL statements consumes system resources. When many SQL statements are executed slowly, system resource usage increases, affecting the system throughput. If the business system does not expect slow execution of SQL statements, its services will be affected.
Possible causes
The SQL statement needs to be optimized. For example, the statement has join queries on large tables or lacks an index.
The OBServer does not have sufficient resources and therefore cannot handle concurrent requests, thus slowing down the SQL statement execution.
Solutions
Use the SQL diagnostics feature in the tenant management module in OCP to identify the cause. If the problem is caused by a business SQL statement, bind an index or an execution plan to the statement.
Check whether the resource usage, such as the CPU usage, IOPS or I/O time, and network throughput, of the OBServer has exceeded the limits.