Among all the factors that cause database service errors, SQL query exceptions are the most common, in addition to hardware and infrastructure faults. This topic describes how to troubleshoot and eliminate exceptions caused by SQL queries in OceanBase Database.
Scenario
Exceptions of SQL execution result in high CPU utilization, high I/O load, or surging memory usage. SQL exceptions occur in the following scenarios:
No new versions of applications are released, which means no new types of SQL queries. SQL exceptions in this scenario are classified into the following two types:
A sudden increase in the number of SQL executions and data amount due to changes in the business traffic.
Exceptions of the existing SQL execution plan.
New types of SQL queries, including slow SQL queries, are generated due to the release of new versions of applications.
Business impact
SQL exceptions may cause high CPU utilization and exhaust memory space, which result in business freezing or failure.
Emergency procedure
The following sections describe how to troubleshoot and eliminate exceptions in each scenario.
No new types of SQL queries
Scenario 1: increase in the number of SQL executions or data amount
If the number of a certain type of SQL executions or the data amount increases due to marketing campaigns or the growth of user access, and the execution plan operates normally, we recommend that you downgrade the business or throttle the traffic. If business downgrade or traffic throttling is impossible, you can scale up the CPU or memory configurations of the database. For more information, see Scaling in Basic database management.
If no extra resources are available in the cluster, you can throttle the execution of the specified SQL statement by inserting the max_concurrent hint to the execution plan that is bound to the SQL statement to limit the SQL execution concurrency. Example:
CREATE OUTLINE outline_name ON sql_id USING HINT /*+max_concurrent(1)*/;
For more information about how to bind an outline, see "CREATE OUTLINE" in SQL Reference (MySQL Mode) and "CREATE OUTLINE" in SQL Reference (Oracle Mode).
Scenario 2: exceptions of the existing SQL execution plan.
If no new versions of applications are released, the following two types of exceptions may occur in the existing SQL execution plan:
An invalid execution plan is used. You can clear the plan cache and execute hard parsing of SQL statements to generate a valid plan.
ALTER system flush plan cache [tenant_list] [global];
Note
- The tenant_list and global fields are optional. If you do not specify the tenant_list field, the plan caches of all tenants are cleared. Otherwise, only the plan caches of the specified tenants are cleared. If you do not specify the global field, the plan caches on the local server are cleared. Otherwise, the plan caches are cleared on all servers where the tenant resides. By default, the minimum plan caches are cleared.
- You must specify the tenant_list field in the following format: tenant = 'tenant1, tenant2, tenant3….'
An invalid index is used.
If you find that the execution plan does not use a better index, and the exception is not resolved by clearing the plan cache, you can bind an outline to the valid execution plan for the optimizer to use.
OceanBase Cloud Platform (OCP) provides a graphical interface for you to bind indexes. For more information, see OCP User Guide.
The following example shows how to bind the SELECT * FROM t1 WHERE v1 = 3 statement to the primary key index by using the CLI tool:
Log on to OceanBase Database as a business tenant. You must add the -c parameter in the
mysqlcommand.Create an outline.
create outline bind_to_primary_key on SELECT/*+ index(t1 primary)*/ * FROM t1 WHERE v1 = 3;Verify whether the outline is created.
SELECT * FROM oceanbase.gv$outline WHERE OUTLINE_NAME = 'bind_to_primary_key';Verify whether the execution of the new SQL statement generates a new plan based on the bound outline.
After the SQL statement bound to the outline executes a new query, check the
gv$plan_cache_plan_statview foroutline_idin the plan information that corresponds to the SQL statement. If the value ofoutline_id``` is identical to that found in thegv$outline` view, the plan is generated based on the bound outline. Otherwise, it is not.SELECT SQL_ID, PLAN_ID, STATEMENT, OUTLINE_ID, OUTLINE_DATA FROM oceanbase.gv$plan_cache_plan_stat WHERE STATEMENT LIKE '%SELECT * FROM t1 WHERE v1 = 3%';Verify whether the execution plan meets expectations.
After you verify that the plan is generated based on the bound outline, you can query the
gv$plan_cache_plan_statview to view the shape of the execution plan cached inplan_cacheto verify whether the plan meets expectations.SELECT OPERATOR, NAME FROM oceanbase.gv$plan_cache_plan_explain WHERE TENANT_ID = Business tenant ID AND IP = 'xxx.xxx.xxx.xxx' AND PORT = xxxx AND PLAN_ID = the plan_id obtained in the previous step;
No index is available.
If you find that no optimal index is available for the table related to the SQL statement, you can create the index as needed.
Notice
Create the index on all the columns to be queried. This can reduce the number of rows returned by table access by index primary key.
Place equivalent conditions at the head of the columns contained in the composite index.
Place the fields with heavy data filtering and sorting load in front of the columns contained in the composite index.
Occurrence of new types of SQL queries
The release of new versions of applications tends to bring new SQL queries. Some unreviewed slow SQL queries can consume excessive resources, which affects the performance of the core business. In this case, you must locate the slow SQL queries in the database to optimize the performance.
Locate slow SQL queries in the OCP console.
On the TopSQL page in the OCP console, you can view the most time-consuming and frequently executed queries, and sort them in descending order. If an invalid execution plan is used, bind a valid one to the SQL query. If you need to throttle an SQL query, consult with developers to locate the SQL query.
Use the CLI tool to locate slow SQL queries.
Execute the following statement to view the top SQL queries that consume the most CPU resources in a specific tenant:
SELECT sql_id, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO) cpu_time, RETRY_CNT,QUEUE_TIME,IS_HIT_PLAN FROM OCEANBASE.gv$sql_audit WHERE tenant_id = 1002 GROUP BY 1 ORDER BY (avg_exec_time * cnt) desc limit 5;Note
A large value of the EXECUTE_TIME field indicates possible wait events or an excessive number of logical reads.
A large value of the RETRY_CNT field indicates possible lock contention or follower-to-leader switchover.
A large value of the QUEUE_TIME field indicates insufficient CPU resources.
The GET_PLAN_TIME field indicates the time consumed to obtain the execution plan. A large value of this field is often accompanied by IS_HIT_PLAN = 0, which means that no plan is hit.
Other related queries.
OceanBase Database records recent SQL execution history in the
v$sql_auditandgv$sql_audittables. The v$sql_audit table stores the SQL execution history of the local server, and the gv$sql_audit table stores the SQL execution history of the cluster. You can query the two tables as needed.a. Query the v$sql_audit table to locate slow SQL queries. For example, find SQL queries whose execution time is longer than 1s (1000000 μs) for a tenant.
SELECT * FROM v$sql_audit WHERE tenant_id= <tenantid> AND elapsed_time> 1000000 limit 10;b. Query the histogram of SQL queries by execution time in seconds.
SELECT round(elapsed_time/1000000), count(*) FROM v$sql_audit WHERE tenant_id = <tenant_id>GROUP BY 1;
Search for slow SQL queries in observer log.
SQL queries whose execution time exceeds the value of the trace_log_slow_query_watermark parameter are tagged with "slow query" in the observer log. The default value of the parameter is 100 ms. Therefore, you can search for slow SQL queries in observer log. This method is not as efficient and visual as the preceding two methods. However, you also need to use this method in some cases, such as the case where the cache in sql_audit is evicted.
The observer log is saved in the
/home/admin/oceanbase/logdirectory.a. View all slow queries in the log.
fgrep '[slow query]' observer.log |sed -e 's/|/\n/g' | moreb. Query a slow query by trace_id.
fgrep "<trace_id>" observer.log |sed -e 's/|/\n/g' | more
Insufficient memory in the SQL sort workspace
OceanBase Database stores intermediate result sets to be sorted in a memory area, which is known as a workspace. If a large number of data sorting SQL queries are executed, errors may occur due to insufficient workspace memory. If an error occurs, an error message such as Failed to allocate memory is reported in the observer log. In this case, you can increase the value of the ob_asql_work_area_percentage parameter. The default value is 5, which indicates 5% of the memory in a single resource unit of the tenant. The parameter can be set to take effect at the session level or global level. Recommended value range: [5,20].
Sample statements:
Session level
obclient> SET ob_sql_work_area_percentage = xxx;Global level
obclient> SET GLOBAL ob_sql_work_area_percentage = xxx;