Among all the factors that cause database service errors, SQL query exceptions are the most common ones, apart from the hardware and infrastructure faults. This topic describes how to troubleshoot and eliminate exceptions caused by SQL queries in OceanBase Database.
Scenario
Exceptions of SQL execution result in high CPU utilization, high I/O load, or surging memory usage. SQL exceptions occur in the following scenarios:
No new versions of applications are released, which means no new types of SQL queries. SQL exceptions in this scenario are classified into the following two types:
A sudden increase in the number of SQL executions and data amount due to changes in the business traffic.
Exceptions of the existing SQL execution plan.
New types of SQL queries, including slow SQL queries, are generated due to the release of new versions of applications.
Business impact
SQL exceptions may cause high CPU utilization and exhaust memory space, which result in business freezing or failure.
Emergency procedure
The following sections describe how to troubleshoot and eliminate exceptions in each scenario.
No new types of SQL queries
Scenario 1: increase in the number of SQL executions or data amount
If the number of a certain type of SQL executions or the data amount increases due to marketing campaigns or the growth of user access, and the execution plan operates normally, we recommend that you downgrade the business or throttle the traffic. If business downgrade or traffic throttling is impossible, you can scale up the CPU or memory configurations of the database. For more information, see Scaling in Basic database management.
If no extra resources are available in the cluster, you can throttle the execution of the specified SQL statement by inserting the max_concurrent hint to the execution plan that is bound to the SQL statement to limit the SQL execution concurrency. Here is an example:
CREATE OUTLINE outline_name ON sql_id USING HINT /*+max_concurrent(1)*/;
For more information about how to bind an outline, see "CREATE OUTLINE" in SQL Reference (MySQL Mode) and "CREATE OUTLINE" in SQL Reference (Oracle Mode).
Scenario 2: exceptions of the existing SQL execution plan.
If no new application versions are released, the following types of exceptions may occur in the existing SQL execution plan:
An invalid execution plan is used. You can clear the plan cache and execute hard parsing of SQL statements to generate a valid plan.
ALTER system flush plan cache [tenant_list] [global];
Note
- The tenant_list and global fields are optional. If you do not specify the tenant_list field, the plan caches of all tenants are cleared. Otherwise, only the plan caches of the specified tenants are cleared. If you do not specify the global field, the plan caches on the local server are cleared. Otherwise, the plan caches are cleared on all servers where the tenant resides. By default, the minimum plan caches are cleared.
- You must specify the tenant_list field in the following format: tenant = 'tenant1, tenant2, tenant3….'
An invalid index is used.
If you find that the execution plan does not use a better index, and the exception is not resolved by clearing the plan cache, you can bind an outline to the valid execution plan for the optimizer to use.
OceanBase Cloud Platform (OCP) provides a graphical interface for you to bind indexes. For more information, see OCP User Guide.
The following example shows how to bind the
SELECT \* FROM t1 WHERE v1 = 3statement to the primary key index by using the CLI tool:Log on to OceanBase Database as a business tenant. You must add the
-cparameter in themysqlcommand.Create an outline.
create outline bind_to_primary_key on SELECT/*+ index(t1 primary)*/ * FROM t1 WHERE v1 = 3;Verify whether the outline is created.
SELECT * FROM oceanbase.gv$outline WHERE OUTLINE_NAME = 'bind_to_primary_key';Verify whether the execution of the new SQL statement generates a new plan based on the bound outline.
After the SQL statement bound to the outline executes a new query, check the
gv$plan_cache_plan_statview foroutline_idin the plan information that corresponds to the SQL statement. If the value ofoutline_idis identical to that found in thegv$outlineview, the plan is generated based on the bound outline. Otherwise, it is not.SELECT SQL_ID, PLAN_ID, STATEMENT, OUTLINE_ID, OUTLINE_DATA FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE STATEMENT LIKE '%SELECT * FROM t1 WHERE v1 = 3%';Verify whether the execution plan meets expectations.
After you verify that the plan is generated based on the bound outline, you can query the
gv$plan_cache_plan_statview to view the shape of the execution plan cached inplan_cacheto verify whether the plan meets expectations.SELECT OPERATOR, NAME FROM oceanbase.V$OB_PLAN_CACHE_PLAN_STAT WHERE TENANT_ID = Business tenant ID AND IP = 'xxx.xxx.xxx.xxx' AND PORT = xxxx AND PLAN_ID = the plan_id obtained in the previous step;
No index is available.
If you find that no optimal index is available for the table related to the SQL statement, you can create the index as needed.
Notice
Create the index on all the columns to be queried. This can reduce the number of rows returned by table access by index primary key.
Place equivalent conditions at the head of the columns contained in the composite index.
Place the fields with heavy data filtering and sorting load in front of the columns contained in the composite index.
Occurrence of new types of SQL queries
The release of new versions of applications tends to bring new SQL queries. Some unreviewed slow SQL queries can consume excessive resources, which affects the performance of the core business. In this case, you must locate the slow SQL queries in the database to optimize the performance.
Locate slow SQL queries in the OCP console.
On the TopSQL page in the OCP console, you can view the most time-consuming and frequently executed queries, and sort them in descending order. If an invalid execution plan is used, bind a valid one to the SQL query. If you need to throttle an SQL query, communicate with developers to locate the SQL query.
Use the CLI tool to locate slow SQL queries.
Execute the following statement to view the top SQL queries that consume the most CPU resources in a specific tenant:
SELECT sql_id, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO) cpu_time, RETRY_CNT,QUEUE_TIME,IS_HIT_PLAN FROM OCEANBASE.V$OB_SQL_AUDIT WHERE tenant_id = 1002 GROUP BY 1 ORDER BY (avg_exec_time * cnt) desc limit 5;
Note
A large value of the
EXECUTE_TIMEfield indicates possible wait events or an excessive number of logical reads.A large value of the
RETRY_CNTfield indicates possible lock contention or follower-to-leader switchover.A large value of the
QUEUE_TIMEfield indicates insufficient CPU resources.The
GET_PLAN_TIMEfield indicates the time consumed to obtain the execution plan. A large value of this field is often accompanied byIS_HIT_PLAN = 0, which means that no plan is hit.
Other related queries.
OceanBase Database records recent SQL execution history in the
V$OB_SQL_AUDIandGV$OB_SQL_AUDITtables. TheV$OB_SQL_AUDITtable records the SQL execution history of the local server, and theGV$OB_SQL_AUDITtable records the SQL execution history of the entire cluster. You can query the two tables as needed.Query the
GV$OB_SQL_AUDITtable to locate slow SQL queries. For example, find SQL queries whose execution time is longer than 1s (1000000 ms) for a tenant.SELECT * FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id= <tenantid> AND elapsed_time> 1000000 limit 10;Query the SQL execution time histogram ordered by seconds:
SELECT round(elapsed_time/1000000), count(*) FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = <tenant_id> GROUP BY 1;
Search for slow SQL queries in observer log.
SQL queries whose execution time exceeds the value of the
trace_log_slow_query_watermarkparameter are tagged with "slow query" in the observer log. The default value of the parameter is 100 ms. Therefore, you can search for slow SQL queries in observer log. This method is not as efficient and visual as the preceding two methods. However, you also need to use this method in some cases, such as the case where the cache in sql_audit is evicted.The observer log is saved in the
/home/admin/oceanbase/logdirectory.View all slow SQL queries in the log.
fgrep '[slow query]' observer.log |sed -e 's/|/\n/g' | moreView a specific slow SQL query by using the trace id.
fgrep "<trace_id>" observer.log |sed -e 's/|/\n/g' | more