Among all the factors that cause database service errors, SQL query exceptions are the most common ones, apart from the hardware and infrastructure faults. This topic describes how to troubleshoot and eliminate exceptions caused by SQL queries in OceanBase Database.
Scenario
Exceptions of SQL execution result in high CPU utilization, high I/O load, or surging memory usage. SQL exceptions occur in the following scenarios:
The business application has not released a new version, meaning no new SQL statements have been added. This scenario can be further divided into two categories:
The external business traffic has changed, leading to a sudden increase in the number of SQL executions or a surge in data volume.
The execution plan of the original SQL statements is abnormal.
The business application has released a new version, introducing new SQL query logic that includes slow SQL statements.
Business impact
SQL issues can easily lead to high CPU usage and memory exhaustion, resulting in business delays or even system crashes.
Emergency response process
The following sections describe the methods for quickly locating and handling each situation.
No new version is published, and no new SQL is added
Scenario 1: SQL execution count or data volume changes
When the database is providing normal service, marketing campaigns, increased user traffic, or similar reasons may cause the execution count or data volume of certain existing SQL workloads to grow. When the execution plan is still normal, the approach usually preferred first is to downgrade or throttle at the business layer. If the business cannot throttle or downgrade directly, and tenant CPU or memory can be increased at that time, prefer temporary database scale-out. For more information about tenant scaling, see Introduction to tenant scaling.
If the cluster does not have additional resources to scale out, you can consider throttling specific SQL statements at the OceanBase Database kernel level. You can bind an execution plan to the SQL statement and add the hint max_concurrent clause to limit the concurrency of the SQL statement. Example:
CREATE OUTLINE outline_name ON sql_id USING HINT /*+max_concurrent(1)*/;
For more information about how to bind an outline, see Plan binding.
Scenario 2: The execution plan of the existing SQL statement is incorrect
If no new version is published, but the execution plan of the existing SQL statement in the database is incorrect, the following situations may occur:
The execution plan is incorrectly selected. In this case, you can clear the plan cache to force the SQL statement to perform a hard parse and generate the correct plan.
ALTER system flush plan cache [tenant_list] [global];Note
tenant_listandglobalare optional. Iftenant_listis not specified, the plan cache is cleared for all tenants; otherwise, only for the specified tenants. Ifglobalis not specified, the plan cache on the local server is cleared; otherwise, the plan cache is cleared on all servers where the tenant runs. By default, the narrowest clearing scope applies.- tenant_list example: tenant = 'tenant1, tenant2, tenant3....'
The wrong index is selected.
After troubleshooting, it is found that a better index exists in the table, but the execution plan does not use the correct index. If clearing the execution plan cache does not resolve the issue, you can bind an outline to tell the optimizer the correct execution plan.
Index binding is supported graphically in OCP. See the OCP user guide for the corresponding operations.
For command-line binding, refer to the following example: bind the SQL statement
SELECT * FROM t1 WHERE v1 = 3to use the primary key index.Log in to OceanBase Database as a business tenant (note that you must add the
-cparameter to themysqlcommand).Create an outline.
create outline bind_to_primary_key on SELECT/*+ index(t1 primary)*/ * FROM t1 WHERE v1 = 3;Verify that the outline is created.
SELECT * FROM oceanbase.gv$outline WHERE OUTLINE_NAME = 'bind_to_primary_key';Verify that the new SQL statement generates an execution plan based on the bound outline.
After the SQL bound to the outline runs a new query, query the
outline_idin the plan information for that SQL inGV$OB_PLAN_CACHE_PLAN_STAT. Ifoutline_idmatches the value found ingv$outline, the plan was generated from the bound outline; otherwise, it was not.SELECT SQL_ID, PLAN_ID, STATEMENT, OUTLINE_ID, OUTLINE_DATA FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE STATEMENT LIKE '%SELECT * FROM t1 WHERE v1 = 3%';Verify that the generated execution plan meets expectations.
After you confirm the plan comes from the bound outline, verify that the plan shape in
plan_cachemeets expectations by queryingGV$OB_PLAN_CACHE_PLAN_STAT.SELECT OPERATOR, NAME FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE TENANT_ID = <business_tenant_id> AND IP = 'xxx.xxx.xxx.xxx' AND PORT = xxxx AND PLAN_ID = <plan_id_from_previous_step>;
The required index is missing.
After troubleshooting, it is found that the optimal index is missing in the tables involved in the current SQL statement. Create the corresponding index as needed.
Notice
When creating an index, include as many of the queried columns as possible. The more columns included, the fewer rows need to be accessed from the table.
For equality conditions, place them at the beginning of the composite index columns.
Place columns with large amounts of filtered or sorted data at the beginning of the composite index columns.
A new version of the business is released, and new SQL logic is introduced
When a new version of the business is released, new SQL statements are generated. In some cases, unreviewed slow SQL statements may consume excessive resources in the production environment, affecting core business operations. In such scenarios, identify slow SQL statements in the database to locate the issue and proceed with optimization.
Locate slow SQL statements in OCP.
On the TOP SQL page in OCP, view queries with the highest elapsed time and execution counts over a recent period, sorted from high to low. For SQL statements with incorrect plans, bind a plan online; for SQL statements that require throttling, work with business development to identify which statements can be throttled.
Locate slow SQL statements from the command line.
Query the top SQL statements with the highest CPU consumption for a specific tenant.
SELECT sql_id, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO) cpu_time, RETRY_CNT,QUEUE_TIME,IS_HIT_PLAN FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = 1002 GROUP BY 1 ORDER BY (avg_exec_time * cnt) desc limit 5;
Note
For EXECUTE_TIME, if the value is excessively large, consider waiting events or an unusually high number of logical reads.
The RETRY_CNT field indicates the number of retries. If the value is excessively large, it may indicate lock conflicts or leader switching.
If QUEUE_TIME is excessively large, CPU resources are insufficient.
GET_PLAN_TIME is the time to obtain the execution plan. If it is very long, it is usually accompanied by IS_HIT_PLAN=0, meaning the plan was not hit.
Other relevant queries.
OceanBase Database provides two virtual tables,
V$OB_SQL_AUDITandGV$OB_SQL_AUDIT, which record recent SQL execution history.V$OB_SQL_AUDITstores SQL execution history on the local node, andGV$OB_SQL_AUDITstores SQL execution history for the entire cluster. You can run queries as needed.Query
GV$OB_SQL_AUDIT, for example to find SQL for a tenant with execution time greater than 1 second (1,000,000 microseconds).SELECT * FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id= <tenantid> AND elapsed_time> 1000000 limit 10;Query a histogram of SQL execution time in seconds.
SELECT round(elapsed_time/1000000), count(*) FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = <tenant_id> GROUP BY 1;
Locate slow SQL statements in OceanBase logs.
In OceanBase Database, SQL statements whose execution time exceeds
trace_log_slow_query_watermark(a system parameter) are printed as slow query messages in the observer log. The default value of this parameter is 100 ms. You can use the observer log to locate slow SQL statements. This method is not as efficient or intuitive as the previous two, but it can help in some cases—for example, when records have already been evicted from sql_audit.Slow SQL logs are located at:
/home/admin/oceanbase/logView all slow query entries in the log.
fgrep '[slow query]' observer.log |sed -e 's/|/\n/g' | moreQuery a specific slow query by trace_id.
fgrep "<trace_id>" observer.log |sed -e 's/|/\n/g' | more
