In production practice, after excluding hardware and infrastructure failures, SQL query exceptions are among the most common causes of database service anomalies (if not the most common). As a key focus for users, this section describes how to analyze and handle SQL query exceptions in OceanBase Database.
Scenario description
During normal database operation, you may encounter situations such as CPU usage spikes, high I/O, or memory surges caused by abnormal SQL execution. SQL exceptions can be categorized into the following types:
No new version of the business application is released, meaning no new SQL statements are added. This can be further divided into two scenarios:
Changes in external business traffic result in a sudden increase in SQL execution frequency or data volume.
The execution plan of the original SQL statements is abnormal.
A new version of the business application is released, introducing new SQL query logic that includes slow SQL statements.
Business impact
SQL issues can easily lead to high CPU usage, memory exhaustion, and other phenomena, which can cause business delays or even system crashes.
Emergency handling process
The following sections describe the quick location and handling methods for each scenario.
No new version is published, and no new SQL statements are added
Scenario 1: The number of SQL executions or the data volume changes
If the number of executions or the data volume of a certain type of SQL statement in the database increases due to marketing activities or increased user access, but the execution plan is normal, the recommended solution is to downgrade or throttle the business. If the business cannot directly throttle or downgrade, and you can increase the tenant's CPU and memory configuration, give priority to temporary database expansion. For more information about tenant expansion, see Tenant expansion and reduction.
If the cluster has no spare resources for expansion at this time, you can throttle the specified SQL statements at the OceanBase Database kernel level by binding the execution plan of the SQL to the hint max_concurrent clause to limit SQL concurrency. Example:
CREATE OUTLINE outline_name ON sql_id USING HINT /*+max_concurrent(1)*/;
For more information about how to bind an outline, see Plan binding.
Scenario 2: The execution plan of an existing SQL statement is abnormal
If no new version is published, but the execution plan of an existing SQL statement is abnormal, the following situations may occur:
The execution plan is selected incorrectly. In this case, you can clear the plan cache to force the SQL statement to re-parse and generate a correct plan.
ALTER system flush plan cache [tenant_list] [global];Note
- Parameters
tenant_listandglobalare optional. Iftenant_listis not specified, the plan cache of all tenants is cleared; otherwise, only the specified tenant(s) are cleared. Ifglobalis not specified, the local plan cache on this server is cleared; otherwise, the plan cache on all servers where the tenant resides is cleared. The default is the minimum scope. - Example of
tenant_list: tenant = 'tenant1, tenant2, tenant3....'
- Parameters
The wrong index is selected.
After troubleshooting, you find that the table has a better index, but the execution plan does not use the correct index. If clearing the plan cache does not solve the issue, you can bind the correct execution plan to the optimizer using an outline.
Index binding is now supported in OCP for graphical operations. Please refer to the OCP user guide for the corresponding operations.
For command-line execution of binding operations, you can refer to the following example: bind the SQL statement SELECT * FROM t1 WHERE v1 = 3 to the primary key index.
Log in to OceanBase Database as a business tenant (note that you need to add the -c parameter to the mysql command).
Create an outline.
create outline bind_to_primary_key on SELECT/*+ index(t1 primary)*/ * FROM t1 WHERE v1 = 3;Verify that the outline is created.
SELECT * FROM oceanbase.gv$outline WHERE OUTLINE_NAME = 'bind_to_primary_key';Verify that the new SQL statement generates a new execution plan based on the bound outline.
After executing the SQL statement bound to the outline, query the
GV$OB_PLAN_CACHE_PLAN_STATtable for theoutline_idof the SQL statement. If theoutline_idmatches the one found ingv$outline, it indicates that the execution plan was generated based on the bound outline.SELECT SQL_ID, PLAN_ID, STATEMENT, OUTLINE_ID, OUTLINE_DATA FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE STATEMENT LIKE '%SELECT * FROM t1 WHERE v1 = 3%';Verify that the generated execution plan meets expectations.
After confirming that the execution plan is generated based on the bound outline, check whether the plan meets expectations by querying the
GV$OB_PLAN_CACHE_PLAN_STATtable for the shape of the cached execution plan inplan_cache.SELECT OPERATOR, NAME FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE TENANT_ID = business_tenant_id AND IP = 'xxx.xxx.xxx.xxx' AND PORT = xxxx AND PLAN_ID = plan_id_found_in_previous_step;
The optimal index is missing.
After troubleshooting, you find that the optimal index is missing from the tables involved in the current SQL statement. Please create the corresponding index as needed.
Notice
When creating an index, include as many of the queried columns as possible. This minimizes the number of rows that need to be accessed from the table.
Place equality conditions at the beginning of the composite index columns.
Place columns with large amounts of filtered or sorted data at the beginning of the composite index columns.
A new version of the business is released, and new SQL logic is introduced
When a new version of the business is released, new SQL statements are generated. In some cases, unreviewed slow SQL statements may consume excessive resources in the production environment, affecting core business operations. In such cases, it is necessary to identify and troubleshoot slow SQL statements in the database to locate the root cause and implement subsequent optimization measures.
Use OCP to identify slow SQL statements.
You can view the top SQL statements in the TOP SQL page of OCP, sorted by execution time and frequency over a certain period. For SQL statements that are incorrectly planned, you can directly bind them online. For SQL statements that require throttling, you need to confirm with the business development team to identify the SQL statements that can be throttled.
Use the command line to identify slow SQL statements.
Query the top SQL statements that consume the most CPU resources in a specific tenant.
SELECT sql_id, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO) cpu_time, RETRY_CNT,QUEUE_TIME,IS_HIT_PLAN FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = 1002 GROUP BY 1 ORDER BY (avg_exec_time * cnt) desc limit 5;
Note
If the value of EXECUTE_TIME is excessively large, it may indicate waiting events or an unusually high number of logical reads.
The RETRY_CNT field indicates the number of retries. If the value is excessively large, it may indicate lock conflicts or primary switchover.
If the value of QUEUE_TIME is excessively large, it indicates insufficient CPU resources.
GET_PLAN_TIME indicates the time taken to obtain the execution plan. If this value is excessively large, it is usually accompanied by IS_HIT_PLAN=0, indicating that the plan was not hit.
Other related queries.
OceanBase Database provides two virtual tables,
V$OB_SQL_AUDITandGV$OB_SQL_AUDIT, which record SQL execution history over a recent period.V$OB_SQL_AUDITstores the SQL execution history on the local server;GV$OB_SQL_AUDITstores the SQL execution history across the entire cluster. You can perform queries as needed.Query the
GV$OB_SQL_AUDITtable to find SQL statements executed by a specific tenant with an execution time greater than 1 second (1,000,000 microseconds).SELECT * FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id= <tenantid> AND elapsed_time> 1000000 limit 10;Query the histogram of SQL execution time in seconds.
SELECT round(elapsed_time/1000000), count(*) FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = <tenant_id> GROUP BY 1;
Use OceanBase logs to identify slow SQL statements.
In OceanBase Database, if the execution time of an SQL statement exceeds the trace_log_slow_query_watermark (a system parameter), a slow query message is printed in the observer log. The default value of this parameter is 100 ms. You can use the observer log to locate slow SQL statements. This method is less efficient and intuitive compared to the previous two methods, but it can be helpful in certain scenarios, such as when the cached SQL statements in the sql_audit table have been evicted.
The slow SQL log is stored in the
/home/admin/oceanbase/logdirectory.View all slow queries in the log.
fgrep '[slow query]' observer.log |sed -e 's/|/\n/g' | moreQuery a specific slow query based on the trace ID.
fgrep "<trace_id>" observer.log |sed -e 's/|/\n/g' | more