In practical production scenarios, apart from hardware and infrastructure failures, SQL query exceptions are among the most common causes of database service issues. This section focuses on analyzing and handling SQL query exceptions in OceanBase Database.
Scenario description
During normal database operations, it is common to encounter situations where abnormal SQL execution leads to spikes in CPU usage, high I/O, or a surge in memory. SQL anomalies can generally be categorized into the following scenarios:
The business application has not released a new version, meaning no new SQL statements have been introduced. This can be further divided into two subcategories:
A sudden increase in external business traffic results in a surge in SQL execution frequency or data volume.
The original SQL execution plan is abnormal.
The business application has released a new version, introducing new SQL query logic that includes slow SQL statements.
Business impact
SQL issues can easily lead to high CPU usage and memory exhaustion, causing business delays or even system crashes.
Emergency response process
The following sections describe the quick location and handling methods for each situation.
The business has not released a new version and has not added new SQL statements
Scenario 1: The number of SQL executions or the amount of data changes
In a normal database service scenario, due to marketing activities or increased user access, the number of executions or the amount of data for a certain type of SQL statement in the database may increase. If the execution plan is normal, the preferred solution is to downgrade or throttle the business. If the business cannot directly throttle or downgrade, and if additional CPU/memory resources can be allocated to the tenant, it is preferred to temporarily expand the database. For more information about tenant scaling, see Tenant scaling.
If there are no extra resources available for expansion in the cluster, you can consider throttling specific SQL statements at the OceanBase Database kernel level. You can add the hint max_concurrent clause to the execution plan of the SQL statement to limit its concurrency and thus throttle it. Here is an example:
CREATE OUTLINE outline_name ON sql_id USING HINT /*+max_concurrent(1)*/;
For more information about how to bind an outline, see Plan binding.
Scenario 2: The execution plans of existing SQL statements are abnormal
If the business has not released a new version, but the execution plans of existing SQL statements in the database are abnormal, the following situations may occur:
The execution plan is selected incorrectly. In this case, you can clear the plan cache to force the SQL statement to undergo hard parsing and generate the correct execution plan.
ALTER system flush plan cache [tenant_list] [global];Note
- If you do not specify the tenant_list or global parameter, the execution plans of all tenants or the execution plans of the current server are cleared, respectively. By default, the minimum scope is cleared.
- tenant_list: tenant = 'tenant1, tenant2, tenant3....'
The wrong index is selected.
After troubleshooting, it is found that there are better indexes in the table, but the execution plan does not use the correct index. If clearing the execution plan cache does not solve the issue, you can bind the correct execution plan to the optimizer using an outline.
Index binding is now supported in OCP through a graphical interface. Please refer to the OCP user guide for the corresponding operations.
For command-line execution of binding operations, here is an example: bind the SQL statement
SELECT * FROM t1 WHERE v1 = 3to use the primary key index.Log in to OceanBase Database as a business tenant (note that the -c parameter must be added to the mysql command).
Create an outline.
create outline bind_to_primary_key on SELECT/*+ index(t1 primary)*/ * FROM t1 WHERE v1 = 3;Verify that the outline is created successfully.
SELECT * FROM oceanbase.gv$outline WHERE OUTLINE_NAME = 'bind_to_primary_key';Verify that the new SQL statement generates an execution plan based on the bound outline.
After executing the SQL statement bound to the outline, query the
GV$OB_PLAN_CACHE_PLAN_STATtable for theoutline_idof the SQL statement. If theoutline_idmatches the one found ingv$outline, it indicates that the execution plan was generated based on the bound outline; otherwise, it was not.SELECT SQL_ID, PLAN_ID, STATEMENT, OUTLINE_ID, OUTLINE_DATA FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE STATEMENT LIKE '%SELECT * FROM t1 WHERE v1 = 3%';Verify that the generated execution plan meets expectations.
After confirming that the execution plan was generated based on the bound outline, check if the plan meets expectations by querying the
GV$OB_PLAN_CACHE_PLAN_STATtable for the shape of the execution plan cached inplan_cache.SELECT OPERATOR, NAME FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT WHERE TENANT_ID = business_tenant_id AND IP = 'xxx.xxx.xxx.xxx' AND PORT = xxxx AND PLAN_ID = the plan_id obtained in the previous step;
The required index is missing.
After troubleshooting, it is found that the current SQL statement involves tables that lack the optimal index. Please create the corresponding index as needed.
Notice
When creating an index, include as many query columns as possible. The more columns included, the fewer rows need to be retrieved from the table.
For equality conditions, place the columns in the composite index at the beginning of the condition.
For filtering and sorting large amounts of data, place the columns in the composite index at the beginning of the condition.
A new version of the business is released, and new SQL logic is introduced
When a new version of the business is released, new SQL statements are generated. In some cases, unreviewed slow SQL statements may consume excessive resources in the production environment, affecting core business operations. In such scenarios, it is necessary to identify slow SQL statements in the database to pinpoint the issue and proceed with optimization.
Identify slow SQL statements using OCP.
You can view the top SQL statements in the TOP SQL page of OCP, sorted by execution time and frequency. For SQL statements that are incorrectly planned, you can bind them online. For SQL statements that require throttling, you need to confirm with the business development team to identify the SQL statements that can be throttled.
Identify slow SQL statements using the command line.
Query the top SQL statements that consume the most CPU resources for a specific tenant.
SELECT sql_id, avg(execute_time) avg_exec_time, count(*) cnt, avg(execute_time - TOTAL_WAIT_TIME_MICRO) cpu_time, RETRY_CNT,QUEUE_TIME,IS_HIT_PLAN FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = 1002 GROUP BY 1 ORDER BY (avg_exec_time * cnt) desc limit 5;Note
The EXECUTE_TIME value indicates the execution time of an SQL statement. If the value is excessively large, it may indicate waiting events or an unusually high number of logical reads.
The RETRY_CNT field indicates the retry count. If the value is excessively large, it may indicate lock conflicts or primary key switching.
If the QUEUE_TIME value is excessively large, it indicates that the CPU resources are insufficient.
GET_PLAN_TIME indicates the time taken to obtain the execution plan. If the value is excessively large, it is usually accompanied by IS_HIT_PLAN=0, which indicates that the plan is not hit.
Perform other related queries.
OceanBase Database provides two virtual tables,
V$OB_SQL_AUDITandGV$OB_SQL_AUDIT, which record the SQL execution history in the recent period. TheV$OB_SQL_AUDITtable stores the SQL execution history of the local node, and theGV$OB_SQL_AUDITtable stores the SQL execution history of the entire cluster. You can perform related queries as needed.Query the
GV$OB_SQL_AUDITtable to find SQL statements executed by a specific tenant for more than 1 second (1,000,000 microseconds).SELECT * FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id= <tenantid> AND elapsed_time> 1000000 limit 10;Query the histogram of SQL execution time in seconds.
SELECT round(elapsed_time/1000000), count(*) FROM oceanbase.V$OB_SQL_AUDIT WHERE tenant_id = <tenant_id> GROUP BY 1;
Identify slow SQL statements in OceanBase Database logs.
In OceanBase Database, SQL statements with an execution time exceeding the trace_log_slow_query_watermark (a system parameter) will be logged as slow query messages in the observer log. The default value of this parameter is 100 ms. You can search for slow SQL statements in the observer log. Although this method is less efficient and intuitive compared to the previous two methods, it can be helpful in certain scenarios, such as when the cache in the sql_audit table has been evicted.
The slow SQL logs are stored in the
/home/admin/oceanbase/logdirectory.View all slow query logs in the log file.
fgrep '[slow query]' observer.log |sed -e 's/|/\n/g' | moreQuery a specific slow query by trace ID.
fgrep "<trace_id>" observer.log |sed -e 's/|/\n/g' | more
