SQL tuning refers to optimizing the execution plans of SQL statements and designing well-structured index combinations. Reducing the number of rows scanned is the most effective means of optimization. Beyond that, SQL tuning is inseparable from application and database optimization.
In a broad sense, SQL tuning is the search for best practices for applications to retrieve data from databases through SQL requests, which is also the focus of this topic. For a single SQL query, the optimization options are limited. For example, it is almost impossible to optimize an SQL query without any filter conditions, and you may wonder whether the query is even necessary.
This topic provides typical scenarios and cases of SQL tuning in OceanBase Database to help you better understand common methods for SQL diagnostics and optimization. At the same time, we expect to gradually build a library of SQL tuning methods based on these cases, and implement GUI-based SQL diagnostics and optimization capabilities to facilitate SQL risk management and optimization.
SQL capacity management
High SQL performance consumption
SQL execution gradually slows down
Scenarios
SQL execution gradually slows down over a period of time. Here are the typical business scenarios:
- The business periodically writes data, so the data volume increases gradually. As a result, SQL execution slows accordingly.
- For paging SQL queries on large datasets, as the number of pages increases, more data needs to be scanned, which also slows the execution of the queries.
Optimization suggestions
We recommend that you take the following measures in the first scenario:
- Add a covering index and use `ORDER BY` + `LIMIT` to control the number of rows processed at a time, so that data is processed smoothly and SQL performance stays relatively constant. (A sketch follows this list.)
- Keep business data production and downstream consumption balanced as far as possible to prevent data accumulation. If accumulation occurs, you can migrate the accumulated data through asynchronous tasks to distribute it evenly, thus ensuring stable performance of downstream consumption.
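The following is a minimal sketch of the covering-index pattern; the table `msg_queue`, its columns, and the index name are hypothetical:

```sql
-- Hypothetical queue-like table whose data volume grows with periodic writes.
-- The index covers the filter, the sort key, and the selected column, so each
-- batch is served by an index range scan without table access.
CREATE INDEX idx_status_id ON msg_queue (status, id, payload_digest);

-- Process a bounded batch per round; ORDER BY + LIMIT keeps the per-round
-- cost roughly constant no matter how large the table grows.
SELECT id, payload_digest
  FROM msg_queue
 WHERE status = 'PENDING'
 ORDER BY id
 LIMIT 1000;
```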
We recommend that you take the following measures in the second scenario:
Do not use OFFSET for paging queries; otherwise, queries on later pages perform poorly. Instead, pass the primary key id forward. For example, each time you query a page, pass the primary key id (or another unique identifier) of the last row returned by the previous query into the current query. This way, the query can quickly locate the data based on the passed-in value together with ORDER BY ... LIMIT n, as sketched below.
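A minimal sketch of this keyset-style pagination, with a hypothetical `orders` table:

```sql
-- Offset paging: the database still scans and discards the first 100000 rows.
SELECT id, order_no FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset paging: seed the query with the last id of the previous page (here
-- 100020); the primary key locates the start of the page directly.
SELECT id, order_no FROM orders WHERE id > 100020 ORDER BY id LIMIT 20;
```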
SQL queries spike
Scenarios
In this scenario, SQL queries are executed much more frequently in a short period of time, causing high performance consumption and even exhausting the CPU resources of a tenant. This may affect the overall performance of the database. Here are the typical business scenarios:
- Scheduled tasks for business
- Promotion campaigns
- Cache stampede
Optimization suggestions
- For business: Add traffic control at the business layer to prevent requests from instantly using up database resources in scenarios such as cache stampede, promotion campaigns, and scheduled tasks.
- For database: Calculate the CPU capacity required by the database and add buffer CPU capacity. Alternatively, enable tenant-level CPU over-allocation to allow the tenant to automatically use over-allocated CPU capacity in emergency situations.
- For SQL emergency response: Throttle abnormal SQL requests to restore the overall service of the database at the expense of a small part of the business, as sketched below.
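As a hedged illustration of the throttling step, OceanBase Database lets you bind a `MAX_CONCURRENT` hint to a statement through an outline; the outline name and statement below are made up, and the exact syntax can vary across versions, so check the CREATE OUTLINE documentation for your release:

```sql
-- Cap the abnormal statement at 2 concurrent executions in the tenant.
CREATE OUTLINE throttle_hot_sql
    ON SELECT /*+ MAX_CONCURRENT(2) */ * FROM orders WHERE user_id = 1;

-- Lift the throttle after the spike subsides.
DROP OUTLINE throttle_hot_sql;
```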
SQL business hotspots
Read hotspots
Scenarios
A read hotspot arises when a row is accessed by a burst of SQL requests from a single account in a short period of time. Read hotspots are a special case of the SQL query spike scenario. However, if you can locate the abnormal account, you can throttle the traffic from that account. This affects requests from only this account and has no other business impact.
Optimization suggestions
During SQL emergency response, you can throttle requests from the account that leads to a hotspot to lower the concurrency and recover the database.
Write and lock hotspots
Scenarios
A write hotspot usually triggers lock contention, which leads to a large number of SQL retries or lock waits and eventually causes database exceptions. A write hotspot often arises when two UPDATE or SELECT FOR UPDATE operations are concurrently performed on the same row.
Optimization suggestions
We recommend that you use the NOWAIT keyword when executing SELECT FOR UPDATE and implement a retry mechanism at the business code layer to avoid lock contention at the database layer, as sketched below. During SQL emergency response, you can throttle requests from the account that causes the hotspot to lower the lock concurrency.
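A minimal sketch of the fail-fast pattern, assuming a hypothetical `account` table and that your OceanBase Database version supports NOWAIT on SELECT ... FOR UPDATE:

```sql
BEGIN;
-- NOWAIT returns an error immediately if the row lock is held, instead of
-- queuing on it; the application catches the error and retries with backoff.
SELECT balance
  FROM account
 WHERE account_id = 42
   FOR UPDATE NOWAIT;
-- ... compute the new balance, UPDATE the row, then COMMIT.
```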
Account size skew
Scenarios
Account size skew is an SQL optimization scenario caused by uneven data distribution. It often happens when certain accounts in a table have much larger data volumes than others. When queries involving these large accounts are executed, they need to scan significantly more data, leading to performance fluctuations.
Case analysis
This issue is especially common in business environments where there are major differences in data volume.
For example, if the data in the system forms an inverted pyramid—where a small number of accounts hold most of the data—SQL requests for these large accounts will perform much worse than those for smaller accounts.
Optimization suggestions
For routine optimization, there are limited options purely at the SQL level. However, improvements can be made from the business architecture side:
- Split data for large and small accounts. In addition to horizontal sharding, consider vertical partitioning based on account size to reduce data skew as much as possible.
- Route requests for large and small accounts separately. Different accounts may require different SQL indexes. By routing requests according to account size, you can ensure SQL uses the right indexes and maintain priority for critical queries. For example, in an emergency, if you can't immediately identify the problematic account, you can limit SQL traffic through a specific channel to protect the stability of other business operations.
- Use pagination queries. Implement ID-based sharding, where each query retrieves only a segment of data using "id > last page's ending id." Aggregate the results at the application layer. This approach keeps SQL performance consistent for both large and small accounts.
For emergency situations, aside from increasing tenant capacity and throttling SQL requests, there are few other options. However, you can limit SQL requests specifically for large accounts to help maintain overall system stability.
SQL performance optimization
Execution plan optimization
Execution plan deterioration
Scenarios
OceanBase Database provides a plan cache mechanism, which works as follows:
- When receiving the first SQL request, the database hard-parses the request, calculates costs based on parameters in the request, generates an execution plan, and then writes the plan to the plan cache.
- When receiving the second SQL request, the database checks whether the request hits a plan in the plan cache. If so, the database executes the cached plan.
Hard parsing consumes a large amount of CPU resources. The plan cache mechanism can improve SQL performance and the overall database throughput. However, the plan cache mechanism also brings problems. For example, the quality of an SQL execution plan depends on the quality of parameters used for hard parsing. If the parameters in the first SQL request are the "minority", that is, they do not meet requirements of most requests, then the generated execution plan may be inappropriate for the "majority". In addition, if data distribution changes and the original plan is not applicable to the new scenario, performance of the SQL request also deteriorates.
Optimization suggestions
Execution plans that use appropriate indexes outperform those that involve full table scans. For emergency response, we recommend that you bind a correct plan to the SQL request, as sketched below.
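A hedged sketch of plan binding; the table, index, and outline names are hypothetical, and outline syntax may differ across OceanBase versions:

```sql
-- Bind future executions of the matching statement to idx_orders_status.
CREATE OUTLINE fix_orders_plan
    ON SELECT /*+ INDEX(orders idx_orders_status) */ * FROM orders
       WHERE status = 'PENDING';

-- Alternatively, evict cached plans so statements are re-optimized against
-- current statistics and parameters.
ALTER SYSTEM FLUSH PLAN CACHE;
```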
Buffer tables
Scenarios
Buffer tables are an SQL exception that is triggered only when the following conditions are met:
Condition 1: Most data inserted into a table will be deleted soon, that is, only a small amount of data persists in the table. In general, SQL execution plans that access this table perform a full table scan. This is because OceanBase Database recycles data block "holes" (unused space) from the table, which makes the table's high-water mark rather low and thus greatly reduces the table scan cost.
Condition 2: A great deal of data is inserted and deleted in a short period of time, and the table's high-water mark remains high, either because "hole" recycling is inefficient or because data piles up as more data is inserted than deleted. In this case, the database actually scans a large amount of data in the table, leading to SQL performance deterioration.
Note that buffer tables arise only when both of the preceding conditions are met. In addition, frequent updates on indexed columns may also trigger this exception. This is because the updates are actually implemented by inserting data into and deleting data from the index table, which makes the index table a buffer table.
Optimization suggestions
We recommend that you do not perform optimization for routine use, because buffer tables occur with quite low probability given the strict trigger conditions. For emergency response, we recommend that you use an outline to bind the settle_id index from this case. If you want to ensure the stability of SQL performance and prevent sudden deterioration, you can also specify an index on the settle_id field in a hint, at the cost of some daily performance, as sketched below.
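A hedged sketch of the hint-based option; the table name `settlement` and index name `idx_settle_id` are hypothetical stand-ins for this case's settle_id index:

```sql
-- Pin the access path to the settle_id index so the plan cannot fall back
-- to a full table scan when the buffer-table effect inflates the scan cost.
SELECT /*+ INDEX(settlement idx_settle_id) */ settle_id, state
  FROM settlement
 WHERE settle_id = 202401;
```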
Multi-plan jitter
Scenarios
When multiple indexes are available for an SQL query, the OceanBase Database optimizer may choose different indexes for different execution plans based on data distribution. Because the optimizer always tries to find the optimal execution plan, it estimates the SQL execution cost based on the current data distribution and parameters, and selects the plan with the lowest cost as the one with the best performance. However, data distribution changes as business data is constantly written or modified, so different execution plans may be generated at different points in time. In addition, because the distribution of different data values may vary greatly, execution plans may flip back and forth frequently.
Optimization suggestions
You can reduce execution plan jitter in scenarios where multiple indexes are available in the following ways:
- Do not create too many redundant indexes. For example, the `status`, `env`, and `env + status` indexes overlap with each other. To avoid ambiguity, you can drop the single-column `status` index to prevent mis-selection of execution plans and reduce the storage costs of index tables. (A sketch follows this list.)
- Do not build an index on enum-like columns with few distinct values, such as gender, age, type, and status, unless the index is highly selective. For example, assume that the `status` column in a table has 99% `Initializing` values and 1% `Pending` values. The business scenario is to insert rows with `status` set to `Initializing`, and then update `status` to `Completed` for those rows based on the primary key. After these operations, only 1% of rows meet the condition `status = 'Pending'`. In this case, if a query needs to obtain data in the pending state from the table, an index on the `status` column will provide high performance.
- For scenarios with serious data skew, perform optimization at the business layer. For more information, see the "Account size skew" section.
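A minimal sketch of the first suggestion, with hypothetical table and index names:

```sql
-- With the composite idx_env_status (env, status) kept, dropping the
-- overlapping single-column status index removes one ambiguous access path
-- and saves index-table storage.
DROP INDEX idx_status ON orders;
```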
Index optimization
Incorrectly selected index
Scenarios
Multiple indexes are available for an SQL query, but the index selected by the plan does not deliver optimal performance. Possible causes are as follows:
- The index specified in the SQL hint leads to poor performance.
- An inappropriate index is bound to the SQL outline.
- The data distribution of input parameters used for hard parsing of the SQL query is applicable to a minority of scenarios. As a result, an incorrect execution path is selected.
- The plan generated based on the data distribution during SQL tuning is not suitable for the data distribution after the business runs for a period of time.
Optimization suggestions
We recommend that you either do not specify an index in the SQL hint, which allows the optimizer to select an index automatically, or explicitly specify the idx_fund_inst_type_time index from this case.
No available index
Scenarios
No index is available for filter conditions in an SQL query, so the query has to perform a full table scan or use other indexes. This causes low performance.
Optimization suggestions
- For SQL execution: Create an index on an appropriate column. In this example, you can create a normal index on the `dag_task_id` column. Because the index also contains the primary key `task_id`, the sort on `task_id` will be eliminated. (A sketch follows this list.)
- For business process: We recommend that you perform SQL review in the code integration phase to rule out SQL queries that have no available indexes.
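A minimal sketch, assuming a hypothetical `task` table with primary key `task_id`:

```sql
-- In OceanBase Database, a secondary index entry carries the primary key, so
-- within one dag_task_id value the index rows are already ordered by task_id
-- and the ORDER BY needs no extra sort.
CREATE INDEX idx_dag_task_id ON task (dag_task_id);

SELECT task_id
  FROM task
 WHERE dag_task_id = 1001
 ORDER BY task_id;
```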
Inappropriate index
Scenarios
Indexes available to an SQL query are inappropriate and fail to improve performance. However, there exist columns on which you can create a more efficient index.
Optimization suggestions
The optimization suggestions are as follows:
For SQL execution:
- Remove the function operation on the `gmt_create` column by rewriting `TIMESTAMPDIFF(MINUTE, gmt_create, now()) > ?` as `gmt_create < DATE_SUB(now(), INTERVAL ? MINUTE)`.
- Change the index that covers only `scene` and `gmt_create` to an index that covers `scene`, `effective`, and `gmt_create`. This can improve performance by about three times. (A before/after sketch follows this list.)
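A hedged before/after sketch; the table name `notify_task` and the literal values are hypothetical, while the column names and the rewrite come from the case above:

```sql
-- Before: the function wraps gmt_create, so the predicate cannot use an
-- index range scan on that column.
SELECT * FROM notify_task
 WHERE scene = 'PAY' AND effective = 1
   AND TIMESTAMPDIFF(MINUTE, gmt_create, now()) > 30;

-- After: the bare column on one side makes the predicate sargable, and the
-- widened composite index serves all three filters.
CREATE INDEX idx_scene_eff_time ON notify_task (scene, effective, gmt_create);

SELECT * FROM notify_task
 WHERE scene = 'PAY' AND effective = 1
   AND gmt_create < DATE_SUB(now(), INTERVAL 30 MINUTE);
```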
For business process: We recommend that you perform SQL review in the code integration phase to rule out SQL queries with predicates involving function operations.