Structured query language (SQL) is a widely used method for users to interact with databases. Through SQL, we can define, store, update, and query data, as well as perform other management operations. SQL is a highly non-procedural programming language, where users simply specify "what" needs to be done, without having to specify "how" to do it. Users do not need to specify how the data is stored, nor do they need to understand the specific data storage methods. SQL is more like a "colloquial" programming language, similar to how we speak in everyday conversation, with a subject-verb-object hierarchical structure.
For example, imagine going to the library and asking the librarian for a book on "artificial intelligence". The SQL statement for this request would be select * from book where book_name='artificial intelligence'. Just as the librarian organizes and categorizes books in advance, efficient and logical indexing design is essential for optimal SQL performance. This ensures that when the book on "artificial intelligence" is added to the library collection, it is placed in the appropriate section, making it easy to find. Conversely, if the books were randomly arranged, it might work with a small collection but would be highly inefficient as the collection grows. The efficient classification and sorting techniques in the library can be compared to the index structure of the database schema, emphasizing the importance of SQL index optimization to minimize data scanning during SQL execution.
In practical scenarios, SQL tuning is more complex than optimizing an individual SQL statement. Key considerations include questions such as:
- How to write SQL statements to enhance performance based on existing indexes?
- Which is the most appropriate index for an SQL statement?
- How to design reasonable indexes for optimal business system performance?
- How to identify problematic SQL statements and quickly recover a database when exceptions occur?
This topic explores how to develop an SQL tuning system across the R&D, integration, and O&M phases.
R&D phase: Craft efficient SQL statements
During the daily R&D process, crafting efficient SQL statements involves the following key points:
- Formulate SQL R&D specifications and strictly follow them.
- Reasonably plan schemas based on business scenarios to ensure efficient indexing.
- Optimize individual SQL statements.
SQL R&D specifications serve as the foundation for ensuring the quality and stability of SQL performance. SQL issues identified during the execution phase can have a significant impact on the business and may be challenging to address with common SQL emergency plans. Therefore, clear SQL R&D specifications help rectify basic SQL errors in code and avoid potential SQL risks during the R&D phase.
Schema design is an ongoing process of optimization that continues to evolve with changes in business scenarios. In a complex business system, various types of SQL queries with different load structures will evolve alongside changes in the business model, and continuous data writes at the underlying business level will also lead to ongoing changes in data distribution. We must continuously optimize schema design based on SQL loads to ensure the best practices for maintaining SQL performance aligned with business throughput. After all, the goal is not to guarantee optimal performance for every SQL statement, as it is unnecessary and often costly.
For an individual SQL statement, the key is to ensure that the SQL uses appropriate indexing, as reducing the number of data scans is the most effective optimization method. Additionally, from a business perspective, it is crucial to consider whether the SQL statement meets the business requirements and if there are more reasonable approaches. Simultaneously, we need to predict whether, after launching the SQL statement, there will be performance inflection points or potential risks due to data changes.
Crafting efficient SQL statements during the R&D phase ensures their stability to the highest extent and minimizes the generation of poor-quality SQL statements at the source.
Integration phase: Mitigate risks
During the integration phase, the most important task when submitting SQL statements to the production environment is SQL review. Just like code review before code submission, SQL review is essential in the integration phase. So, focus on the following points during SQL review:
- Check compliance with SQL R&D specifications.
- Evaluate SQL performance.
The first point requires you to scan SQL statements to check whether they comply with R&D specifications, which helps implement the specifications. The implementation of SQL R&D specifications not only reduces basic SQL errors and low-performance SQL scenarios, but also increases the readability of SQL statements, facilitates the understanding of SQL business scenarios, and assists in manual SQL troubleshooting.
To evaluate SQL performance, you need to identify potential SQL risks in advance. The main focuses are to check whether efficient indexing is applied, determine whether statement rewrite is required for optimization, and predict whether performance will degrade. If you need data support or want to learn more about business scenarios during SQL review, you can further evaluate the implementation method.
When we identify potential SQL risks during SQL review in the integration phase, we can effectively prevent risky SQL statements from going online and reduce poor-quality SQL statements in the process.
O&M phase: Perform adaptive optimization
SQL tuning in the O&M phase is generally the most common scenario we deal with. When SQL goes live, with small data volume and low concurrency, many SQL queries may not encounter performance issues. However, as the business grows and data accumulates, SQL execution may gradually become slower or even experience drastic performance degrade due to sudden changes in execution plans. During the O&M phase, SQL tuning tasks we need to perform include the following:
- Exception diagnostics, which identifies problematic SQL queries
- Root cause analysis, which locates SQL queries that cause sudden database crash
- SQL emergency response, which recovers databases based on root causes
- Daily governance, which continuously handles poor-quality SQL queries in databases
- Continuous optimization, which involves case archiving for later reference
If you are familiar with common SQL problems and experienced in SQL troubleshooting, you can quickly diagnose exceptions and analyze root causes. This facilitates emergency response and helps solve the business issue. The following topics in this chapter will introduce typical scenarios of SQL tuning.
Methods for SQL emergency response can be roughly classified into two types: execution plan intervention and SQL throttling. If an SQL statement has a correct execution plan or an appropriate index, you can use an outline to bind the execution plan or index to the statement in OceanBase Database for quick response to the emergency. If a new index is required to handle the emergency, the process takes a long time, because index creation is time-consuming. If the SQL emergency is not caused by the execution plan, it is often a problem related to SQL capacity. In this case, the effective solution is SQL throttling. However, SQL throttling is lossy to business and undermines SQL concurrency.
Daily governance and continuous optimization continuously promote the optimization of low-performance SQL statements during database O&M, curbing database crash risks caused by bad SQL statements.