Overview
SQL R&D specifications are the foundation of SQL quality and an important means to ensure stability of SQL performance. SQL issues identified in the SQL execution phase may have already affected the business and are difficult to solve with common SQL emergency plans.
Common methods for SQL emergency response include SQL throttling and SQL execution plan intervention.
SQL throttling: This method limits the concurrency of SQL execution, which affects the business of throttled SQL queries. You can use this method to restrain abnormal SQL queries to recover the entire database.
SQL execution plan intervention: If the optimal index was not used in the SQL execution phase, you can bind an SQL execution plan or create a more appropriate index. An SQL execution plan takes effect immediately after the binding, but not all SQL queries have a more appropriate index. For example, a full table scan SQL query cannot use any index. Besides, index creation is time-consuming.
Clear SQL R&D specifications help solve elementary SQL errors in code and avoid potential SQL risks in the R&D phase.
SQL specifications cover two dimensions:
Quality: Specifications in this dimension affect SQL execution performance once violated. These specifications aim to reduce elementary SQL errors and low-performance SQL statements, and are usually universal and general, such as the specification that forbids SQL statements involving full table scans.
Style: Specifications in this dimension affect the readability of SQL statements once violated. These specifications aim to facilitate the understanding of SQL business scenarios and assist in manual troubleshooting of SQL issues, and are usually related to business scenarios, such as the naming specifications for tables, columns, and indexes.
SQL specifications are classified into two levels:
Mandatory: Specifications at this level must be followed. Violation to such a specification, for example, the one that forbids SQL statements involving full table scans, leads to a great impact.
Recommended: Specifications at this level are recommended for reference. Violation to such a specification, for example, the one that does not recommend multi-table update, may lead to an impact, even if it is required by business.
[Mandatory] [Quality] Do not use SELECT *
Description
If SELECT * is directly used in code, an error may be reported at the code level after new columns are added to the table.
Examples of incorrect code
Get all data by using SELECT * and then get phone_num.
<select id="getPhoneNum" parameterType="int" resultType="com.shadow.foretaste.entity.UserInfo">
select * from user_info where id = #{id}
</select>
Examples of correct code
Specify the column name phone_num to directly get the required data.
<select id="getPhoneNum" parameterType="int" resultType="String">
select phone_num from user_info where id = #{id}
</select>
[Mandatory] [Quality] Do not use SQL statements involving full table scans
Description
Full table scans entail great performance risks and can easily cause a database crash. Do not use SQL statements involving full table scans in code.
Notice
Full scans are allowed for configuration tables with no more than hundreds or thousands of rows. Configurations are periodically cached and loaded to those tables or read from the tables.
Examples of incorrect code
Example 1: SQL query without a WHERE condition
select 1 from a
Example 2: SQL query with a non-equality filter
select 1 from a where b != ?
select 1 from a where b <> ?
select 1 from a where b not like ?
select 1 from a where b not in ?
select 1 from a where not exists ()
Example 3: SQL query with a left or full fuzzy match
select 1 from a where b like %a
select 1 from a where b like %a%
[Mandatory] [Quality] No functions or operators on a filter column and no column type conversion
Description
For a WHERE filter condition in an SQL query, do not use functions or operators on the filter column, because this will disallow the use of the index on the column.
Do not specify the filter column of a type other than its defined type, because this will cause the database to perform implicit type conversion and therefore cannot use the index on the column.
Use a function
# Incorrect
select 1 from a where substr(name,1,3) = '123'
# Correct
select 1 from a where name like '123%'
# Incorrect
select 1 from a where data_format(create_time,'%b %d %Y %h:%i %p') = '2022-12-06 10:30:00'
# Correct
select 1 from a where create_time = str_to_date('2022-12-06 10:30:00')
Use an operator
## Incorrect
select 1 from a where a*2 = 10
## Correct
select 1 from a where a = 10/2
Type conversion
# Incorrect, because the b column is defined as a varchar column
select 1 from a where b = 123
# Correct
select 1 from a where b = '123'
[Mandatory] [Quality] Do not use OR to combine predicates
Description
When two predicates are combined by OR, indexes on the columns in predicates cannot be used and a full table scan will be performed.
Examples
# Incorrect, with predicates on the same column combined by OR
select 1 from a where b = 123 or b = 456
# Correct
select 1 from a where b in (123, 456)
# Incorrect, with predicates on different columns combined by OR
select 1 from a where b = 123 or c = 456
# Correct
select 1 from a where b = 123
union all
select 1 from a where c = 456
[Mandatory/Recommended] [Quality] Data update specifications
Description
Specifications for data update are as follows:
[Mandatory] Batch data update
For large-scale data update, use a
selectquery to find the primary keys of the rows to be updated, and then split them into batches for update.[Recommended] Multi-table update
Multi-table update or delete is not recommended. You can split the operation into multiple SQL statements to perform single-table update or delete by transaction.
[Recommended] Multi-row insert
It is recommended that batch insert operations be merged into a single insert operation that inserts multiple values. A single insert operation can affect up to 200 rows, as suggested.
[Recommended] [Quality] IN predicate
Description
Evaluate the number of set elements after IN. The recommended maximum is 200.
[Mandatory] [Quality] Separately handle nullable columns
Description
- Unique index: If a unique index contains a nullable column, NULLs in the column are not restricted by the
UNIQUEconstraint. WHEREcondition: NULLs are out of equality comparisons in aWHEREcondition, such as=,! =,<>,IN, andLIKE. When a value isNULL, theWHEREcondition is prone to generate unexpected results.- Aggregate functions: Some aggregate functions, such as
COUNT(),DISTINCT(), andSUM(), skip NULLs and generate unexpected results.
Notice
For a unique index, this specification is a recommendation. You need to decide whether to follow it based actual business requirements. In common scenarios, we recommend that you set a NOT NULL constraint on all columns in the unique index to avoid the case of an invalid unique index.
mysql> select 1 = 1,1 = null,1 is null, null is null ;
+-------+----------+-----------+--------------+
| 1 = 1 | 1 = null | 1 is null | null is null |
+-------+----------+-----------+--------------+
| 1 | NULL | 0 | 1 |
+-------+----------+-----------+--------------+
Examples of incorrect code
The table used in the examples is defined as follows:
CREATE TABLE `test_null` (
`id` int primary key,
`a` int NOT NULL,
`b` int DEFAULT NULL
UNIQUE KEY `idx_a_b` (`a`, `b`)
)
Example 1: NULLs in the unique index
In rows where
idis 1 and 2 respectively, the values of (a,b) are both (1, NULL). However, the database does not consider that NULL = NULL and allows coexistence of the two rows although there is aUNIQUEconstraint.In rows where
idis 1 and 3 respectively, the values of (a,b) are (1, NULL) and (1, 2), which conforms to the UNIQUE constraint.+------+------+------+ | id | a | b | +------+------+------+ | 1 | 1 | NULL | | 2 | 1 | NULL | | 3 | 1 | 2 | ----------------------Example 2: NULLs in a column in the
WHEREconditionThe filter condition
a!=1outputs only the row whereidis 3. The row whereidis 2 is omitted, becauseNULL!=1is considered NULL instead of a Boolean valuetrueorfalse.The filter condition
a not in (1,3)does not output the row whereidis 2, becauseNULL not in (1,3)is considered NULL instead of a Boolean valuetrueorfalse.mysql> select * from test_null; +------+------+ | id | a | +------+------+ | 1 | 1 | | 2 | NULL | | 3 | 3 | +------+------+ mysql> select * from test_null where a != 1; +------+------+ | id | a | +------+------+ | 3 | 3 | +------+------+ mysql> select * from test_null where a not in (1,3); Empty setExample 3: Aggregate function on a column with NULLs
As shown in the following figure, the column
ahas a NULL.select count(*)returns three rows, whileselect count(a)returns two rows.select * from test_null; +------+------+ | id | a | +------+------+ | 1 | 1 | | 2 | NULL | | 3 | 3 | +------+------+ 3 rows in set (0.01 sec) mysql> select count(*) from test_null; +----------+ | count(*) | +----------+ | 3 | +----------+ 1 row in set (0.02 sec) mysql> select count(a) from test_null; +----------+ | count(a) | +----------+ | 2 | +----------+ 1 row in set (0.01 sec)
[Recommended] [Quality] Do not use a non-unique column for ORDER BY
Description
If ORDER BY is followed by a non-unique column, MySQL sorts the returned result also by primary key, while OceanBase Database does not.
Do not use non-unique columns for ORDER BY in code, which may affect the business because the returned result is undefined.
Violation to this specification may lead to the following problems:
- When a primary/standby database switchover occurs, different results may be returned for the same SQL statement when it is executed on different OBServer nodes with different execution plans.
- If the column following
ORDER BYhas the same value in multiple rows, a different result is returned each time an SQL statement is executed.
We recommend that you do not use non-unique columns for ORDER BY. If it cannot be avoided, add a unique column after the sort column.
[Mandatory/Recommended] [Quality] Do not use unconditional LIMIT 1 on a buffer table; bind a hint
Description
A queuing table, also known as a buffer table, can be used in a similar way to buffer in actual business scenarios. That is, a large proportion of data in such a table is updated, added, or deleted. Specifications for a buffer table are as follows:
- [Mandatory] Do not use unconditional LIMIT 1 on a buffer table.
- [Recommended] Bind a hint to prevent exceptions caused by execution plan jitters.
Examples of incorrect code
Example 1: No hint is bound
# The correct hint is not bound. <!-- mapped statement for IbatisAccountLogCacheDAO.findNeedCacheBackAccountNo --> <select id="MS-ACCOUNT-LOG-CACHE-FIND-NEED-CACHE-BACK-ACCOUNT-NO-TABLE-INDEX-OB" resultMap="RM-CACHE-BACK-ACCOUNT-CNT-OB"> <![CDATA[ select TRANS_ACCOUNT as account_no, count(1) as cnt from IW_ACCOUNT_LOG_CACHE t where (TASK_NO = (- 1)) group by trans_account ]]> </select>Example 2: Unconditional LIMIT 1 is used
select * from iw_account_log_cace limit 1;
Examples of correct code
Example 1: Bind a hint forcibly
idx_taskno_acount_dt_gorder_logidis the top-priority index fortaskNo.# Bind an index correctly. <!-- mapped statement for IbatisAccountLogCacheDAO.findNeedCacheBackAccountNo --> <select id="MS-ACCOUNT-LOG-CACHE-FIND-NEED-CACHE-BACK-ACCOUNT-NO-TABLE-INDEX-OB" resultMap="RM-CACHE-BACK-ACCOUNT-CNT-OB"> <![CDATA[ select /*+ INDEX(t idx_taskno_acount_dt_gorder_logid)*/ TRANS_ACCOUNT as account_no, count(1) as cnt from IW_ACCOUNT_LOG_CACHE t where (TASK_NO = (- 1)) group by trans_account ]]> </select>Example 2: Use LIMIT 1 with a
WHEREconditionselect * from iw_account_log_cace where task_no = -1 limit 1;