SQL specifications|V4.3.0| docs|Distributed Database

SQL specifications

Last Updated：2024-04-19 08:42:50 Updated

Overview

SQL R&D specifications are the foundation of SQL quality and an important means to ensure stability of SQL performance. SQL issues identified in the SQL execution phase may have already affected the business and are difficult to solve with common SQL emergency plans.

Common methods for SQL emergency response include SQL throttling and SQL execution plan intervention.

SQL throttling: This method limits the concurrency of SQL execution, which affects the business of throttled SQL queries. You can use this method to restrain abnormal SQL queries to recover the entire database.
SQL execution plan intervention: If the optimal index was not used in the SQL execution phase, you can bind an SQL execution plan or create a more appropriate index. An SQL execution plan takes effect immediately after the binding, but not all SQL queries have a more appropriate index. For example, a full table scan SQL query cannot use any index. Besides, index creation is time-consuming.

Clear SQL R&D specifications help solve elementary SQL errors in code and avoid potential SQL risks in the R&D phase.

SQL specifications cover two dimensions:

Quality: Specifications in this dimension affect SQL execution performance once violated. These specifications aim to reduce elementary SQL errors and low-performance SQL statements, and are usually universal and general, such as the specification that forbids SQL statements involving full table scans.
Style: Specifications in this dimension affect the readability of SQL statements once violated. These specifications aim to facilitate the understanding of SQL business scenarios and assist in manual troubleshooting of SQL issues, and are usually related to business scenarios, such as the naming specifications for tables, columns, and indexes.

SQL specifications are classified into two levels:

Mandatory: Specifications at this level must be followed. Violation to such a specification, for example, the one that forbids SQL statements involving full table scans, leads to a great impact.
Recommended: Specifications at this level are recommended for reference. Violation to such a specification, for example, the one that does not recommend multi-table update, may lead to an impact, even if it is required by business.

[Mandatory] [Quality] Do not use SELECT *

Description

If SELECT * is directly used in code, an error may be reported at the code level after new columns are added to the table.

Examples of incorrect code

Get all data by using SELECT * and then get phone_num.

<select id="getPhoneNum" parameterType="int" resultType="com.shadow.foretaste.entity.UserInfo">
        select * from user_info where id = #{id}
</select>

Examples of correct code

Specify the column name phone_num to directly get the required data.

<select id="getPhoneNum" parameterType="int"  resultType="String">
    select phone_num  from user_info where id = #{id}
</select>

[Mandatory] [Quality] Do not use SQL statements involving full table scans

Description

Full table scans entail great performance risks and can easily cause a database crash. Do not use SQL statements involving full table scans in code.

Notice

Full scans are allowed for configuration tables with no more than hundreds or thousands of rows. Configurations are periodically cached and loaded to those tables or read from the tables.

Examples of incorrect code

Example 1: SQL query without a WHERE condition

select 1 from a

Example 2: SQL query with a non-equality filter

select 1 from a where b != ?
select 1 from a where b <> ?
select 1 from a where b not like ?
select 1 from a where b not in ?
select 1 from a where not exists ()

Example 3: SQL query with a left or full fuzzy match

select 1 from a where b like %a
select 1 from a where b like %a%

[Mandatory] [Quality] No functions or operators on a filter column and no column type conversion

Description

For a WHERE filter condition in an SQL query, do not use functions or operators on the filter column, because this will disallow the use of the index on the column.

Do not specify the filter column of a type other than its defined type, because this will cause the database to perform implicit type conversion and therefore cannot use the index on the column.

Use a function

# Incorrect
select 1 from a where substr(name,1,3) = '123'

# Correct
select 1 from a where name like '123%'

# Incorrect
select 1 from a where data_format(create_time,'%b %d %Y %h:%i %p') = '2022-12-06 10:30:00'

# Correct
select 1 from a where create_time = str_to_date('2022-12-06 10:30:00')

Use an operator

## Incorrect
select 1 from a where a*2 = 10

## Correct
select 1 from a where a = 10/2

Type conversion

# Incorrect, because the b column is defined as a varchar column
select 1 from a where b = 123

# Correct
select 1 from a where b = '123'

[Mandatory] [Quality] Do not use OR to combine predicates

Description

When two predicates are combined by OR, indexes on the columns in predicates cannot be used and a full table scan will be performed.

Examples

# Incorrect, with predicates on the same column combined by OR
select 1 from a where b = 123 or b = 456

# Correct
select 1 from a where b in (123, 456)

# Incorrect, with predicates on different columns combined by OR
select 1 from a where b = 123 or c = 456

# Correct
select 1 from a where b = 123
union all
select 1 from a where c = 456

[Mandatory/Recommended] [Quality] Data update specifications

Description

Specifications for data update are as follows:

[Mandatory] Batch data update

For large-scale data update, use a select query to find the primary keys of the rows to be updated, and then split them into batches for update.
[Recommended] Multi-table update

Multi-table update or delete is not recommended. You can split the operation into multiple SQL statements to perform single-table update or delete by transaction.
[Recommended] Multi-row insert

It is recommended that batch insert operations be merged into a single insert operation that inserts multiple values. A single insert operation can affect up to 200 rows, as suggested.

[Recommended] [Quality] IN predicate

Description

Evaluate the number of set elements after IN. The recommended maximum is 200.

[Mandatory] [Quality] Separately handle nullable columns

Description

Unique index: If a unique index contains a nullable column, NULLs in the column are not restricted by the UNIQUE constraint.
WHERE condition: NULLs are out of equality comparisons in a WHERE condition, such as=, ! =, <>, IN, and LIKE. When a value is NULL, the WHERE condition is prone to generate unexpected results.
Aggregate functions: Some aggregate functions, such as COUNT(), DISTINCT(), and SUM(), skip NULLs and generate unexpected results.

Notice

For a unique index, this specification is a recommendation. You need to decide whether to follow it based actual business requirements. In common scenarios, we recommend that you set a NOT NULL constraint on all columns in the unique index to avoid the case of an invalid unique index.

mysql> select 1 = 1,1 = null,1 is null, null is null ;
+-------+----------+-----------+--------------+
| 1 = 1 | 1 = null | 1 is null | null is null |
+-------+----------+-----------+--------------+
|     1 |     NULL |         0 |            1 |
+-------+----------+-----------+--------------+

Examples of incorrect code

The table used in the examples is defined as follows:

CREATE TABLE `test_null` (
  `id` int primary key,
  `a` int NOT NULL,
  `b` int DEFAULT NULL
  UNIQUE KEY `idx_a_b` (`a`, `b`)
)

Example 1: NULLs in the unique index

In rows where id is 1 and 2 respectively, the values of (a, b) are both (1, NULL). However, the database does not consider that NULL = NULL and allows coexistence of the two rows although there is a UNIQUE constraint.

In rows where id is 1 and 3 respectively, the values of (a, b) are (1, NULL) and (1, 2), which conforms to the UNIQUE constraint.
```
+------+------+------+
| id   | a    | b    |
+------+------+------+
| 1    | 1    | NULL |
| 2    | 1    | NULL |
| 3    | 1    | 2    |
----------------------
```
Example 2: NULLs in a column in the WHERE condition

The filter condition a!=1 outputs only the row where id is 3. The row where id is 2 is omitted, because NULL!=1 is considered NULL instead of a Boolean value true or false.

The filter condition a not in (1,3) does not output the row where id is 2, because NULL not in (1,3) is considered NULL instead of a Boolean value true or false.
```
mysql> select * from test_null;
+------+------+
| id   | a    |
+------+------+
|    1 |    1 |
|    2 | NULL |
|    3 |    3 |
+------+------+

mysql> select * from test_null where a != 1;
+------+------+
| id   | a    |
+------+------+
|    3 |    3 |
+------+------+

mysql> select * from test_null where a not in (1,3);
Empty set
```

Example 3: Aggregate function on a column with NULLs

As shown in the following figure, the column a has a NULL. select count(*) returns three rows, while select count(a) returns two rows.

select * from test_null;
+------+------+
| id   | a    |
+------+------+
|    1 |    1 |
|    2 | NULL |
|    3 |    3 |
+------+------+
3 rows in set (0.01 sec)

mysql> select count(*) from test_null;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.02 sec)

mysql> select count(a) from test_null;
+----------+
| count(a) |
+----------+
|        2 |
+----------+
1 row in set (0.01 sec)

[Recommended] [Quality] Do not use a non-unique column for ORDER BY

Description

If ORDER BY is followed by a non-unique column, MySQL sorts the returned result also by primary key, while OceanBase Database does not.

Do not use non-unique columns for ORDER BY in code, which may affect the business because the returned result is undefined.

Violation to this specification may lead to the following problems:

When a primary/standby database switchover occurs, different results may be returned for the same SQL statement when it is executed on different OBServer nodes with different execution plans.
If the column following ORDER BY has the same value in multiple rows, a different result is returned each time an SQL statement is executed.

We recommend that you do not use non-unique columns for ORDER BY. If it cannot be avoided, add a unique column after the sort column.

[Mandatory/Recommended] [Quality] Do not use unconditional LIMIT 1 on a buffer table; bind a hint

Description

A queuing table, also known as a buffer table, can be used in a similar way to buffer in actual business scenarios. That is, a large proportion of data in such a table is updated, added, or deleted. Specifications for a buffer table are as follows:

[Mandatory] Do not use unconditional LIMIT 1 on a buffer table.
[Recommended] Bind a hint to prevent exceptions caused by execution plan jitters.

Examples of incorrect code

Example 1: No hint is bound

# The correct hint is not bound.
<!-- mapped statement for IbatisAccountLogCacheDAO.findNeedCacheBackAccountNo -->
<select id="MS-ACCOUNT-LOG-CACHE-FIND-NEED-CACHE-BACK-ACCOUNT-NO-TABLE-INDEX-OB" resultMap="RM-CACHE-BACK-ACCOUNT-CNT-OB">
<![CDATA[
    select TRANS_ACCOUNT as account_no, count(1) as cnt
    from IW_ACCOUNT_LOG_CACHE t where (TASK_NO = (- 1)) group by trans_account
]]>
</select>

Example 2: Unconditional LIMIT 1 is used

select * from iw_account_log_cace limit 1;

Examples of correct code

Example 1: Bind a hint forcibly

idx_taskno_acount_dt_gorder_logid is the top-priority index for taskNo.

# Bind an index correctly.
<!-- mapped statement for IbatisAccountLogCacheDAO.findNeedCacheBackAccountNo -->
<select id="MS-ACCOUNT-LOG-CACHE-FIND-NEED-CACHE-BACK-ACCOUNT-NO-TABLE-INDEX-OB" resultMap="RM-CACHE-BACK-ACCOUNT-CNT-OB">
<![CDATA[
    select /*+ INDEX(t idx_taskno_acount_dt_gorder_logid)*/ TRANS_ACCOUNT as account_no, count(1) as cnt
    from IW_ACCOUNT_LOG_CACHE t where (TASK_NO = (- 1)) group by trans_account
]]>
</select>

Example 2: Use LIMIT 1 with a WHERE condition

select * from iw_account_log_cace where task_no = -1 limit 1;