Parameterization refers to a process of converting constants in an SQL query into variables.
An SQL statement may use different parameters in each execution. These parameters are parameterized to generate a parameter-irrelevant SQL string. This SQL string is used as a plan caching key to retrieve an execution plan from the plan cache. This design ensures that SQL statements with different parameters can share the same plan.
Typically, the syntax tree in a conventional database is parameterized to generate keys for retrieving plans from the plan cache. Different from these databases, OceanBase Database uses a lexical parameterization process to directly parameterize text strings as the keys of the plan cache. That is why this process is called fast parameterization.
OceanBase Database also supports Adaptive Cursor Sharing to enable the selection of plans with different parameters.
Example
obclient> SELECT * FROM T1 WHERE c1 = 5 AND c2 ='oceanbase';
The following syntax shows the parameterization result of the SQL query in the preceding example. Constants 5 and oceanbase are parameterized into variables @1 and @2:
obclient> SELECT * FROM T1 WHERE c1 = @1 AND c2 = @2;
However, not all constants can be parameterized. For example, constants after ORDER BY cannot be parameterized because they indicate sorting by a specific column in the projected columns in the SELECT statement.
In the following example, Table t1 contains columns c1 and c2. The results of the SQL query are sorted by column c1, which is the primary key column. Sorting is not necessary because column c1 is ordered.
obclient> CREATE TABLE t1(c1 INT PRIMARY KEY,c2 INT);
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES (1,2);
Query OK, 1 row affected
obclient> INSERT INTO t1 VALUES (2,1);
Query OK, 1 row affected
obclient> INSERT INTO t1 VALUES (3,1);
Query OK, 1 row affected
obclient> SELECT c1, c2 FROM t1 ORDER BY 1;
+----+------+
| C1 | C2 |
+----+------+
| 1 | 2 |
| 2 | 1 |
| 3 | 1 |
+----+------+
3 rows in set
obclient> EXPLAIN SELECT c1, c2 FROM t1 ORDER BY 1\G
*************************** 1. row ***************************
Query Plan:
| ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1000 |1381|
===================================
Outputs & filters:
-------------------------------------
0 - output([T1.C1], [T1.C2]), filter(nil),
access([T1.C1], [T1.C2]), partitions(p0)
However, if you execute the following statement:
obclient> SELECT c1, c2 FROM t1 ORDER BY 2;
+----+------+
| C1 | C2 |
+----+------+
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
+----+------+
3 rows in set
The results need to be sorted by column c2. So, a sorting operation is required. The execution plan is shown in the following example:
obclient> EXPLAIN SELECT c1, c2 FROM t1 ORDER BY 2\G
*************************** 1. row ***************************
Query Plan:
| ====================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
------------------------------------
|0 |SORT | |1000 |1886|
|1 | TABLE SCAN|t1 |1000 |1381|
====================================
Outputs & filters:
-------------------------------------
0 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C2, ASC])
1 - output([T1.C1], [T1.C2]), filter(nil),
access([T1.C1], [T1.C2]), partitions(p0)
Therefore, if constants after ORDER BY are parameterized, different values of ORDER BY are assimilated to have the same parameterized SQL statements, resulting in the retrieval of wrong plans. In addition, the following constants are constraints for parameterization and cannot be parameterized:
All constants after
ORDER BY, such asORDER BY 1,2;All constants after
GROUP BY, such asGROUP BY 1,2;Constants after
LIMIT, such asLIMIT 5;A string constant in a format string, such as
%dinSELECT DATE_FORMAT('2006-06-00', '%d');Constants, among the input parameters of a function, that affect the result of the function and eventually the execution plan, such as
NUMBER(2,1)inCAST(999.88 as NUMBER(2,1))or 1 and 2 inSUBSTR('abcd', 1, 2)Constants, among input parameters of a function, that carry implicit information and eventually affect the execution plan, such as '2015-11-13 10:20:19.012' in
SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19.012');It not only specifies input timestamp but also implies that the precision of the function is set to milliseconds.
To solve the possible mismatch problems, the system parameterizes SQL queries by analyzing their syntax trees in the process of generating the execution plan through hard parsing and obtaining the corresponding distinctive information. Assume that a statement conveys this information: "The third item of the parameter array for fast parameterization must be 3". This information is considered a "constraint".
For query Q1 in the following example:
Q1:
obclient> SELECT c1, c2, c3 FROM t1
WHERE c1 = 1 AND c2 LIKE 'senior%' ORDER BY 3;
After lexical analysis of the syntax tree, the SQL statement is parameterized into the following string:
obclient> SELECT c1, c2, c3 FROM t1
WHERE c1 = @1 AND c2 LIKE @2 ORDER BY @3 ;
/*The parameterized array is {1, 'senior%', 3}*/
If constants after ORDER BY are different, they cannot share the same execution plan. Therefore, another result is generated by parameterizing the SQL statement:
obclient> SELECT c1, c2, c3 FROM t1
WHERE c1 = @1 AND c2 LIKE @2 ORDER BY 3 ;
/*The parameterized array is {1, 'senior'}.
The constraint is "The third item of the fast parameterization parameter array must be 3"*/
The new parameterized string, constraint, and execution plan for query Q1 are stored in the plan cache.
When you execute the following Q2 query:
Q2:
obclient> SELECT c1, c2, c3 FROM t1
WHERE c1 = 1 AND c2 LIKE 'senior%' ORDER BY 2;
The result after fast parameterization is:
obclient> SELECT c1, c2, c3 FROM t1
WHERE c1 = @1 and c2 like @2 ORDER BY @3;
/*The parameterized array is {1, 'senior%', 2}*/
The parameterized string is the same as that of the SQL statement for Q1. However, their constraints do not match. Therefore, query Q2 cannot match the plan for Q1. In this case, the optimizer generates a new execution plan and constraint "The third item of the fast parameterization parameter array must be 2" for Q2 through hard parsing and adds the new plan and constraint to the cache. The next time Q1 and Q2 are executed, the correct execution plans are retrieved from the cache.
Based on fast parameterization, the plan caching of OceanBase Database has the following benefits:
Simplified parsing process.
When you search for HASH Map, you can perform the HASH and
MEMCMPoperations on the text string instead of the HASH and COMPARISON operations on the parameterized syntax tree. This approach provides higher execution efficiency.