Disadvantages
As mentioned in the previous topic, an execution plan is generated for a parameterized SQL statement and then saved to the plan cache for reuse by subsequent SQL statements. Although this method can save the plan generation time, it implies a bad case in data skew scenarios. Here is an example:
The t1 table is defined as follows. Assume that the c1 column has only two unique values: value 0 that accounts for 1% of all data and value 1 that accounts for 99% of all data. The c2 column has 10 unique values, from 1 to 10, each of which accounts for 10% of total data. For the SELECT * FROM t1 WHERE c1 = ? AND c2 = ? SQL statement, if the optimizer receives the c1 = 0 AND c2 = 1 condition when generating the plan, the optimizer uses the idx1 index to find the 1% of data that meets the c1 = 0 condition from the index, then accesses the table by index primary key, removes the data that does not meet the c2 = 1 condition, and finally returns the result. In this case, only 1% of the data is scanned and accessed by index primary key to complete the query. However, if the OBServer node receives another SQL statement with the c1 = 1 AND c2 = 1 condition later, by reusing the previously generated plan, the optimizer finds the 99% of data that meets the c1 = 1 condition from the idx1 index, removes the data that does not meet the c2 = 1 condition after table access by index primary key, and finally returns the result. In this case, the 99% of data is scanned and accessed by index primary key to complete the query. Obviously, the idx2 index is a better choice for this SQL statement. This is because no matter how values in the condition change, by using the idx2 index, the optimizer only needs to scan the 10% of data and access the 10% of data in the table by index primary key.
create table t1 (
c1 int,
c2 int,
c3 int,
key idx1 (c1),
key idx2 (c2)
);
In this case, you can use an outline to bind the execution plan to the SQL statement to force the use of the idx2 index. Of course, this SQL statement has better optimization solutions, which are, however, not the focus of this topic, and therefore are omitted here.
Countermeasures (plan binding)
Plan binding is a mechanism provided by OceanBase Database that allows users to control the execution plan by creating an outline in the database without modifying the SQL statement. Currently, OceanBase Database provides two ways to create an outline, as shown below.
/* Create an outline based on SQL_TEXT. */
CREATE [OR REPLACE] OUTLINE outline_name ON stmt [ TO target_stmt ]
/* Create an outline based on SQL_ID. */
CREATE [OR REPLACE] OUTLINE outline_name ON sql_id USING HINT hint;
For the previous SQL statement, we can create an outline to force OceanBase Database to use the idx2 index when executing the SQL statement. Note that OceanBase Database does not ignore spaces or letter case when comparing two parameterized SQL statements. Therefore, the SQL statement in the CREATE OUTLINE statement for creating an outline based on SQL_TEXT is prone to differ from the actual SQL statement, resulting in an outline binding failure. We recommend that you create an outline based on SQL_ID.
/* Create an outline based on SQL_TEXT. */
CREATE OUTLINE otl_idx2 ON SELECT/*+ index(T1 idx2)*/ * FROM t1 WHERE c1 = 1 AND c2 = 1;
/* Create an outline based on SQL_ID. */
CREATE OUTLINE otl_idx2 ON '27F4FC32407331073407EAA24F5E5FA4'
USING HINT /*+ index(T1 idx2)*/ ;