Rule-based query rewrite|V4.3.5|OceanBase Database| docs|Distributed Database

Rule-based query rewrite includes subquery-related rewrite, outer join elimination, simplified condition rewrite, and non-select project join (non-SPJ) rewrite.

The optimizer usually executes subqueries in a nested way, which means that each time a parent query generates a row of data, the optimizer executes a subquery. This method requires multiple subqueries to be executed, resulting in low execution efficiency. To optimize the execution, the nested operation is rewritten to a join, which greatly improves the execution efficiency because of the following benefits:

It avoids the repeated execution of the same subquery.
The optimizer selects a better join order and better join algorithms based on the statistics.
After the join and filter conditions of the subquery are rewritten as the conditions of the parent query, more optimization options are available to the optimizer, such as conditional pushdown.

Frequently applied methods to rewrite a subquery include view merge, subquery expansion, and rewriting ANY or ALL by using MAX or MIN.

View merge

View merge refers to the process of merging a subquery that represents a view into a query that contains the view. The merge provides the optimizer with more options of join order types, access paths, and rewrites, making the selection of a better execution plan much easier.

OceanBase Database supports the merge of SPJ views. In the following example, Q1 is rewritten to Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

Q1:
obclient>SELECT t1.c1, v.c1
                FROM t1, (SELECT t2.c1, t3.c2
                FROM t2, t3
                WHERE t2.c1 = t3.c1) v
                WHERE t1.c2 = v.c2;
<==>
Q2:
obclient>SELECT t1.c1, t2.c1
                FROM t1, t2, t3
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

Before the rewrite, the types of join orders available for query Q1 are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After the rewrite, the following types of join orders are available:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

This indicates that view merge increases options for the join order. For complex queries, view merge provides higher flexibility in selecting access paths and rewrite methods, allowing the optimizer to generate a better plan.

Subquery unnesting

Subquery unnesting promotes the subquery in a WHERE condition to the parent query and unnests it to produce a join condition in parallel with the parent query. The rewrite deconstructs the subquery and changes the outer parent query to a multi-table join.

The benefit is that the optimizer takes into account tables in the subquery when it selects access paths, join methods, and sorting methods, thus obtaining a better execution plan. Relevant subquery unnesting expressions include NOT IN, IN, NOT EXIST, EXIST, ANY, and ALL.

You can unnest a subquery by using the following methods:

Rewrite conditions so that the generated join statement is enabled to return the same rows as the original statement.

Unnest a subquery to produce a semi join or anti join.

In the following example, t2.c2 is not unique and the statement is rewritten to a semi join. The new execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan:
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

After you change the operator before a query to NOT IN, you can rewrite the query to an anti join. The new execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Unnest a subquery to produce an inner join.

For query Q1 in the preceding example, if t2.c2 is changed to t2.c1, which is the primary key, the output of the subquery is unique, and you can rewrite it to an inner join, as shown in the following example:

Q1:
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2:
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

The execution plan of query Q1 is then rewritten as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can rewrite subquery expressions, such as NOT IN, IN, NOT EXIST, EXIST, ANY, and ALL, in the same way.

Rewrite ANY or ALL by using MAX or MIN

For a subquery with an ANY or ALL expression, if the subquery does not contain a GROUP BY clause, aggregate function, or HAVING clause, the expression shown in the following example can be equivalently converted by the aggregate function MIN or MAX, where col_item is a separate, non-NULL column:

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After the subquery is converted to contain MAX or MIN, you can further rewrite MAX or MIN to reduce the number of scanning operations on the right-side table. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After the MAX/MIN rewrite, the primary key t2.c1 can be used to directly push LIMIT 1 down to TABLE SCAN to generate the MIN value. The execution plan is as follows:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Outer join elimination

Outer joins are classified into left outer joins, right outer joins, and full outer joins. The order of an outer join cannot be changed during the operation, the optimizer has limited options of join order. Outer join elimination helps convert an outer join into an inner join, providing the optimizer with more join path options.

However, the outer join elimination requires a "Reject-NULL" condition, which is contained in WHERE conditions, and exports FALSE when the right-side table generates a NULL value.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

This is an outer join, whose output row t2.c2 may be NULL. If you add the condition t2.c2 > 5, data is filtered by this condition, and the output of t2.c1 cannot be NULL, so that the outer join can be converted into an inner join.

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2
            WHERE t2.c2 > 5;

Condition simplification

HAVING condition elimination

If no aggregate or GROUP BY operation exists in the query, the HAVING condition can be merged into the WHERE conditions, with the HAVING condition eliminated. This way, the HAVING condition can be managed with other conditions in WHERE for further optimization.

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The rewritten plan is shown in the following example, where the condition t1.c2 > 1 is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan:
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Equivalence relation deduction

In the process of equivalence relation deduction, new conditional expressions are deduced based on the transitivity of comparison operators to reduce the number of rows to be processed or to select a more efficient index.

OceanBase Database supports the deduction of equi-join conditions. For example, assume that a table contains columns a and b. a ＝ b AND a > 1 AND b > 1 can be deduced from a ＝ b AND a > 1. In this case, if column b is indexed and selectivity of the index is low for the condition b > 1, the performance of accessing the table that contains column b can be significantly improved by using the deduced condition.

The following example shows that the condition t1.c1 = t2.c2 AND t1.c1 > 2 is equivalently deduced to t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2. You can learn from the plan that t2.c2 is pushed down to TABLE SCAN and the corresponding index of t2.c2 is applied.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*Run this command in MySQL mode*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Identically true/false elimination

The following identically true and false conditions can be eliminated:

false and expr ＝ Identically false
true or expr = Identically true

In the following example, for the condition WHERE 0 > 1 AND c1 ＝ 3, AND is identically false because of the condition 0 > 1. Therefore, the result can be directly returned without the need to execute the SQL query, thereby accelerating the execution process.

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SPJ rewrite

Sorting redundancy elimination

Redundancy elimination for sorting removes redundant ordered items to reduce resource consumption for sorting. Redundancy elimination for sorting is required in the following three circumstances:

Duplicate columns in the ORDER BY expression list. You can sort the columns after deduplication.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

A single-valued column in both ORDER BY and WHERE. This column can be deleted before sorting.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;

If a query at the current layer contains ORDER BY but not LIMIT, and the query resides in the set operation of the parent query, then ORDER BY can be eliminated. The reason is that the UNION operation on two ordered sets simply leads to an unordered result set. However, if ORDER BY contains LIMIT, which semantically implies the selection of the greatest or smallest N items, ORDER BY must be retained to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

LIMIT pushdown to subquery

LIMIT pushdown rewrite means to push the LIMIT down to the subquery. OceanBase Database supports pushing down LIMIT to a view (Example 1) or to a subquery that contains the UNION operator (Example 2), without changing the semantics.

Example 1: Push LIMIT down to the view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1;
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push LIMIT down to the subquery corresponding to UNION.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

LIMIT pushdown to outer join or cross join

If an SQL statement does not contain the WINDOW FUNCTION, DISTINCT, GROUP BY, or HAVING clause and does not contain the WHERE or ORDER BY condition or the WHERE or ORDER BY condition is related only to one side of the join, you can push down the LIMIT statement to one side (outer join) or multiple sides (cross join) of the joined table. This rewrite method is called LIMIT pushdown to outer join or cross join. Pushdown of the LIMIT statement can effectively reduce the number of rows that are joined, thus reducing the overheads of queries.

When the LIMIT statement is pushed down for an outer join, a view is encapsulated on the table pushed down. The following example shows query Q1 for a left outer join:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

Query Q1 does not contain the preceding clauses or the WHERE or ORDER BY condition. Therefore, view V can be encapsulated on table t1 on the left side, and LIMIT 1 can be encapsulated into view V, thus rewriting query Q1 into query Q2. After the LIMIT statement is pushed down, the original LIMIT statement must be retained. The execution plan is rewritten as follows:

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters:
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256,
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256,
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256,
access([t1.c1]), partitions(p0),
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256,
access([t2.c1]), partitions(p0)

Likewise, a right outer join can push down the LIMIT statement to the table on the right side when the preceding conditions are met.

A cross join can push down the LIMIT statement to both sides. Here is an example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
V2: SELECT 1 FROM t2 LIMIT 1;

Query Q3 does not contain the conditions in the LIMIT pushdown to outer join and contains only the WHERE and ORDER BY conditions applied to table t1. In this case, you can create views V1 and V2 on tables t1 and t2, respectively. Then, you can push down the LIMIT statement to the views, thus rewriting query Q3 into query Q4. The execution plan is rewritten as follows:

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256,
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256,
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256,
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256,
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256,
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

An SQL query for a multi-table join that meets the preceding conditions can execute LIMIT pushdown to outer joins and cross joins multiple times to increase the room for query rewrite and achieve better rewrite results.

DISTINCT elimination

If the SELECT statement contains only constants, DISTINCT can be eliminated, with LIMIT 1 added.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==>
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

DISTINCT can be eliminated if the SELECT statement contains a column that ensures uniqueness. In the following example, (c1, c2) is the primary key. It ensures the uniqueness of c1, c2, and c3. Therefore, DISTINCT can be eliminated.

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

MIN/MAX rewrite

If the MIN or MAX function contains a parameter serving as the index prefix column and does not contain GROUP BY, SCALAR aggregate can be rewritten into an index scan for scanning only one row, as shown in the following example:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t2 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan:
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

If all parameters of SELECT MIN or SELECT MAX are constants and GROUP BY is contained, you can rewrite MIN or MAX to a constant to reduce the resource consumption of the MIN or MAX operation.

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If all parameters of SELECT MIN or SELECT MAX are constants and GROUP BY is not contained, you can rewrite the query to scan only one row by using the index, as shown in the following example:

obclient> SELECT MAX(1) FROM t1;
<==>
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan:
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

Rule-based query rewrite includes subquery-related rewrite, outer join elimination, simplified condition rewrite, and non-select project join (non-SPJ) rewrite.

It avoids the repeated execution of the same subquery.
The optimizer selects a better join order and better join algorithms based on the statistics.
After the join and filter conditions of the subquery are rewritten as the conditions of the parent query, more optimization options are available to the optimizer, such as conditional pushdown.

Frequently applied methods to rewrite a subquery include view merge, subquery expansion, and rewriting ANY or ALL by using MAX or MIN.

View merge

OceanBase Database supports the merge of SPJ views. In the following example, Q1 is rewritten to Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

Q1:
obclient>SELECT t1.c1, v.c1
                FROM t1, (SELECT t2.c1, t3.c2
                FROM t2, t3
                WHERE t2.c1 = t3.c1) v
                WHERE t1.c2 = v.c2;
<==>
Q2:
obclient>SELECT t1.c1, t2.c1
                FROM t1, t2, t3
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

Before the rewrite, the types of join orders available for query Q1 are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After the rewrite, the following types of join orders are available:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

Subquery unnesting

You can unnest a subquery by using the following methods:

Rewrite conditions so that the generated join statement is enabled to return the same rows as the original statement.

Unnest a subquery to produce a semi join or anti join.

In the following example, t2.c2 is not unique and the statement is rewritten to a semi join. The new execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan:
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

After you change the operator before a query to NOT IN, you can rewrite the query to an anti join. The new execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Unnest a subquery to produce an inner join.

Q1:
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2:
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

The execution plan of query Q1 is then rewritten as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can rewrite subquery expressions, such as NOT IN, IN, NOT EXIST, EXIST, ANY, and ALL, in the same way.

Rewrite ANY or ALL by using MAX or MIN

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After the subquery is converted to contain MAX or MIN, you can further rewrite MAX or MIN to reduce the number of scanning operations on the right-side table. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After the MAX/MIN rewrite, the primary key t2.c1 can be used to directly push LIMIT 1 down to TABLE SCAN to generate the MIN value. The execution plan is as follows:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Outer join elimination

However, the outer join elimination requires a "Reject-NULL" condition, which is contained in WHERE conditions, and exports FALSE when the right-side table generates a NULL value.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2
            WHERE t2.c2 > 5;

Condition simplification

HAVING condition elimination

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The rewritten plan is shown in the following example, where the condition t1.c2 > 1 is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan:
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Equivalence relation deduction

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*Run this command in MySQL mode*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Identically true/false elimination

The following identically true and false conditions can be eliminated:

false and expr ＝ Identically false
true or expr = Identically true

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SPJ rewrite

Sorting redundancy elimination

Redundancy elimination for sorting removes redundant ordered items to reduce resource consumption for sorting. Redundancy elimination for sorting is required in the following three circumstances:

Duplicate columns in the ORDER BY expression list. You can sort the columns after deduplication.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

A single-valued column in both ORDER BY and WHERE. This column can be deleted before sorting.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;

If a query at the current layer contains ORDER BY but not LIMIT, and the query resides in the set operation of the parent query, then ORDER BY can be eliminated. The reason is that the UNION operation on two ordered sets simply leads to an unordered result set. However, if ORDER BY contains LIMIT, which semantically implies the selection of the greatest or smallest N items, ORDER BY must be retained to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

LIMIT pushdown to subquery

Example 1: Push LIMIT down to the view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1;
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push LIMIT down to the subquery corresponding to UNION.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

LIMIT pushdown to outer join or cross join

When the LIMIT statement is pushed down for an outer join, a view is encapsulated on the table pushed down. The following example shows query Q1 for a left outer join:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters:
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256,
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256,
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256,
access([t1.c1]), partitions(p0),
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256,
access([t2.c1]), partitions(p0)

Likewise, a right outer join can push down the LIMIT statement to the table on the right side when the preceding conditions are met.

A cross join can push down the LIMIT statement to both sides. Here is an example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
V2: SELECT 1 FROM t2 LIMIT 1;

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256,
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256,
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256,
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256,
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256,
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

DISTINCT elimination

If the SELECT statement contains only constants, DISTINCT can be eliminated, with LIMIT 1 added.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==>
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

MIN/MAX rewrite

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t2 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan:
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan:
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If all parameters of SELECT MIN or SELECT MAX are constants and GROUP BY is not contained, you can rewrite the query to scan only one row by using the index, as shown in the following example:

obclient> SELECT MAX(1) FROM t1;
<==>
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan:
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

OceanBase

Customer Stories

Documentation

Rule-based query rewrite

View merge

Subquery unnesting

Rewrite ANY or ALL by using MAX or MIN

Outer join elimination

Condition simplification

HAVING condition elimination

Equivalence relation deduction

Identically true/false elimination

Non-SPJ rewrite

Sorting redundancy elimination

LIMIT pushdown to subquery

LIMIT pushdown to outer join or cross join

DISTINCT elimination

MIN/MAX rewrite

Rule-based query rewrite

View merge

Subquery unnesting

Rewrite ANY or ALL by using MAX or MIN

Outer join elimination

Condition simplification

HAVING condition elimination

Equivalence relation deduction

Identically true/false elimination

Non-SPJ rewrite

Sorting redundancy elimination

LIMIT pushdown to subquery

LIMIT pushdown to outer join or cross join

DISTINCT elimination

MIN/MAX rewrite

OceanBase

Customer Stories

Documentation

Rule-based query rewrite

Subquery related rewrite

View merge

Subquery unnesting

Rewrite ANY or ALL by using MAX or MIN

Outer join elimination

Condition simplification

HAVING condition elimination

Equivalence relation deduction

Identically true/false elimination

Non-SPJ rewrite

Sorting redundancy elimination

LIMIT pushdown to subquery

LIMIT pushdown to outer join or cross join

DISTINCT elimination

MIN/MAX rewrite

Rule-based query rewrite

Subquery related rewrite

View merge

Subquery unnesting

Rewrite ANY or ALL by using MAX or MIN

Outer join elimination

Condition simplification

HAVING condition elimination

Equivalence relation deduction

Identically true/false elimination

Non-SPJ rewrite

Sorting redundancy elimination

LIMIT pushdown to subquery

LIMIT pushdown to outer join or cross join

DISTINCT elimination

MIN/MAX rewrite