Rule-based query rewriting|V4.3.1|OceanBase Database| docs|Distributed Database

Rule-based query rewriting includes subquery-related rewriting, outer join elimination, conditional simplification, and rewriting of non-SPJ queries.

Subquery rewriting

The optimizer usually executes a subquery nested within a parent query. In this case, the optimizer needs to execute the subquery once for each row generated by the parent query. This execution method results in low efficiency because the subquery is executed multiple times. To improve the execution efficiency, the optimizer usually rewrites the subquery into a join operation. The main advantages are as follows:

The subquery is executed only once.
The optimizer can choose a better join order and join method based on statistics.
After the join conditions and filter conditions of the subquery are rewritten into those of the parent query, the optimizer can further optimize the conditions, such as pushing down the conditions.

Subquery rewriting includes view merging, subquery expansion, and rewriting ANY/ALL predicates into MAX/MIN predicates.

View merging

View merging is the process of merging a subquery that represents a view into the query that contains the view. This process helps the optimizer to increase the flexibility of join orders, access paths, and other transformations, thereby helping to find a better execution plan.

OceanBase Database supports merging SPJ views. As shown in the following example, query Q1 can be rewritten into Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

Q1: 
obclient>SELECT t1.c1, v.c1 
                FROM t1, (SELECT t2.c1, t3.c2 
                FROM t2, t3 
                WHERE t2.c1 = t3.c1) v 
                WHERE t1.c2 = v.c2;
<==>
Q2: 
obclient>SELECT t1.c1, t2.c1 
                FROM t1, t2, t3 
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

If Q1 is not rewritten, possible join orders are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After view merging, possible join orders are as follows:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

As we can see, view merging increases the flexibility of join orders. For complex queries, view merging increases the flexibility of path selection and transformation, thereby enabling the optimizer to find better execution plans.

Subquery expansion

Subquery expansion is the process of bringing a subquery in the WHERE condition to the parent query and rewriting it into a join condition. After the subquery is expanded, it is removed and the parent query is changed into a multi-table join.

This modification enables the optimizer to consider tables in the subquery during path selection, join method determination, and join order arrangement, thereby helping to find a better execution plan. Subqueries with NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL predicates are involved in subquery expansion.

Subquery expansion can be done in the following ways:

Rewrite the condition so that the generated join statement can return the same rows as the original statement.

Expand the subquery into a semi join or anti join.

In the following example, t2.c2 does not have uniqueness. The semi join-based execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan: 
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

If you change the preceding operator IN to NOT IN, the anti join-based execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Expand the subquery into an inner join

In the preceding example, if you change t2.c2 into t2.c1 in query Q1, since t2.c1 is a primary key, the join result is unique. In this case, you can directly expand the subquery into an inner join, as shown below:

Q1: 
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2: 
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

The execution plan after Q1 is rewritten is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can expand NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL subqueries into inner joins.

Rewrite using MAX/MIN after converting ANY/ALL

If the subquery does not contain a GROUP BY clause, aggregate functions, or HAVING conditions, you can convert the ANY/ALL expression by using the aggregate functions MIN and MAX. Specifically, col_item must be a single column without NULL values:

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After you convert the subquery containing the ANY/ALL operator into a subquery containing the MAX/MIN operator, you can apply the MAX/MIN operator to rewrite the query statement. This approach can reduce the number of scans of the inner table before the conversion. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After you rewrite the query statement by using the MAX/MIN operator, you can push the LIMIT 1 condition, based on the primary key order of t2.c1, directly down to the TABLE SCAN operator to output the minimum value. The execution plan is as follows:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Elimination of outer joins

Outer joins can be left outer joins, right outer joins, and full outer joins. The optimizer cannot freely exchange the order of outer joins, which limits the possible join orders. Elimination of outer joins converts outer joins into inner joins, providing more join orders for the optimizer.

To eliminate an outer join, a "null value rejecting condition" must be met. In other words, a WHERE condition must exist such that when a null value is generated by the inner table, the condition evaluates to FALSE.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

This query statement contains an outer join. The value of t2.c2 in the output row can be NULL. If a condition t2.c2 > 5 is added, the value of t2.c1 output after filtering based on the condition cannot be NULL. In this case, the outer join can be converted into an inner join.

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2 
            WHERE t2.c2 > 5;

Simplify conditions

Elimination of the HAVING condition

If the query does not contain an aggregate operation or a GROUP BY clause, the HAVING condition can be combined with the WHERE condition and removed. This way, the HAVING condition can be managed and optimized together with the WHERE condition.

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The following example shows a rewritten execution plan. The t1.c2 > 1 condition is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan: 
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Derivation of equivalent conditions

Based on the transitivity of comparison operators, new conditions can be derived. This reduces the number of rows to be processed or helps select a more efficient index.

OceanBase Database can derive equivalent conditions for value-based joins. For example, in a table that contains the a and b columns, the condition a = b AND a > 1 can be derived into a = b AND a > 1 AND b > 1. If an index exists on the b column and the selectivity of the b > 1 condition is low for the index, the performance of accessing the table based on the b column can be significantly improved.

Consider the condition t1.c1 = t2.c2 AND t1.c1 > 2. After equivalence condition derivation, it becomes t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2. As we can see in the plan below, the t2.c2 > 2 condition is pushed down to the TABLE SCAN layer and the IDX_c2 index is used.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*This command needs to be run in MySQL mode.*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Elimination of tautologies and contradictions

The system can eliminate tautologies and contradictions:

false and expr = Always False
true or expr = Always True

For example, in the WHERE 0 > 1 AND c1 = 3 condition, 0 > 1 is a contradiction, making the entire condition a contradiction. Therefore, the SQL query does not need to be executed. Instead, it can directly return the result, thus accelerating the query.

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SPJ rewriting

Elimination of redundant sorts

Elimination of redundant sorts is the process of removing unnecessary items from the order item to reduce the sorting overhead. Elimination of redundant sorts can be performed under the following conditions:

The ORDER BY clause contains duplicate columns. In this case, redundant sorts can be eliminated by sorting only once.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

The ORDER BY clause contains a column that has a single-valued condition in the WHERE clause. In this case, the sorting on the column can be eliminated.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;

If the current query has an ORDER BY clause but no LIMIT clause and is located within a set operation of a parent query, the ORDER BY clause can be eliminated. The reason is that the result of a UNION operation on two ordered sets is unordered. However, if the ORDER BY clause contains a LIMIT clause, it specifies to retrieve the largest or smallest N records. In this case, the ORDER BY clause cannot be eliminated to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

Pushdown of the LIMIT clause into subqueries

The pushdown of the LIMIT clause is the process of moving the LIMIT clause into a subquery. OceanBase Database now supports pushing down the LIMIT clause into a view (example 1) or a subquery corresponding to the UNION operator (example 2) without altering the semantics.

Example 1: Push the LIMIT clause into a view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push the LIMIT clause into a subquery corresponding to the UNION operator.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

Pushdown of the LIMIT clause to outer or cross joins

If the SQL query statement does not contain WINDOW FUNCTION, DISTINCT, GROUP BY, or HAVING, and the WHERE condition or ORDER BY clause applies to only one side of the join or does not contain WHERE and ORDER BY, the LIMIT clause can be pushed down to one side (outer join) or both sides (cross join) of the join. This process is called the pushdown of the LIMIT clause to outer joins or cross joins. By pushing down the LIMIT clause, the number of rows processed in the join can be effectively reduced, thus lowering the execution overheads of the query.

For outer joins, a view is generated on the table to which the LIMIT clause is pushed down. Take a left outer join as an example. When the LIMIT clause is pushed down to the left table, a view named V is generated on the left table t1 as shown in the following example:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

Assume that the Q1 query contains no restrictions for pushing down the LIMIT clause and does not contain WHERE and ORDER BY clauses. In this case, a view V can be generated on the left table t1, and the LIMIT 1 condition can be pushed down and encapsulated into the view V. The Q2 query is the rewritten query statement. Note that the original LIMIT clause must be retained after the LIMIT clause is pushed down. The rewritten query plan is as follows:

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters: 
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256, 
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256, 
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256, 
access([t1.c1]), partitions(p0), 
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256, 
access([t2.c1]), partitions(p0)

Similarly, if the conditions are met, the LIMIT clause can be pushed down to the right table of a right outer join.

For cross joins, the LIMIT clause can be pushed down to both sides of the join. Take a look at the following example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1; 
V2: SELECT 1 FROM t2 LIMIT 1;

The Q3 query contains no restrictions for pushing down the LIMIT clause and contains only the WHERE condition and ORDER BY clause on the t1 table. In this case, views V1 and V2 can be created on the t1 and t2 tables, respectively, and the LIMIT clause can be pushed down and encapsulated into the views V1 and V2. The Q4 query is the rewritten query statement. Note that the original LIMIT clause must be retained. The rewritten query plan is as follows:

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters: 
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256, 
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256, 
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256, 
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256, 
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256, 
      access([t2.c1]), partitions(p0), 
      limit(1), offset(nil)

Elimination of DISTINCT

If the Select Item contains only constants, you can eliminate the DISTINCT keyword and add the LIMIT 1 clause.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==> 
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the Select Item contains a column that enforces a uniqueness constraint, you can eliminate the DISTINCT keyword. For example, in the following example (c1, c2) is the primary key, ensuring the uniqueness of the values in c1, c2, and c3. As a result, the DISTINCT keyword can be eliminated.

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected 

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

Rewriting of MIN/MAX

If the parameter of the MIN/MAX function is a prefix column of an index and does not contain the GROUP BY clause, you can convert the scalar aggregate function into an index scan that reads only one row, as shown in the following example:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected 

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t1 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

If the parameter of the SELECT MIN/SELECT MAX function is a constant and contains the GROUP BY clause, you can replace the MIN/MAX function with a constant to reduce the computation overhead of the MIN/MAX function.

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the parameter of the SELECT MIN/SELECT MAX function is a constant and does not contain the GROUP BY clause, you can rewrite the query statement as shown in the following example to ensure that an index scan reads only one row.

obclient> SELECT MAX(1) FROM t1;
<==> 
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

Rule-based query rewriting includes subquery-related rewriting, outer join elimination, conditional simplification, and rewriting of non-SPJ queries.

Subquery rewriting

The subquery is executed only once.
The optimizer can choose a better join order and join method based on statistics.
After the join conditions and filter conditions of the subquery are rewritten into those of the parent query, the optimizer can further optimize the conditions, such as pushing down the conditions.

Subquery rewriting includes view merging, subquery expansion, and rewriting ANY/ALL predicates into MAX/MIN predicates.

View merging

OceanBase Database supports merging SPJ views. As shown in the following example, query Q1 can be rewritten into Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

Q1: 
obclient>SELECT t1.c1, v.c1 
                FROM t1, (SELECT t2.c1, t3.c2 
                FROM t2, t3 
                WHERE t2.c1 = t3.c1) v 
                WHERE t1.c2 = v.c2;
<==>
Q2: 
obclient>SELECT t1.c1, t2.c1 
                FROM t1, t2, t3 
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

If Q1 is not rewritten, possible join orders are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After view merging, possible join orders are as follows:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

Subquery expansion

Subquery expansion can be done in the following ways:

Rewrite the condition so that the generated join statement can return the same rows as the original statement.

Expand the subquery into a semi join or anti join.

In the following example, t2.c2 does not have uniqueness. The semi join-based execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan: 
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

If you change the preceding operator IN to NOT IN, the anti join-based execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Expand the subquery into an inner join

Q1: 
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2: 
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

The execution plan after Q1 is rewritten is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can expand NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL subqueries into inner joins.

Rewrite using MAX/MIN after converting ANY/ALL

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Elimination of outer joins

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2 
            WHERE t2.c2 > 5;

Simplify conditions

Elimination of the HAVING condition

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The following example shows a rewritten execution plan. The t1.c2 > 1 condition is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan: 
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Derivation of equivalent conditions

Based on the transitivity of comparison operators, new conditions can be derived. This reduces the number of rows to be processed or helps select a more efficient index.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*This command needs to be run in MySQL mode.*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Elimination of tautologies and contradictions

The system can eliminate tautologies and contradictions:

false and expr = Always False
true or expr = Always True

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SPJ rewriting

Elimination of redundant sorts

The ORDER BY clause contains duplicate columns. In this case, redundant sorts can be eliminated by sorting only once.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

The ORDER BY clause contains a column that has a single-valued condition in the WHERE clause. In this case, the sorting on the column can be eliminated.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;

If the current query has an ORDER BY clause but no LIMIT clause and is located within a set operation of a parent query, the ORDER BY clause can be eliminated. The reason is that the result of a UNION operation on two ordered sets is unordered. However, if the ORDER BY clause contains a LIMIT clause, it specifies to retrieve the largest or smallest N records. In this case, the ORDER BY clause cannot be eliminated to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

Pushdown of the LIMIT clause into subqueries

Example 1: Push the LIMIT clause into a view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push the LIMIT clause into a subquery corresponding to the UNION operator.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

Pushdown of the LIMIT clause to outer or cross joins

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters: 
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256, 
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256, 
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256, 
access([t1.c1]), partitions(p0), 
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256, 
access([t2.c1]), partitions(p0)

Similarly, if the conditions are met, the LIMIT clause can be pushed down to the right table of a right outer join.

For cross joins, the LIMIT clause can be pushed down to both sides of the join. Take a look at the following example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1; 
V2: SELECT 1 FROM t2 LIMIT 1;

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters: 
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256, 
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256, 
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256, 
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256, 
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256, 
      access([t2.c1]), partitions(p0), 
      limit(1), offset(nil)

Elimination of DISTINCT

If the Select Item contains only constants, you can eliminate the DISTINCT keyword and add the LIMIT 1 clause.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==> 
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected 

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

Rewriting of MIN/MAX

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected 

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t1 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

obclient> SELECT MAX(1) FROM t1;
<==> 
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

OceanBase

Customer Stories

Documentation

Rule-based query rewriting

Subquery rewriting

View merging

Subquery expansion

Rewrite using MAX/MIN after converting ANY/ALL

Elimination of outer joins

Simplify conditions

Elimination of the HAVING condition

Derivation of equivalent conditions

Elimination of tautologies and contradictions

Non-SPJ rewriting

Elimination of redundant sorts

Pushdown of the LIMIT clause into subqueries

Pushdown of the LIMIT clause to outer or cross joins

Elimination of DISTINCT

Rewriting of MIN/MAX

Rule-based query rewriting

Subquery rewriting

View merging

Subquery expansion

Rewrite using MAX/MIN after converting ANY/ALL

Elimination of outer joins

Simplify conditions

Elimination of the HAVING condition

Derivation of equivalent conditions

Elimination of tautologies and contradictions

Non-SPJ rewriting

Elimination of redundant sorts

Pushdown of the LIMIT clause into subqueries

Pushdown of the LIMIT clause to outer or cross joins

Elimination of DISTINCT

Rewriting of MIN/MAX