Rule-based query rewriting|V4.2.2| docs|Distributed Database

Rule-based query rewriting

Last Updated：2025-12-02 02:50:56 Updated

Rule-based query rewriting includes subquery-related rewriting, outer join elimination, condition simplification, and rewriting of non-SPJ queries.

Subquery rewriting

The optimizer usually executes a subquery nestedly, that is, the system needs to execute the subquery as many times as the number of rows generated by the parent query. This execution method can be very inefficient because it executes the subquery multiple times. To improve the execution efficiency, the optimizer can rewrite the subquery into a join operation. The main advantages are as follows:

It avoids the need to execute the subquery multiple times.
The optimizer can choose a better join order and join method based on the statistics.
After the join conditions and filter conditions of the subquery are rewritten into those of the parent query, the optimizer can further optimize the conditions, such as pushing down the conditions.

Subquery rewriting includes view merging, subquery expansion, and rewriting ANY/ALL expressions into MAX/MIN expressions.

View merging

View merging is the process of merging a subquery that represents a view into the query that contains the view. This process helps the optimizer to choose better join orders, access paths, and other rewrite options to generate an optimal execution plan.

OceanBase Database supports merging SPJ views. Take a look at the following example to see how to rewrite query Q1 into Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

Q1: 
obclient>SELECT t1.c1, v.c1 
                FROM t1, (SELECT t2.c1, t3.c2 
                FROM t2, t3 
                WHERE t2.c1 = t3.c1) v 
                WHERE t1.c2 = v.c2;
<==>
Q2: 
obclient>SELECT t1.c1, t2.c1 
                FROM t1, t2, t3 
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

If Q1 is not rewritten, the possible join orders are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After view merging, the possible join orders are as follows:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

As we can see, view merging increases the flexibility of join orders. For complex queries, view merging increases the flexibility of path selection and other rewrite options, thereby enabling the optimizer to generate better execution plans.

Subquery expansion

Subquery expansion is the process of elevating a WHERE condition subquery to the parent query and rewriting it as a join condition. The subquery is then removed, and the parent query is transformed into a multi-table join.

This rewriting enables the optimizer to consider tables in the subquery during path selection, join method determination, and join order arrangement, thereby helping it generate better execution plans. Subqueries of the following types are involved: NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL.

The subquery can be rewritten in the following ways:

Rewrite the condition so that the generated join statement can return the same rows as the original statement.

Rewrite it into a semi join or anti join.

Take a look at the following example. Since c2 in table t2 does not have uniqueness, the statement is rewritten into a semi join, and the execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan: 
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

If you change the NOT IN condition at the front of the query into an IN condition, the statement can be rewritten into an anti join, and the execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Rewrite it into an inner join

Take a look at the preceding example. In query Q1, if you change c2 into c1 in table t2, since c1 is the primary key and the join output is unique, you can directly rewrite the subquery into an inner join, as shown below:

Q1: 
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2: 
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

After Q1 is rewritten, the execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can perform similar rewriting operations on subqueries of the NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL types.

Rewrite using MAX/MIN after converting ANY/ALL

You can convert the subquery in an ANY/ALL condition into a subquery containing MAX/MIN if the subquery does not contain a GROUP BY clause, aggregate functions, or a HAVING condition. In this case, col_item in the following expression must be a single column with the NULL attribute:

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After you convert the subquery into a subquery containing MAX/MIN, you can apply MAX/MIN to the subquery to reduce the number of scans of the inner table before the conversion. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After you apply MAX/MIN to the subquery, you can push the LIMIT 1 condition directly down to the TABLE SCAN operation based on the primary key order of t2.c1, and output the minimum value. The execution plan is as follows:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Elimination of outer joins

Outer joins can be left outer joins, right outer joins, and full outer joins. The optimizer cannot freely exchange the order of outer joins, which limits the possible join orders. Elimination of outer joins converts outer joins into inner joins, offering more join orders to the optimizer.

To eliminate an outer join, a "null value rejecting condition" must be met, namely, a WHERE condition exists where the output result is FALSE when a value generated by the inner table is NULL.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

This query statement contains an outer join. The value of t2.c2 in the output result set of the join can be NULL. If a condition t2.c2 > 5 is added, the join result set after the condition is filtered contains no NULL value in the t2.c1 column. In this case, the outer join can be converted into an inner join.

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2 
            WHERE t2.c2 > 5;

Simplify conditions

Elimination of the HAVING condition

If the query does not contain an aggregate operation or a GROUP BY clause, the HAVING condition can be combined with the WHERE condition, and the HAVING clause can be removed. This way, the HAVING condition can be managed and optimized as part of the WHERE condition.

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The following example shows the execution plan after the HAVING condition is eliminated. The t1.c2 > 1 condition is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan: 
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Derivation of equivalent conditions

Based on the transitivity of comparison operators, new conditions can be derived. This reduces the number of rows to be processed or helps select a more efficient index.

OceanBase Database can derive equivalent conditions for value-based joins. For example, in a table that contains the a and b columns, the condition a = b AND a > 1 can be derived into a = b AND a > 1 AND b > 1. If an index exists on the b column and the selectivity of the b > 1 condition is low for the index, the performance of accessing the table for the b column can be significantly improved.

Consider the condition t1.c1 = t2.c2 AND t1.c1 > 2. After equivalent condition derivation, it becomes t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2. As you can see in the plan below, the t2.c2 > 2 condition is pushed down to the TABLE SCAN layer and the t2.c2 column uses the index on the t2.c2 column.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*This command needs to be run in MySQL mode.*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Elimination of tautologies and contradictions

You can eliminate tautologies and contradictions:

false and expr = Always False
true or expr = Always True

As shown in the following example, for the WHERE 0 > 1 AND c1 ＝ 3 condition, since 0 > 1 is a contradiction, the AND condition is always false. Therefore, the SQL query does not need to be executed. Instead, it can directly return the result, thus accelerating the query.

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SJP rewriting

Elimination of redundant sorts

Elimination of redundant sorts is the process of removing unnecessary items from the order item list in an ORDER BY clause to reduce the sorting overhead. In the following three scenarios, redundant sorts can be eliminated:

The ORDER BY clause contains duplicate columns. In this case, the columns can be sorted after duplicates are removed.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

The WHERE condition contains a single-valued condition on a column in the ORDER BY clause. In this case, the column can be sorted.
```
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;
```
If the current query has an ORDER BY clause but no LIMIT clause and is located within a set operation of a parent query, the ORDER BY clause of the current query can be eliminated. The reason is that the result of a UNION operation on two ordered sets is unordered. However, if the ORDER BY clause contains a LIMIT clause, it means that the first or last N records are to be obtained. In this case, the ORDER BY clause cannot be eliminated to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

Pushdown of LIMIT to subqueries

Limit pushdown is the process of moving the LIMIT clause to a subquery. OceanBase Database now supports pushing down the LIMIT clause to a view (example 1) or a subquery corresponding to the UNION operation (example 2) without changing the semantics.

Example 1: Push the LIMIT clause to a view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push the LIMIT clause to a subquery corresponding to the UNION operation.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

Pushdown of LIMIT to outer or cross joins

If the SQL query statement does not contain WINDOW FUNCTION, DISTINCT, GROUP BY, or HAVING, and the WHERE condition or ORDER BY clause applies to only one side of the join or does not contain WHERE and ORDER BY, the LIMIT clause can be pushed down to one side (outer join) or both sides (cross join) of the join. This process is called pushing down the LIMIT clause to outer joins or cross joins. By pushing down the LIMIT clause, the number of rows returned by the join can be reduced, thereby lowering the execution overheads of queries.

For outer joins, a view is created on the table to which the LIMIT clause is pushed down. For example, in the following Q1 query statement, which is a left outer join, a view V is created on the t1 table:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

If the Q1 query statement meets the aforementioned conditions, the LIMIT 1 clause can be pushed down to the t1 table to generate the Q2 query statement. Note that the original LIMIT clause must be retained after the LIMIT clause is pushed down. The execution plan before and after the pushdown is as follows:

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters: 
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256, 
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256, 
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256, 
access([t1.c1]), partitions(p0), 
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256, 
access([t2.c1]), partitions(p0)

The same operation can be performed on right outer joins under the aforementioned conditions.

For cross joins, the LIMIT clause can be pushed down to both sides of the join. Here is an example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1; 
V2: SELECT 1 FROM t2 LIMIT 1;

Q3 query statement contains no limitations for limit pushdown to outer joins, and contains only the WHERE condition and ORDER BY clause on the t1 table. Therefore, views V1 and V2 can be created on the t1 and t2 tables, respectively, and the LIMIT clause can be pushed down to the views to generate the Q4 query statement. Note that the original LIMIT clause must be retained after the LIMIT clause is pushed down. The execution plan before and after the pushdown is as follows:

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters: 
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256, 
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256, 
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256, 
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256, 
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256, 
      access([t2.c1]), partitions(p0), 
      limit(1), offset(nil)

For SQL queries with multiple tables joined, the preceding outer join and cross join can be applied to the LIMIT clause each time or multiple times based on the conditions to expand the rewrite space and achieve better rewrite results.

Elimination of DISTINCT

If the Select Item contains only constants, you can eliminate the DISTINCT clause and add the LIMIT 1 clause.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==> 
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the Select Item contains a column that enforces a uniqueness constraint, you can eliminate the DISTINCT clause. In the following example, (c1, c2) is the primary key and enforces the uniqueness of c1, c2, and c3.

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected 

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

Rewriting of MIN/MAX

If the parameter of the MIN/MAX function is a prefix column of an index and does not contain the GROUP BY clause, you can convert the scalar aggregate into an index scan that reads only one row, as shown in the following example:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected 

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t1 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

If the parameter of the SELECT MIN/SELECT MAX statement is a constant and contains the GROUP BY clause, you can replace the MIN/MAX function with a constant to reduce the computation overhead of the MIN/MAX function.

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the parameter of the SELECT MIN/SELECT MAX statement is a constant and does not contain the GROUP BY clause, you can rewrite the statement as shown in the following example to scan only one row during an index scan.

obclient> SELECT MAX(1) FROM t1;
<==> 
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true