Rule-based query rewriting|V4.3.0| docs|Distributed Database

Rule-based query rewriting

Last Updated：2025-12-02 08:38:05 Updated

Rule-based query rewriting includes subquery-related rewriting, outer join elimination, condition simplification, and rewriting that does not fall under the Select-Project-Join (SPJ) category.

Subquery rewriting

The optimizer usually executes a subquery nestedly, that is, the system needs to execute the subquery for each row generated by the parent query. This method is inefficient because the subquery is executed multiple times. To improve the execution efficiency, the optimizer can rewrite the subquery into a join. The main advantages are as follows:

The subquery is executed only once.
The optimizer can choose a better join order and join method based on statistics.
After the join conditions and filter conditions of the subquery are rewritten into those of the parent query, the optimizer can apply further optimizations, such as condition pushing.

The optimizer can rewrite subqueries through view merging, subquery expansion, and rewriting ANY/ALL as MAX/MIN.

View merging

View merging is the process of merging a subquery that represents a view into the query that contains the view. This helps the optimizer to add more join orders, access paths, and other transformations, thereby enabling it to generate a better execution plan.

OceanBase Database supports merging SPJ views. As shown in the following example, query Q1 can be rewritten into Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

Q1: 
obclient>SELECT t1.c1, v.c1 
                FROM t1, (SELECT t2.c1, t3.c2 
                FROM t2, t3 
                WHERE t2.c1 = t3.c1) v 
                WHERE t1.c2 = v.c2;
<==>
Q2: 
obclient>SELECT t1.c1, t2.c1 
                FROM t1, t2, t3 
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

If Q1 is not rewritten, it can be executed in the following join orders:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After view merging, the possible join orders are as follows:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

As we can see, view merging increases the selectivity of join orders. For complex queries, view merging enlarges the selection of access paths and transformations, enabling the optimizer to generate better execution plans.

Subquery expansion

Subquery expansion is the process of bringing a subquery in the WHERE condition to the parent query and rewriting it as a join condition. After the subquery is expanded, it is removed, and the parent query becomes a multi-table join.

This rewriting enables the optimizer to consider tables in the subquery during path selection, join method determination, and join order arrangement, thereby helping it generate better execution plans. Subqueries involving the following expressions are generally supported: NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL.

The subquery expansion process is as follows:

Rewrite the conditions to ensure that the generated join statement returns the same rows as the original statement.

Expand the subquery into a semi join or anti join.

In the following example, t2.c2 does not have uniqueness. The rewritten join statement is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan: 
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

If you change the preceding operator IN to NOT IN, you can rewrite it into an anti join, as shown in the following example:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Expand the subquery into an inner join.

In the preceding example, if you change t2.c2 into t2.c1, since t2.c1 is the primary key, the join result will have uniqueness. In this case, you can directly convert the subquery into an inner join, as shown in the following example:

Q1: 
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2: 
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

The rewritten statement is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

The preceding rewriting process applies to subqueries involving NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL.

Rewrite using MAX/MIN after converting ANY/ALL

You can convert the subquery in an ANY/ALL condition without a GROUP BY clause, aggregate functions, or a HAVING condition into an equivalent subquery containing the aggregate function MIN or MAX. Specifically, col_item must be a single column with the NULL attribute:

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After you convert the subquery into one containing MAX or MIN, you can apply MAX or MIN to further rewrite the query, thereby reducing the number of scans of the inner table before the conversion. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After you apply MAX or MIN for rewriting, you can push the LIMIT 1 condition directly down to the TABLE SCAN operation based on the primary key order of t2.c1, and output the MIN value in the execution plan, as shown below:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Elimination of outer joins

Outer joins can be left outer joins, right outer joins, and full outer joins. The optimizer cannot arbitrarily change the order of tables in outer joins, which limits the possible join paths it can explore. Elimination of outer joins is the process of converting outer joins into inner joins, so that the optimizer can explore more join paths.

To eliminate an outer join, a "null value rejection condition" must exist in the WHERE condition. A null value rejection condition is a condition that returns FALSE when a value generated by the inner table is NULL.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

This query statement contains an outer join. The value of t2.c2 in the output row can be NULL. If a condition t2.c2 > 5 is added, the value of t2.c1 after the condition is filtered cannot be NULL. In this case, the outer join can be converted into an inner join.

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2 
            WHERE t2.c2 > 5;

Simplify condition rewriting

Elimination of the HAVING condition

If the query does not contain an aggregate operation or a GROUP BY clause, the HAVING condition can be combined with the WHERE condition and eliminated. This way, the HAVING condition can be managed and optimized together with the WHERE condition.

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The following example shows a rewritten execution plan. The t1.c2 > 1 condition is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan: 
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Derivation of equivalent conditions

Based on the transitivity of comparison operators, new conditions can be derived. This reduces the number of rows to be processed or helps select a more efficient index.

OceanBase Database can derive equivalent conditions for value-based joins. For example, in a table that contains the a and b columns, the condition a = b AND a > 1 can be derived into a = b AND a > 1 AND b > 1. If an index exists on the b column and the selectivity of the b > 1 condition is very low for the index, the performance of accessing the table based on the b column can be greatly improved.

Consider the condition t1.c1 = t2.c2 AND t1.c1 > 2. After equivalent condition derivation, it becomes t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2. As we can see in the following plan, the t2.c2 > 2 condition is pushed down to the TABLE SCAN layer and the t2.c2 column is accessed based on the index corresponding to the t2.c2 column.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*This command needs to be run in MySQL mode.*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Elimination of tautologies and contradictions

You can eliminate tautologies and contradictions:

false and expr = Always False
true or expr = Always True

As shown in the following example, for the WHERE 0 > 1 AND c1 = 3 condition, 0 > 1 is a contradiction, making the entire condition a contradiction. Therefore, the SQL query does not need to be executed and can be directly returned, thus accelerating the query execution.

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Non-SJP rewriting

Elimination of redundant sorts

Redundant sort elimination removes unnecessary items from the order item list in an ORDER BY clause to reduce the sorting overhead. In the following three scenarios, redundant sorts can be eliminated:

The ORDER BY clause contains duplicate columns. In this case, sorting can be performed after deduplication.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

The column for sorting in the ORDER BY clause has a single-valued condition in the WHERE clause. In this case, sorting this column eliminates the need for the WHERE condition.
```
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;
```
If the current query has an ORDER BY clause but no LIMIT clause and is located within a set operation of a parent query, the ORDER BY clause can be eliminated. The reason is that the result of a UNION operation on two ordered sets is unordered. However, if the ORDER BY clause contains a LIMIT clause, it specifies to retrieve the largest or smallest N records. In this case, the ORDER BY clause cannot be eliminated to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

Pushdown of the LIMIT clause to subqueries

Limit pushdown rewriting is a process of pushing down the LIMIT clause to a subquery. OceanBase Database now supports pushing down the LIMIT clause to a view (example 1) or a subquery corresponding to the UNION operation (example 2) without changing the semantics.

Example 1: Push the LIMIT clause down to a view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push the LIMIT clause down to a subquery corresponding to the UNION operation.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

Limit pushdown to outer or cross joins

If the SQL query statement does not contain WINDOW FUNCTION, DISTINCT, GROUP BY, or HAVING and the WHERE condition or ORDER BY clause applies to only one side of the join or does not contain WHERE and ORDER BY, the LIMIT clause can be pushed down to one side (outer join) or both sides (cross join) of the join. This process is called limit pushdown to outer or cross joins. By pushing down the LIMIT clause, the number of rows returned by the join can be effectively reduced, thus lowering the execution overheads of the query statement.

For outer joins, a view is generated on the table to which the LIMIT clause is pushed down. Take a left outer join for example. When the LIMIT clause is pushed down to the left table, the Q1 query is rewritten into the Q2 query as follows:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

The Q1 query contains no restrictions for limit pushdown to outer joins and does not contain WHERE and ORDER BY clauses. Therefore, a view V is generated on the left table t1, and the LIMIT 1 clause is encapsulated into the view V, thus rewriting the Q1 query into the Q2 query. Note that the original LIMIT clause must be retained when the LIMIT clause is pushed down. The rewritten query plan is as follows:

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters: 
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256, 
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256, 
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256, 
access([t1.c1]), partitions(p0), 
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256, 
access([t2.c1]), partitions(p0)

In the same way, the LIMIT clause can be pushed down to the right table of a right outer join if it meets the preceding conditions.

For cross joins, the LIMIT clause can be pushed down to both sides of the join. Here is an example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1; 
V2: SELECT 1 FROM t2 LIMIT 1;

The Q3 query contains no restrictions for limit pushdown and only contains the WHERE condition and ORDER BY clause on the t1 table. Therefore, views V1 and V2 are generated on the t1 and t2 tables, respectively, and the LIMIT clause is pushed down to the views. The Q3 query is rewritten into the Q4 query. Note that the original LIMIT clause must be retained. The rewritten query plan is as follows:

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters: 
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256, 
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256, 
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256, 
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256, 
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256, 
      access([t2.c1]), partitions(p0), 
      limit(1), offset(nil)

For SQL queries with multiple tables joined, the LIMIT clause can be pushed down to outer joins and cross joins multiple times to generate more rewrite options, thereby achieving better rewrite results.

Elimination of DISTINCT

If the select item contains only constants, you can eliminate the DISTINCT clause and add the LIMIT 1 clause.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==> 
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the select item contains a column that enforces a uniqueness constraint, you can eliminate the DISTINCT clause. In the following example, (c1, c2) is the primary key and ensures the uniqueness of c1, c2, and c3.

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected 

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

Rewriting of MIN/MAX

If the parameter of the MIN/MAX function is a prefix column of an index and does not contain the GROUP BY clause, you can convert the scalar aggregate into an index scan that reads only one row, as shown in the following example:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected 

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t1 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

If the parameter of SELECT MIN/SELECT MAX is a constant and contains the GROUP BY clause, you can replace MIN/MAX with the constant to reduce the computation overhead of MIN/MAX.

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the parameter of SELECT MIN/SELECT MAX is a constant and does not contain the GROUP BY clause, you can rewrite it to an index scan that reads only one row, as shown in the following example.

obclient> SELECT MAX(1) FROM t1;
<==> 
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true