Rule-based query rewriting|V4.2.0| docs|Distributed Database

Rule-based query rewriting

Last Updated：2025-12-04 05:37:52 Updated

Rule-based query rewriting includes subquery-related rewriting, outer join elimination, condition simplification, and rewriting of non-SPJ queries.

Subquery rewriting

The optimizer generally executes a subquery nested within a parent query. In this case, the optimizer needs to execute the subquery multiple times as the parent query generates rows of data. This execution method has a low efficiency. To improve the execution efficiency, you can rewrite the subquery into a join operation. The main advantages are as follows:

The subquery is executed only once.
The optimizer can choose a better join order and join method based on statistics.
After the join conditions and filter conditions of the subquery are rewritten into those of the parent query, the optimizer can perform further optimizations, such as predicate pushdown, on the conditions.

There are several subquery rewriting methods, including view merging, subquery expansion, and rewriting of ANY/ALL predicates into MAX/MIN.

View merging

View merging is the process of merging a subquery that represents a view into the query that contains the view. This process helps the optimizer to increase the flexibility of join orders, access paths, and other rewrite options, thereby enabling it to generate better execution plans.

OceanBase Database supports merging SPJ views. As shown in the following example, query Q1 can be rewritten into Q2:

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

Q1: 
obclient>SELECT t1.c1, v.c1 
                FROM t1, (SELECT t2.c1, t3.c2 
                FROM t2, t3 
                WHERE t2.c1 = t3.c1) v 
                WHERE t1.c2 = v.c2;
<==>
Q2: 
obclient>SELECT t1.c1, t2.c1 
                FROM t1, t2, t3 
                WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;

If Q1 is not rewritten, possible join orders are as follows:

t1, v(t2,t3)
t1, v(t3,t2)
v(t2,t3), t1
v(t3,t2), t1

After view merging, possible join orders are as follows:

t1, t2, t3
t1, t3, t2
t2, t1, t3
t2, t3, t1
t3, t1, t2
t3, t2, t1

As we can see, view merging increases the flexibility of join orders. For complex queries, view merging increases the flexibility of path selection and rewrite options, thereby enabling the optimizer to generate better execution plans.

Subquery expansion

Subquery expansion is the process of moving a WHERE condition containing a subquery to the parent query and rewriting it as a join condition. The subquery is eliminated after the transformation, and the parent query is changed into a multi-table join.

This modification enables the optimizer to consider tables in the subquery during path selection, join method determination, and join order arrangement, thereby enabling it to generate better execution plans. Subquery expressions supported are NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL.

The subquery expansion process is as follows:

Rewrite the condition so that the generated join statement can return the same rows as the original statement.

Expand the subquery into a semi join or anti join.

In the following example, t2.c2 does not have uniqueness. The original statement is rewritten into a semi join, and the execution plan is as follows:

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2);

Query Plan: 
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |495      |3931|
|1 | TABLE SCAN   |t1  |1000     |499 |
|2 | TABLE SCAN   |t2  |1000     |433 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0)

If you change the preceding operator IN to NOT IN, the statement can be rewritten into an anti join, and the execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2);

Query Plan:
================================================
|ID|OPERATOR             |NAME|EST. ROWS|COST  |
------------------------------------------------
|0 |NESTED-LOOP ANTI JOIN|    |0        |520245|
|1 | TABLE SCAN          |t1  |1000     |499   |
|2 | TABLE SCAN          |t2  |22       |517   |
================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
  1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
      access([t2.c2]), partitions(p0)

Expand the subquery into an inner join.

In the preceding example, if you change t2.c2 into t2.c1, since t2.c1 is a primary key, the data output by the subquery has uniqueness. In this case, you can directly convert the subquery into an inner join, as shown below:

Q1: 
obclient> SELECT * FROM t1 WHERE t1.c1 IN  (SELECT t2.c1 FROM t2);
<==>
Q2: 
obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

After Q1 is rewritten, the execution plan is as follows:

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2);

Query Plan:
 ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |3725|
|1 | TABLE SCAN|t2  |1000     |411 |
|2 | TABLE SCAN|t1  |1000     |499 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  1 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)

You can apply this method to NOT IN, IN, NOT EXISTS, EXISTS, ANY, and ALL subquery expressions.

Rewrite using MAX/MIN after converting ANY/ALL

You can convert the subquery in an ANY/ALL condition into a subquery containing MAX/MIN. If the subquery does not contain a GROUP BY clause, aggregate functions, or HAVING conditions, and if col_item is a non-null column, the following expression can be used for conversion:

val > ALL(SELECT col_item ...)  <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...)  <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...)  <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...)  <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);

After you convert the subquery into a subquery containing MAX/MIN, you can apply MAX/MIN to further rewrite the query statement. This approach reduces the number of scans on the inner table before the conversion. Here is an example:

obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);

After you rewrite the query statement using MAX/MIN, you can push the LIMIT 1 condition, based on the primary key sequence of t2.c1, directly down to the TABLE SCAN operator to output the minimum value. The execution plan is as follows:

obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);

Query Plan:
 ===================================================
|ID|OPERATOR        |NAME          |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER  |              |1        |73  |
|1 | TABLE SCAN     |t1            |1        |37  |
|2 | SCALAR GROUP BY|              |1        |37  |
|3 |  SUBPLAN SCAN  |subquery_table|1        |37  |
|4 |   TABLE SCAN   |t2            |1        |36  |
===================================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
      exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)
  2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  3 - output([subquery_table.c1]), filter(nil),
      access([subquery_table.c1])
  4 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      limit(1), offset(nil)

Elimination of outer joins

Outer joins can be left outer joins, right outer joins, and full outer joins. The optimizer's flexibility in choosing the join order is limited because the left and right operands of an outer join cannot be swapped. Outer join elimination is the process of converting outer joins into inner joins to provide the optimizer with more join orders to choose from.

To perform outer join elimination, a "null value rejection condition" must be met, namely, a condition in the WHERE clause that returns FALSE when a value generated by the inner table is NULL.

Here is an example:

obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;

This query statement contains an outer join. The value of t2.c2 in the output row can be NULL. If a condition t2.c2 > 5 is added, the output value of t2.c1 cannot be NULL after the condition filters the results, and therefore the outer join can be converted into an inner join.

obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2 
            WHERE t2.c2 > 5;

Simplify condition rewriting

Elimination of the HAVING condition

If the query does not contain an aggregate operation or a GROUP BY clause, the HAVING condition can be merged with the WHERE condition and removed. This way, the HAVING condition can be managed and optimized uniformly with the WHERE condition.

obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;

The following example shows a rewritten execution plan. The t1.c2 > 1 condition is pushed down to the TABLE SCAN layer.

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;

Query Plan: 
=========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN|    |1        |59  |
|1 | TABLE SCAN     |t1  |1        |37  |
|2 | TABLE GET      |t2  |1        |36  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
      conds(nil), nl_params_([t1.c1])
  1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p0)

Derivation of equivalent conditions

Based on the transitivity of comparison operators, equivalent condition derivation deduces new condition expressions, reducing the number of rows to be processed or enabling the selection of more efficient indexes.

OceanBase Database can derive equivalent conditions for value-based joins. For example, in a table with columns a and b, the condition a = b AND a > 1 can be derived into a = b AND a > 1 AND b > 1. If an index exists on column b and the selectivity of the b > 1 condition is low for the index, the performance of accessing the table with column b can be significantly improved.

Consider the condition t1.c1 = t2.c2 AND t1.c1 > 2. After equivalent condition derivation, the condition is transformed into t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2. As we can see in the following execution plan, the t2.c2 > 2 condition is pushed down to the TABLE SCAN layer and the IDX_c2 index is used.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*This command needs to be run in MySQL mode.*/

obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c2 AND t1.c1 > 2;

Query Plan:
==========================================
|ID|OPERATOR   |NAME      |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN |          |5        |78  |
|1 | TABLE SCAN|t2(IDX_c2)|5        |37  |
|2 | TABLE SCAN|t1        |3        |37  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  1 - output([t2.c2]), filter(nil),
      access([t2.c2]), partitions(p0),
      is_index_back=false,
      range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
      range_cond([t2.c2 > 2])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(2 ; MAX),
      range_cond([t1.c1 > 2])

Elimination of tautologies and contradictions

You can eliminate the following tautologies and contradictions:

false and expr = Always False
true or expr = Always True

As shown in the following example, for the WHERE 0 > 1 AND c1 = 3 condition, since 0 > 1 is a contradiction, the AND condition is always false. Therefore, the SQL query does not need to be executed and can be directly returned, thus accelerating the query execution.

obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |0        |38  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
      access([t1.c1], [t1.c2]), partitions(p0),
      is_index_back=false, filter_before_indexback[false,false],
      range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
      range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

Rewrite without using a subquery

Elimination of redundant sorts

Redundant sort elimination removes unnecessary items from the order item, thereby reducing the sorting overhead. Sort elimination can be performed under the following conditions:

The ORDER BY clause contains duplicate columns. In this case, the system can eliminate duplicates and sort the data.

obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;

The ORDER BY clause contains a column that has a single-valued condition in the WHERE clause. In this case, the system can sort the column and eliminate the other columns.
```
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
<==>
obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;
```
If the current query has an ORDER BY clause but no LIMIT clause and is located within a set operation of a parent query, the ORDER BY clause can be eliminated. The reason is that the result of a UNION operation on two ordered sets is unordered. However, if the ORDER BY clause contains a LIMIT clause, it specifies to retrieve the largest or smallest N records. In this case, the ORDER BY clause cannot be eliminated to avoid semantic errors.
```
obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
<==>
obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
```

Pushdown of LIMIT to subqueries

Limit pushdown is the process of moving the LIMIT clause from the parent query to a subquery. OceanBase Database now supports pushing down the LIMIT clause to a view (example 1) or a subquery corresponding to the UNION operator (example 2) without changing the semantics of the query.

Example 1: Push the LIMIT clause to a view.

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

Example 2: Push the LIMIT clause to a subquery corresponding to the UNION operator.

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

Pushdown of LIMIT to outer or cross joins

If the SQL query statement does not contain WINDOW FUNCTION, DISTINCT, GROUP BY, or HAVING and the WHERE condition or ORDER BY clause applies to only one side of the join or does not contain WHERE and ORDER BY, the LIMIT clause can be pushed down to one side (outer join) or both sides (cross join) of the join. This process is called pushing down the LIMIT clause to outer joins or cross joins. By pushing down the LIMIT clause, the number of rows processed in the join can be effectively reduced, thereby lowering the execution overheads of the query.

For outer joins, a view is created on the table to which the LIMIT clause is pushed down. Take a left outer join as an example. When the LIMIT clause is pushed down to the left table, the following Q1 query is transformed into the Q2 query:

Q1:
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
==>
Q2:
SELECT * FROM V LEFT JOIN t2 ON t1.c1 = t2.c1 LIMIT 1;
V: (SELECT * FROM t1 LIMIT 1);

The Q1 query does not contain the preceding limitations and does not have WHERE and ORDER BY clauses. Therefore, a view V is created on the left table t1, and the LIMIT 1 clause is incorporated into the view V, thereby rewriting the Q1 query into the Q2 query. Note that the original LIMIT clause must be retained after the LIMIT clause is pushed down. The rewritten query plan is as follows:

=============================================
|ID|OPERATOR         |NAME   |EST. ROWS|COST|
---------------------------------------------
|0 |LIMIT            |       |1        |4   |
|1 | MERGE OUTER JOIN|       |1        |4   |
|2 |  SUBPLAN SCAN   |VIEW1  |1        |2   |
|3 |   TABLE SCAN    |t1(idx)|1        |2   |
|4 |  SORT           |       |1        |2   |
|5 |   TABLE SCAN    |t2     |1        |2   |
=============================================
Outputs & filters: 
-------------------------------------
0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
1 - output(nil), filter(nil), rowset=256, 
equal_conds([VIEW1.t1.c1 = t2.c1]), other_conds(nil)
2 - output([VIEW1.t1.c1]), filter(nil), rowset=256, 
access([VIEW1.t1.c1])
3 - output([t1.c1]), filter(nil), rowset=256, 
access([t1.c1]), partitions(p0), 
limit(1), offset(nil)
4 - output([t2.c1]), filter(nil), rowset=256, sort_keys([t2.c1, ASC])
5 - output([t2.c1]), filter(nil), rowset=256, 
access([t2.c1]), partitions(p0)

In the same way, the LIMIT clause can be pushed down to the right table of a right outer join if the preceding conditions are met.

For cross joins, the LIMIT clause can be pushed down to both sides of the join. Here is an example:

Q3:
SELECT 1 FROM t1, t2 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1;
==>
Q4:
SELECT 1 FROM V1, V2 LIMIT 1;
V1: SELECT 1 FROM t1 WHERE t1.c1 > 0 ORDER BY t1.c1 LIMIT 1; 
V2: SELECT 1 FROM t2 LIMIT 1;

The Q3 query does not contain the preceding limitations on limit pushing for outer joins and contains only the WHERE condition and ORDER BY clause on the t1 table. Therefore, views V1 and V2 are created on the t1 and t2 tables, respectively, and the LIMIT clause is incorporated into the views V1 and V2, thereby rewriting the Q3 query into the Q4 query. Note that the original LIMIT clause must be retained. The rewritten query plan is as follows:

=====================================================
|ID|OPERATOR                   |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |LIMIT                      |     |1        |5   |
|1 | NESTED-LOOP JOIN CARTESIAN|     |1        |5   |
|2 |  SUBPLAN SCAN             |VIEW1|1        |3   |
|3 |   TOP-N SORT              |     |1        |3   |
|4 |    TABLE SCAN             |t1   |3        |3   |
|5 |  MATERIAL                 |     |1        |2   |
|6 |   SUBPLAN SCAN            |VIEW2|1        |2   |
|7 |    TABLE SCAN             |t2   |1        |2   |
=====================================================
Outputs & filters: 
-------------------------------------
  0 - output([1]), filter(nil), rowset=256, limit(1), offset(nil)
  1 - output(nil), filter(nil), rowset=256, 
      conds(nil), nl_params_(nil)
  2 - output(nil), filter(nil), rowset=256, 
      access(nil)
  3 - output([t1.c1]), filter(nil), rowset=256, sort_keys([t1.c1, ASC]), topn(1)
  4 - output([t1.c1]), filter([t1.c1 > 0]), rowset=256, 
      access([t1.c1]), partitions(p0)
  5 - output(nil), filter(nil), rowset=256
  6 - output(nil), filter(nil), rowset=256, 
      access(nil)
  7 - output([1]), filter([t2.c1 > 0]), rowset=256, 
      access([t2.c1]), partitions(p0), 
      limit(1), offset(nil)

For SQL queries with multiple tables joined, you can apply limit pushing to outer joins and cross joins multiple times to achieve better query rewriting results.

Elimination of DISTINCT

If the Select Item contains only constants, you can eliminate the DISTINCT keyword and add the LIMIT 1 clause.

obclient> SELECT DISTINCT 1,2 FROM t1 ;
<==> 
obclient> SELECT 1,2 FROM t1 LIMIT 1;

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected 

obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |36  |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1], [2]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the Select Item contains a column that enforces a uniqueness constraint, you can eliminate the DISTINCT keyword. In the following example, (c1, c2) is the primary key and enforces the uniqueness of c1, c2, and c3.

obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
Query OK, 0 rows affected 

obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
<==>
obclient> SELECT c1, c2 c3 FROM t2;

obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t2  |1000     |455 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
      access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

Rewriting of MIN/MAX

If the parameter of the MIN/MAX function is a prefix column of an index and does not contain the GROUP BY clause, you can convert the scalar aggregate function into an indexed scan that reads only one row, as shown in the following example:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
Query OK, 0 rows affected 

obclient> SELECT MIN(c2) FROM t1;
<==>
obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t1 ORDER BY c2 LIMIT 1) AS t;

obclient> EXPLAIN SELECT MIN(c2) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
      group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
  1 - output([subquery_table.c2]), filter(nil),
      access([subquery_table.c2])
  2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
      access([t1.c2]), partitions(p0),
      limit(1), offset(nil)

If the SELECT MIN/SELECT MAX statement's parameter is a constant and contains the GROUP BY clause, you can replace the MIN/MAX function with a constant to reduce the computation overhead of the MIN/MAX function.

obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
<==>
obclient> SELECT 1 FROM t1 GROUP BY c1;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1;

Query Plan: 
===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |411 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true

If the SELECT MIN/SELECT MAX statement's parameter is a constant and does not contain the GROUP BY clause, you can rewrite the statement as shown in the following example to perform an indexed scan that reads only one row.

obclient> SELECT MAX(1) FROM t1;
<==> 
obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;

obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1;

Query Plan: 
==================================================
|ID|OPERATOR       |NAME          |EST. ROWS|COST|
--------------------------------------------------
|0 |SCALAR GROUP BY|              |1        |37  |
|1 | SUBPLAN SCAN  |subquery_table|1        |37  |
|2 |  TABLE SCAN   |t1            |1        |36  |
==================================================

Outputs & filters:
-------------------------------------
  0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
      group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
  1 - output([subquery_table.subquery_col_alias]), filter(nil),
      access([subquery_table.subquery_col_alias])
  2 - output([1]), filter(nil),
      access([t1.c1]), partitions(p0),
      limit(1), offset(nil),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true