Cost-based query rewrite|V4.1.0| docs|Distributed Database

Cost-based query rewrite

Last Updated：2023-07-28 02:55:42 Updated

OceanBase Database supports only one type of cost-based query rewrite: OR-EXPANSION.

Its later versions will support advanced cost-based rewriting rules for database administration such as complex view merge and window function rewrite.

OR-EXPANSION

OR-EXPANSION rewrites a query to several subqueries, and the result sets of these subqueries are combined into the same result set through the UNION operator. This allows you to optimize each subquery. However, the rewrite also results in the execution of multiple subqueries. You must determine whether to perform the rewrite based on the cost analysis.

Purposes of the OR-EXPANSION rewrite:

To allow subqueries to use different indexes to speed up the query.

In the following example, query Q1 is rewritten to Q2, where LNNVL(t1.a = 1), the predicate in Q2, ensures that the two subqueries do not generate duplicate results. Before the rewrite, Q1 generally accesses the primary table. After the rewrite, if indexes (a) and (b) are created on table t1, subqueries of Q2 are allowed to use different indexes for data access.

Q1:
obclient> SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
Q2:
obclient> SELECT * FROM t1 WHERE t1.a = 1 UNION ALL SELECT * FROM t1.b = 1
       AND LNNVL(t1.a = 1);

Example:

obclient> CREATE TABLE t1(a INT, b INT, c INT, d INT, e INT, INDEX IDX_a(a), INDEX IDX_b(b));
Query OK, 0 rows affected

/*Without OR-EXPANSION rewrite, primary access path is the only option for the query*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
+--------------------------------------------------------------+
| Query Plan                                                                         |
+--------------------------------------------------------------+
| ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |4        |649 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)

/*After the rewrite, different index access paths are available for each subquery*/
obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
+------------------------------------------------------------------------+
| Query Plan                                                                                         |
+------------------------------------------------------------------------+
| =========================================
|ID|OPERATOR   |NAME     |EST. ROWS|COST|
-----------------------------------------
|0 |UNION ALL  |         |3        |190 |
|1 | TABLE SCAN|t1(idx_a)|2        |94  |
|2 | TABLE SCAN|t1(idx_b)|1        |95  |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
  1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
  2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02

To allow subqueries to use different JOIN algorithms to speed up the query and to avoid the use of CARTESIAN JOIN.

In the following example, query Q1 is rewritten to Q2. For Q1, the nested loop join, which results in a Cartesian product, is the only join option available. After the rewrite, nested loop join, hash join, and merge join are available for each subquery, providing more options for optimization.

Q1:
obclient> SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;

Q2:
obclient> SELECT * FROM t1, t2 WHERE t1.a = t2.a UNION ALL
     SELECT * FROM t1, t2 WHERE t1.b = t2.b AND LNNVL(t1.a = t2.a);

Example:

obclient> CREATE TABLE t1(a INT, b INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(a INT, b INT);
Query OK, 0 rows affected

/*Without the rewrite, the nested loop join is the only available option.*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1, t2
       WHERE t1.a = t2.a OR t1.b = t2.b;
+--------------------------------------------------------------------------+
| Query Plan                                                                                          |
+--------------------------------------------------------------------------+
| ===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |3957     |585457|
|1 | TABLE SCAN     |t1  |1000     |499   |
|2 | TABLE SCAN     |t2  |4        |583   |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      conds(nil), nl_params_([t1.a], [t1.b])
  1 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
      access([t2.a], [t2.b]), partitions(p0)

/*After the rewrite, every subquery can use HASH JOIN.*/
obclient>  EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
+--------------------------------------------------------------------------+
| Query Plan                                                                                         |
+--------------------------------------------------------------------------+
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |UNION ALL   |    |2970     |9105|
|1 | HASH JOIN  |    |1980     |3997|
|2 |  TABLE SCAN|t1  |1000     |499 |
|3 |  TABLE SCAN|t2  |1000     |499 |
|4 | HASH JOIN  |    |990      |3659|
|5 |  TABLE SCAN|t1  |1000     |499 |
|6 |  TABLE SCAN|t2  |1000     |499 |
=====================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
  1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.a = t2.a]), other_conds(nil)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  3 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)
  4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
  5 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  6 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)

To allow each subquery to separately perform sorting elimination, accelerating the retrieval of the TOP K results.

In the following example, query Q1 is rewritten to Q2. For Q1, the only way of execution is to find the rows that fit the condition, sort them, and then retrieve the TOP 10 results. Assume that Q2 has two subqueries. If indexes a and b are available, each subquery of Q2 can eliminate redundancy by using an index and retrieve the TOP 10 results. Finally, sort the 20 rows retrieved by these subqueries to retrieve the final TOP 10 rows.

Q1:
obclient> SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 ORDER BY b LIMIT 10;

Q2:
obclient> SELECT * FROM  
    (SELECT * FROM  t1 WHERE t1.a = 1 ORDER BY b LIMIT 10 UNION ALL
     SELECT * FROM  t1 WHERE t1.a = 2 ORDER BY b LIMIT 10) AS TEMP
    ORDER BY temp.b LIMIT 10;

Example:

obclient> CREATE TABLE t1(a INT, b INT, INDEX IDX_a(a, b));
Query OK, 0 rows affected

/*Before the rewrite, data is sorted to retrieve the TOP K results*/
obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.a = 2 ORDER BY b LIMIT 10;
+-------------------------------------------------------------------------+
| Query Plan                                                                                         |
+-------------------------------------------------------------------------+
| ==========================================
|ID|OPERATOR    |NAME     |EST. ROWS|COST|
------------------------------------------
|0 |LIMIT       |         |4        |77  |
|1 | TOP-N SORT |         |4        |76  |
|2 |  TABLE SCAN|t1(idx_a)|4        |73  |
==========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
  1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)

/*After the rewrite, the subqueries remove the SORT operator and eventually retrieve the TOP K results*/
obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2
        ORDER BY b LIMIT 10;
+-------------------------------------------------------------------------+
| Query Plan                                                                                          |
+-------------------------------------------------------------------------+
| ===========================================
|ID|OPERATOR     |NAME     |EST. ROWS|COST|
-------------------------------------------
|0 |LIMIT        |         |3        |76  |
|1 | TOP-N SORT  |         |3        |76  |
|2 |  UNION ALL  |         |3        |74  |
|3 |   TABLE SCAN|t1(idx_a)|2        |37  |
|4 |   TABLE SCAN|t1(idx_a)|1        |37  |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
  1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
  2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
  3 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)
  4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)