When a query contains a LIMIT clause, whether the generated execution plan contains a blocking operator can significantly impact performance. SORT, as a common blocking operator, is included in the query when an ORDER BY clause is used or when the generated plan requires the assignment of operators based on sorting algorithms (such as MERGE DISTINCT, MERGE GROUP BY, and MERGE JOIN). In these cases, the system attempts to assign SORT operators to obtain an intermediate result set sorted according to the specified sort key. To avoid unnecessary SORT operators and fully utilize already sorted intermediate result sets, the optimizer maintains certain sequence-related information and performs optimizations when attempting to assign SORT operators.
Order in a plan tree
In an execution plan, the input of an operator may be sorted. For base table scans, the order comes from indexed columns. For other operators, this order may come from the output order of child operators. Different from the output order explicitly required by ORDER BY, the input order required by most operators that depend on sorting are adjustable, and can be inherited by the output of the operators. By optimizing the order used for operators that depend on sorting, the optimizer can avoid unnecessary sorts.
When the optimizer determines the order for an operator that depends on sorting, it checks the existing input order and the order that may be required later. For example, in the following query, a MERGE GROUP BY operator is assigned for the GROUP BY clause, and the optimizer can swap the positions in c1, c2 and change the sorting direction of each column.
- In the
Q2_1_1query, the use of theidxindex produces thec1, c2order, which avoids assigning aSORToperator fororder by c1, c2. - Compared with
Q2_1_1, theQ2_1_2query has a differentorder byclause and uses the primary key. TheMERGE GROUP BYoperator directly uses the order required byorder by c2, c1as the input order for the merge, and aSORToperator is assigned below theMERGE GROUP BYoperator. - Compared with
Q2_1_2, theQ2_1_3query does not use the primary key. A plan that contains aMERGE GROUP BYoperator is generated based on cost analysis. TheMERGE GROUP BYoperator can directly use the order ofc1, c2produced by theidxindex, and aSORToperator is assigned fororder by c2, c1after the merge.
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
Q2_1_1:
select /*+no_use_hash_aggregation*/ c1, c2, sum(c3)
from t1 t
group by c2, c1
order by c1, c2;
==========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
------------------------------------------
|0 |MERGE GROUP BY| |7072 |10965|
|1 | TABLE SCAN |t(idx)|78858 |8124 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c1], [t.c2]), agg_func([T_FUN_SUM(t.c3)])
1 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Q2_1_2:
select /*+no_use_hash_aggregation full(t)*/ c1, c2, sum(c3)
from t1 t
group by c1, c2
order by c2, c1;
========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
----------------------------------------
|0 |MERGE GROUP BY| |7072 |41410|
|1 | SORT | |100000 |37988|
|2 | TABLE SCAN |t |100000 |8026 |
========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c2], [t.c1]), agg_func([T_FUN_SUM(t.c3)])
1 - output([t.c2], [t.c1], [t.c3]), filter(nil), rowset=256,
sort_keys([t.c2, ASC], [t.c1, ASC])
2 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Q2_1_3:
select /*+no_use_hash_aggregation index(t idx)*/ c1, c2, sum(c3)
from t1 t
group by c1, c2
order by c2, c1;
===========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
-------------------------------------------
|0 |SORT | |7072 |12771|
|1 | MERGE GROUP BY| |7072 |10965|
|2 | TABLE SCAN |t(idx)|78858 |8124 |
===========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
sort_keys([t.c2, ASC], [t.c1, ASC])
1 - output([t.c2], [t.c1], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c1], [t.c2]), agg_func([T_FUN_SUM(t.c3)])
2 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Optimize SORT assignment
When the optimizer tries to assign a SORT operator to obtain the expected order, if the output of a child operator is sorted, the optimizer checks the output order, which is also the input order of the parent operator, and other related information to see whether the order meets the expectation. The following three queries all contain an ORDER BY clause that requires the final result to be sorted. However, because the order provided by the index or primary key matches the order expected by this clause, the optimizer finally does not need to assign a SORT operator.
Q2_2_1: Theidxindex provides the(c1, c2, c3)order, which matches the output order expected byorder by c1, c2.Q2_2_2: Theidxindex provides the(c1, c2, c3)order, which matches the output order expected byorder by c2due to the inclusion of thec1 = 4predicate.Q2_2_3: The primary key provides the(pk)order. Due to the uniqueness of the primary key, the input order matches the output order expected byorder by pk, c3, c2, c1.
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
Q2_2_1: select /*+index(t idx)*/ * from t1 t order by c1, c2;
Q2_2_2: select /*+index(t idx)*/ * from t1 t where c1 = 4 order by c2;
Q2_2_3: select /*+index(t primary)*/ * from t1 t order by pk, c3, c2, c1;
Execution plan for Q2_2_1:
=======================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
---------------------------------------
|0 |TABLE SCAN|t(idx)|76557 |216994|
=======================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [t.c3], [t.c4], [t.pk]), filter(nil), rowset=256,
access([t.pk], [t.c1], [t.c2], [t.c3], [t.c4]), partitions(p0)
For sorts that cannot be eliminated, the optimizer optimizes the sort columns before assigning a SORT operator.
Prefix sorting
If a SORT operator must be assigned, when the input order of the operator matches part of the expected order, the optimizer performs prefix sorting to utilize part of the input order.
In the following query, after the idx index is used, the output order of the TABLE SCAN operator is (c1, c2, c3). When the optimizer performs prefix sorting, data in c1 returned by TABLE SCAN is sorted only by pk.
Q2_2_4:
select /*+index(t idx)*/ c1 from t1 t order by c1, pk;
=======================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
---------------------------------------
|0 |SORT | |76557 |18874|
|1 | TABLE SCAN|t(idx)|76557 |5319 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([t.c1]), filter(nil), rowset=256,
sort_keys([t.c1, ASC], [t.pk, ASC]), prefix_pos(1)
1 - output([t.pk], [t.c1]), filter(nil), rowset=256,
access([t.pk], [t.c1]), partitions(p0)
Simplify sort columns
The optimizer can simplify sort columns, thus reducing the number of the involved columns. The following methods are available:
Make use of equality conditions:
select c1 from t1 t where c3 = 1 and c2 = c1 order by c3, c2, c1;For the preceding query, the
c3 = 1andc2 = c1conditions can be used for simplification, and the rows returned are finally sorted byc2.Make use of the primary key or a unique index:
select c1 from t1 t order by c1, pk, c3, c2;For the preceding query,
pkis the primary key, and the rows returned are finally sorted byc1andpk.