When a query contains a LIMIT clause, whether the generated execution plan contains a blocking operator can significantly impact performance. SORT, as a common blocking operator, is included in the query when an ORDER BY clause is used or when the generated plan requires the assignment of operators based on sorting algorithms (such as MERGE DISTINCT, MERGE GROUP BY, and MERGE JOIN). In these cases, the system attempts to assign SORT operators to obtain an intermediate result set sorted according to the specified sort key. To avoid unnecessary SORT operators and fully utilize already sorted intermediate result sets, the optimizer maintains certain sequence-related information and performs optimizations when attempting to assign SORT operators.
Order in a plan tree
In an execution plan, the input order of operators may be sorted. For base table scans, this order is determined by the sequence produced by the table’s index structure. For other operators, the input order may come from the output order of their child operators. This is different from the explicit output ordering required by an ORDER BY clause. However, most operators that utilize sorting algorithms allow flexibility in their input order, and this order can often be preserved and passed on as the operator’s output order. By optimizing how sorting operators use input order, unnecessary multiple sorts can be avoided.
When the optimizer decides which order a sorting operator should use, it considers both the existing input order and any order that may be required later. For example, in the query below, when the GROUP BY clause uses a merge group by operator, the order and sorting direction of c1 and c2 in the operator’s input can be adjusted as needed.
- In
Q2_1_1, the indexidxis used to generate the order ofc1andc2, avoiding the need for a sorting operator and the sorting operator required byorder by c1, c2. - In
Q2_1_2, theorder byclause is adjusted relative toQ2_1_1, and the primary key is specified. Themerge group byoperator directly uses the order required byorder by c2, c1as the input order for aggregation and assigns a sorting operator undergroup by. - Compared with
Q2_1_2,Q2_1_3does not specify the primary key. It generates themerge group byoperator that can directly use the order ofc1andc2generated byidx, and assigns a sorting operator fororder by c2, c1after aggregation.
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
Q2_1_1:
select /*+no_use_hash_aggregation*/ c1, c2, sum(c3)
from t1 t
group by c2, c1
order by c1, c2;
==========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
------------------------------------------
|0 |MERGE GROUP BY| |7072 |10965|
|1 | TABLE SCAN |t(idx)|78858 |8124 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c1], [t.c2]), agg_func([T_FUN_SUM(t.c3)])
1 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Q2_1_2:
select /*+no_use_hash_aggregation full(t)*/ c1, c2, sum(c3)
from t1 t
group by c1, c2
order by c2, c1;
========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
----------------------------------------
|0 |MERGE GROUP BY| |7072 |41410|
|1 | SORT | |100000 |37988|
|2 | TABLE SCAN |t |100000 |8026 |
========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c2], [t.c1]), agg_func([T_FUN_SUM(t.c3)])
1 - output([t.c2], [t.c1], [t.c3]), filter(nil), rowset=256,
sort_keys([t.c2, ASC], [t.c1, ASC])
2 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Q2_1_3:
select /*+no_use_hash_aggregation index(t idx)*/ c1, c2, sum(c3)
from t1 t
group by c1, c2
order by c2, c1;
===========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
-------------------------------------------
|0 |SORT | |7072 |12771|
|1 | MERGE GROUP BY| |7072 |10965|
|2 | TABLE SCAN |t(idx)|78858 |8124 |
===========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
sort_keys([t.c2, ASC], [t.c1, ASC])
1 - output([t.c2], [t.c1], [T_FUN_SUM(t.c3)]), filter(nil), rowset=256,
group([t.c1], [t.c2]), agg_func([T_FUN_SUM(t.c3)])
2 - output([t.c1], [t.c2], [t.c3]), filter(nil), rowset=256,
access([t.c1], [t.c2], [t.c3]), partitions(p0)
Allocation and optimization of the sort operator
When attempting to assign a sorting operator to achieve the desired order, the output of a child operator may already be sorted. The system first checks the input order and other relevant information to determine whether the input order can satisfy the expected output order. In the three queries below, the presence of an ORDER BY clause requires the final result to be ordered. However, because the order is already provided by a matching index or primary key, there is no need to assign an additional sorting operator in the end.
Q2_2_1: Theidxindex can provide the sequence(c1, c2, c3), and the output sequence matchesorder by c1, c2.Q2_2_2: Theidxindex can provide the sequence(c1, c2, c3). Due to thec1 = 4predicate, the output sequence matchesorder by c2.Q2_2_3: The primary key provides the sequence(pk). Due to the uniqueness of the primary key, the output sequence matchesorder by pk, c3, c2, c1.
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
Q2_2_1: select /*+index(t idx)*/ * from t1 t order by c1, c2;
Q2_2_2: select /*+index(t idx)*/ * from t1 t where c1 = 4 order by c2;
Q2_2_3: select /*+index(t primary)*/ * from t1 t order by pk, c3, c2, c1;
Q2_2_1 execution plan:
=======================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
---------------------------------------
|0 |TABLE SCAN|t(idx)|76557 |216994|
=======================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [t.c3], [t.c4], [t.pk]), filter(nil), rowset=256,
access([t.pk], [t.c1], [t.c2], [t.c3], [t.c4]), partitions(p0)
For the remaining sorting operations that cannot be eliminated, the optimizer optimizes the columns to be sorted before allocating the sort operator.
Prefix sort
When a sort operator needs to be allocated, if the input sequence of the operator can match part of the expected sequence, the prefix sort can be used to optimize the result by leveraging the input sequence.
After using the idx index, the table scan operator outputs the sequence (c1, c2, c3) for the following query. The prefix sort is applied, and for each group of c1, only the pk values are sorted.
Q2_2_4:
select /*+index(t idx)*/ c1 from t1 t order by c1, pk;
=======================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
---------------------------------------
|0 |SORT | |76557 |18874|
|1 | TABLE SCAN|t(idx)|76557 |5319 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([t.c1]), filter(nil), rowset=256,
sort_keys([t.c1, ASC], [t.pk, ASC]), prefix_pos(1)
1 - output([t.pk], [t.c1]), filter(nil), rowset=256,
access([t.pk], [t.c1]), partitions(p0)
Simplified sort columns
A certain amount of simplification is performed on the columns to be sorted to reduce the number of columns that need to be sorted. The main methods used for simplifying the sort columns are as follows:
Simplify the sort columns using equality conditions
select c1 from t1 t where c3 = 1 and c2 = c1 order by c3, c2, c1;For the above query, the columns are simplified using the conditions
c3 = 1andc2 = c1, and the data is sorted based on thec2column.Simplify the sort columns using the primary key or unique index
select c1 from t1 t order by c1, pk, c3, c2;For the above query, since the primary key is used, the data is sorted based on the
c1andpkcolumns.