To optimize the performance of a query that contains the ORDER BY and LIMIT clauses, the first choice is to generate a streaming execution plan to reduce unnecessary data reads or processing. When a blocking operator must be assigned or there exist scenarios that cause high overhead, the optimizer can make use of the LIMIT clause to reduce unnecessary overhead in execution.
Eliminate sorting
For a simple single-table query that contains the ORDER BY and LIMIT clauses, you can create a composite index to eliminate sorting while using the index for fast scanning. This implements a streaming plan and avoids scanning a large number of rows that meet the filter conditions.
Q3_1_1:
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
select * from t1 where c1 = 1 and c3 > 0 order by c2 limit 10;
For the Q3_1_1 query, a query range can be extracted from the equality condition c1 = 1 for a fast index scan. Due to the equality condition on c1, the scan on the idx index returns a result set sorted by c2. After 10 rows are scanned, the execution of the plan is completed in advance.
Note that the Q3_1_1 query has a c3 > 0 predicate. If you create an index based on the (c1, c3, c2) order, the query range can be directly extracted from the index for c3 > 0 filtering, but the result set is not sorted by c2. When the c3 > 0 predicate is highly selective, the elimination of sorting for a streaming plan and reduced number of rows to scan basically do not bring any performance improvement. In this case, the use of idx(c1, c2, c3) only makes the plan unable to fully utilize the selectivity of c3 > 0.
Reduce scanning and computing overhead
When a query on a wide table or a table with LOB columns contains the ORDER BY and LIMIT clauses, the optimizer uses the late materialization strategy to reduce the number of rows to scan.
For the following query Q3_2_1, both the filter column and the sort column are included in the idx index. The execution engine first scans the sort column, filter column, and primary key based on the idx index, performs a sort to determine the rows to return, and then scans the primary table by primary key to return the qualifying rows with all columns. For a complex query that contains the ORDER BY and LIMIT clauses, you can also rewrite the query by using the same strategy.
In the Q3_2_2 query, no_use_late_materialization prohibits late materialization. In this case, the execution engine has to access all rows that meet the filter condition in the primary table by index primary key.
Q3_2_1:
create table t1(c1 int, c2 int, c3 int, c4 int, pk int primary key);
create index idx on t1(c1, c2, c3);
select /*+index(t idx) */ * from t1 t where c1 > 0 order by c2 limit 10;
==============================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
----------------------------------------------
|0 |NESTED-LOOP JOIN| |10 |14855|
|1 | TOP-N SORT | |10 |14842|
|2 | TABLE SCAN |t(idx) |93114 |11972|
|3 | TABLE GET |t1_alias|1 |2 |
==============================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [t1_alias.c3], [t1_alias.c4], [t.pk]), filter(nil), rowset=256,
conds(nil), nl_params_([t.pk])
1 - output([t.pk], [t.c1], [t.c2]), filter(nil), rowset=256, sort_keys([t.c2, ASC]), topn(10)
2 - output([t.pk], [t.c1], [t.c2]), filter(nil), rowset=256,
access([t.pk], [t.c1], [t.c2]), partitions(p0)
3 - output([t1_alias.c3], [t1_alias.c4]), filter(nil), rowset=256,
access([t1_alias.c3], [t1_alias.c4]), partitions(p0)
Q3_2_2:
select /*+index(t idx) no_use_late_materialization*/ * from t1 t
where c1 > 0 order by c2 limit 10;
========================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
----------------------------------------
|0 |TOP-N SORT | |10 |265257|
|1 | TABLE SCAN|t(idx)|93114 |262387|
========================================
Outputs & filters:
-------------------------------------
0 - output([t.c1], [t.c2], [t.c3], [t.c4], [t.pk]), filter(nil), rowset=256, sort_keys([t.c2, ASC]), topn(10)
1 - output([t.pk], [t.c1], [t.c2], [t.c3], [t.c4]), filter(nil), rowset=256,
access([t.pk], [t.c1], [t.c2], [t.c3], [t.c4]), partitions(p0)