Generally, the results of an index scan are sorted by indexed columns. For example, given an index IDX(C1, C2, C3), scanning this index will yield results that are ordered by C1, C2, C3. This ordered nature can be used to optimize queries that include an ORDER BY clause. This topic describes how to construct an index for optimization by taking the following query as an example.
CREATE TABLE T1 (C1 INT, C2 INT, C3 INT, C4 INT);
SELECT * FROM T1 WHERE C1 = 1 ORDER BY C3 LIMIT 5;
For the query, you can create the IDX_C1 index on the C1 column based on the filter predicate.
CREATE INDEX IDX_C1 ON T1(C1);
EXPLAIN SELECT * FROM T1 WHERE C1 = 1 ORDER BY C3 LIMIT 5;
| ===========================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-------------------------------------------
|0 |LIMIT | |5 |6527|
|1 | TOP-N SORT | |5 |6527|
|2 | TABLE SCAN|t1(IDX_C1)|990 |5832|
===========================================
Outputs & filters:
-------------------------------------
0 - output([t1.C1], [t1.C2], [t1.C3], [t1.C4]), filter(nil), limit(5), offset(nil)
1 - output([t1.C1], [t1.C2], [t1.C3], [t1.C4]), filter(nil), sort_keys([t1.C3, ASC]), topn(5)
2 - output([t1.C1], [t1.C2], [t1.C3], [t1.C4]), filter(nil),
access([t1.C1], [t1.C2], [t1.C3], [t1.C4]), partitions(p0)
By using the IDX_C1 index, the database scans all rows that meet the C1 = 1 condition. Then, the database performs heap sort by C3 and retains the five rows with the smallest C3 values. If there are many rows that meet the C1 = 1 condition, scan and heap sort will take more time. In this case, you can take further advantage of the sorted index scan result. You can create the IDX_C1_C3 index on the C1, C3 columns.
CREATE INDEX IDX_C1_C3 ON T1(C1, C3);
EXPLAIN SELECT * FROM T1 WHERE C1 = 1 ORDER BY C3 LIMIT 5;
| ============================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------
|0 |TABLE SCAN|t1(IDX_C1_C3)|5 |111 |
============================================
Outputs & filters:
-------------------------------------
0 - output([t1.C1], [t1.C2], [t1.C3], [t1.C4]), filter(nil),
access([t1.C1], [t1.C2], [t1.C3], [t1.C4]), partitions(p0),
limit(5), offset(nil)
This index has two benefits:
- The
C1 = 1predicate can be used to determine the query range on the index. - The scan result is sorted by
C1, C3. Since the values ofC1are fixed, the result is also sorted byC3. The execution engine only needs to scan the first five rows to obtain the final query result. The sort operation is thereby eliminated.
For an index on (C1, C2, C3, C4), if a query has equality predicates on C1, C2, the scan result is sorted by C3, C4. However, if the query has a non-equality predicate on the C2 column, the scan result is sorted by C2, C3, C4.
For the following query, the optimization can be performed in two ways:
- Use an index on
C1, C2to narrow down the query range with more predicates. - Use an index on
C1, C3to avoid sorting based on the sorted scan result.
To determine which optimization method is better, you need to consider the selectivity of the predicates C1 = 1 and C2 > 1.
EXPLAIN SELECT * FROM T1 WHERE C1 = 1 AND C2 > 1 ORDER BY C3 LIMIT 5;
The rules are as follows:
- If neither of
C1 = 1andC2 > 1is selective enough, the index onC1, C3will be better. - If
C1 = 1is not selective enough butC2 > 1is highly selective, the index onC1, C2will be better. - If
C1 = 1is highly selective, the performance of the two indexes may be close. The performance of the index onC1, C2is more stable. The performance of the index onC1, C3depends on the actual data distribution, and may be extremely good in some scenarios but poor in others.