In some extreme scenarios, we may encounter a situation where a single query reads only a few rows, but the queries per second (QPS) is extremely high, resulting in a very large number of data reads for this query. If the I/O capacity of a server is limited, the disk I/O may be overloaded in such cases.
Here is an example:
CREATE TABLE T1 (C1 INT, C2 INT, C3 INT, C4 INT, ... ,C16 INT);
CREATE INDEX IDX_C1 ON T1 (C1);
EXPLAIN SELECT C3, C4 FROM T1 WHERE C1 = 1;
| =========================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-----------------------------------------
|0 |TABLE SCAN|T1(IDX_C1)|4 |229 |
=========================================
Outputs & filters:
-------------------------------------
0 - output([T1.C3], [T1.C4]), filter(nil),
access([T1.C3], [T1.C4]), partitions(p0)
After using an index on column C1 for the above query, the optimal query range is obtained. The amount of data scanned on IDX_C1 during each query is small. Furthermore, the query needs to get values from columns C3 and C4, which are not stored on IDX_C1. Therefore, the query still needs to access the primary table by index primary key. For each query execution, there will be a small amount of sequential reads on IDX_C1 and a small amount of random reads on the primary table. When the QPS of the query is very high, a significant amount of random reads will be generated on the primary table, consuming a large amount of system I/O resources. For this scenario, an additional optimization approach needs to be considered: creating a covering index to avoid access to the primary table. Using the above query as an example, we can create a covering index to optimize performance.
CREATE INDEX IDX_C1_C3_C4 ON T1 (C1, C3, C4);
EXPLAIN SELECT C3, C4 FROM T1 WHERE C1 = 1;
| ===============================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-----------------------------------------------
|0 |TABLE SCAN|T1(IDX_C1_C3_C4)|4 |46 |
===============================================
Outputs & filters:
-------------------------------------
0 - output([T1.C3], [T1.C4]), filter(nil),
access([T1.C3], [T1.C4]), partitions(p0)
When the query accesses the IDX_C1_C3_C4 index, it reads data in the optimal query range without the need to access the primary table. When the query is executed, only a small number of sequential scans are performed on the index. This greatly reduces the query I/O.
Generally, we need to create a covering index in scenarios with the following characteristics:
- The I/O pressure on the database is very high.
- The high I/O is mainly caused by random I/O for access to the primary table in queries with high QPS.
- Queries with high QPS actually do not need to read data from many columns.
In such scenarios, we can create covering indexes for the queries with high QPS to optimize their execution. It is rarely required to use covering indexes to avoid table access by index primary key.