The partition pruning feature helps avoid accessing irrelevant partitions, which significantly improves SQL execution efficiency. This topic introduces the principles and application of partition pruning.
When you query a partitioned table, you often need data from only some of its partitions. Partition pruning is the process in which the optimizer eliminates access to irrelevant partitions, and it is an important optimization technique for partitioned tables. By adding conditions on the partitioning key to your query, you let the optimizer skip irrelevant data and improve query performance.
Partition pruning is a complex process in which the optimizer derives the partitions to access from the partition definition of the table and the conditions specified in the SQL statement. Because these conditions are often complex, the extraction logic is complex as well. In OceanBase Database, this work is performed by the Query Range module.
Assume that you want to access the rows in which col1 is 1, and that all rows that meet this condition are located in partition 1 (p1). In this case, the query needs to access only p1 and can skip p0, p2, p3, and p4. A sample statement is as follows:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
Execute the EXPLAIN statement to query the execution plan and the partition pruning result.
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1;
The result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL1|1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL1.COL1], [TBL1.COL2]), filter([TBL1.COL1 = 1]), rowset=16 |
| access([TBL1.COL1], [TBL1.COL2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
11 rows in set
Principles of partition pruning
HASH or LIST partitioning
For a HASH- or LIST-partitioned table, the optimizer calculates the values of the partitioning columns from the conditions specified in the WHERE clause, and then uses these values to determine which partitions to access. If the partitioning key is an expression, partition pruning can still be performed when the expression appears as a whole in an equality condition.
In the following example of partition pruning, the partitioning key is the expression c1 + c2, which is used as a whole in an equality condition.
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
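The HASH case above can be modeled with a short Python sketch. It is an illustration only, not OceanBase's implementation: the function name `prune_hash` is hypothetical, and the sketch assumes that the partition index is simply the value of the partitioning expression modulo the number of partitions, which matches the two examples above but may differ from the hash function actually used.

```python
def prune_hash(num_partitions, expr_value=None):
    """Return the names of the partitions a query must access.

    expr_value is the value that an equality condition fixes for the
    partitioning expression (for example, 1 for `c1 + c2 = 1`).
    If no such condition exists, nothing can be pruned.

    Assumption: partition index = abs(value) % num_partitions.
    """
    all_partitions = [f"p{i}" for i in range(num_partitions)]
    if expr_value is None:
        # No usable equality condition: every partition must be scanned.
        return all_partitions
    return [all_partitions[abs(expr_value) % num_partitions]]


# tbl1: PARTITION BY HASH(col1) PARTITIONS 5, condition col1 = 1
print(prune_hash(5, 1))   # ['p1']
# Same table without a condition on the partitioning key
print(prune_hash(5))      # ['p0', 'p1', 'p2', 'p3', 'p4']
```

The key point is that the condition must fix the value of the whole partitioning expression. A condition on c1 alone would not determine c1 + c2, so the sketch would fall back to returning every partition.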
RANGE partitioning
For a RANGE-partitioned table, the partitions to access are those whose ranges intersect the range that the WHERE clause defines on the partitioning key. If the partitioning expression is a function and the WHERE clause specifies a range rather than an equality, partition pruning is not supported, because mapping a range on the column to a range on the function result depends on the monotonicity of the function.
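The role of monotonicity can be illustrated with a small Python sketch. This is one reading of the statement above, not a description of the optimizer's internals: `hypothetical_expr` is an invented, non-monotonic partitioning function chosen only for illustration, and the column is assumed to hold integers.

```python
def expr_values(func, low, high):
    """Distinct values of func(c1) for integer c1 in the open interval (low, high)."""
    return sorted({func(c1) for c1 in range(low + 1, high)})


def hypothetical_expr(c1):
    # Invented non-monotonic partitioning function, for illustration only.
    return c1 % 100


# Condition: c1 > 95 AND c1 < 105, that is, c1 in 96..104
values = expr_values(hypothetical_expr, 95, 105)
# Naively mapping only the endpoints of the condition through the function
endpoint_range = (hypothetical_expr(96), hypothetical_expr(104))

print(values)          # [0, 1, 2, 3, 4, 96, 97, 98, 99]
print(endpoint_range)  # (96, 4)
```

The expression values fall into two separate runs, while the mapped endpoints describe an empty range. Pruning based on the endpoints alone would therefore skip partitions that hold matching rows.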
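In the following example, the table is RANGE-partitioned directly on column c1, and the query condition c1 < 150 and c1 > 110 lies entirely within the range of p1, so only p1 is accessed. By contrast, if the partitioning expression were a function such as c1 + 1, a range condition like this one, which is not an equality condition, could not be used for pruning.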
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
obclient> SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
Execute the EXPLAIN statement to query the partition pruning result.
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
The result is as follows:
+------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|T1 |1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2]), filter([T1.C1 < 150], [T1.C1 > 110]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------------+
11 rows in set
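The intersection rule used in the RANGE example above can be sketched in Python as follows. The function name `prune_range` is hypothetical, and the sketch assumes an integer partitioning column and a query condition of the form low < c1 < high. It is a simplified model, not the Query Range module itself.

```python
import math


def prune_range(partitions, low, high):
    """Return the partitions whose value range intersects low < c1 < high.

    partitions: list of (name, exclusive_upper_bound) in ascending order,
    mirroring PARTITION ... VALUES LESS THAN (bound).
    Assumption: c1 is an integer column.
    """
    selected = []
    lower = -math.inf  # the first partition has no lower bound
    for name, upper in partitions:
        # The partition holds integers in [lower, upper - 1];
        # the query asks for integers in [low + 1, high - 1].
        if max(lower, low + 1) <= min(upper - 1, high - 1):
            selected.append(name)
        lower = upper  # the next partition starts where this one ends
    return selected


t1 = [("p0", 100), ("p1", 200)]
print(prune_range(t1, 110, 150))  # ['p1']
print(prune_range(t1, 50, 150))   # ['p0', 'p1']
```

The first call reproduces the plan above, in which only p1 is accessed. The second call shows a condition that spans the boundary at 100, so both partitions remain.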
Principles of subpartition pruning
In subpartition pruning, the partitions to access are first determined based on the partitioning key, and the subpartitions to access are determined based on the subpartitioning key. Then, the results are combined to determine all physical partitions to access.
In the following example, p0 is the partition pruning result, and sp0 is the subpartition pruning result. Therefore, the final physical partition to access is p0sp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
obclient> SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
Execute the EXPLAIN statement to query the partition pruning result.
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
The result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL2_RR|1 |4 | |
| ================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter([TBL2_RR.COL2 > 101], [TBL2_RR.COL2 < 150], [TBL2_RR.COL1 = 1 OR TBL2_RR.COL1 = 2]), rowset=16 |
| access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false,false], |
| range_key([TBL2_RR.__pk_increment]), range(MIN ; MAX)always true |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set
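The two-step process for tbl2_rr can be sketched in Python as follows. The helper `prune` is hypothetical and deliberately simple: it assumes integer columns, takes the candidate values of each key as an explicit collection, and looks up the partition of each value. It is meant to show how the two pruning results are combined, not how the optimizer computes them.

```python
from bisect import bisect_right
from itertools import product


def prune(names, upper_bounds, values):
    """Return the RANGE partitions that contain at least one candidate value.

    names and upper_bounds are parallel lists; each bound is exclusive,
    as in VALUES LESS THAN (bound).
    """
    hit = set()
    for value in values:
        index = bisect_right(upper_bounds, value)
        if index < len(names):  # values beyond the last bound match no partition
            hit.add(names[index])
    return [name for name in names if name in hit]


# Step 1: partition pruning on col1 (col1 = 1 OR col1 = 2)
parts = prune(["p0", "p1"], [100, 200], [1, 2])
# Step 2: subpartition pruning on col2 (col2 > 101 AND col2 < 150)
subparts = prune(["sp0", "sp1"], [1000, 2000], range(102, 150))
# Step 3: combine both results into physical partitions
physical = [p + sp for p, sp in product(parts, subparts)]

print(physical)  # ['p0sp0']
```

Because the table uses a subpartition template, every partition has the same subpartitions, so the combination is the Cartesian product of the two results.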
In some cases, the set of partitions that remains after pruning may be larger than strictly necessary. However, the optimizer ensures that this set is a superset of the partitions that hold the data to be accessed, so no data is lost.