The partition pruning feature helps avoid accessing irrelevant partitions, which significantly improves SQL execution efficiency. This topic introduces the principles and application of partition pruning.
When you access a partitioned table, you often need to access only some of the partitions. The process in which an optimizer eliminates access to irrelevant partitions is called partition pruning. Partition pruning is an important optimization technique for partitioned tables. It can significantly improve SQL execution efficiency. You can use the characteristics of partition pruning to add conditions in your query to avoid accessing irrelevant data and optimize query performance.
Partition pruning is a complex process whereby the optimizer extracts relevant partition information from the partition information of a table and the conditions specified in the SQL statement. Usually, the conditions in an SQL statement are complex, making the extraction logic more complex. This procedure is performed by the Query Range module of OceanBase Database.
Assume that you want to access data with col1 set to 1 and all data that meets this condition is located in partition 1 (p1). In this case, you need to access only p1 and do not need to access p0, p2, p3, or p4. A sample statement is as follows:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
Execute the EXPLAIN statement to query the execution plan and the partition pruning result:
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1;
The result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL1|1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL1.COL1], [TBL1.COL2]), filter([TBL1.COL1 = 1]), rowset=16 |
| access([TBL1.COL1], [TBL1.COL2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
11 rows in set
Principles of partition pruning
HASH or LIST partitioning
Partition pruning is based on the conditions in the WHERE clause, which calculate the values of the partition columns and then determine which partitions need to be accessed based on the results. Partition pruning can only be performed when the query condition is an equality condition.
For example, when creating a partitioned table with the partition key as c1 and the query condition as an equality condition, the following actions can be observed:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1) PARTITIONS 5;
obclient> SELECT * FROM t1 WHERE c1 = 1;
Execute the EXPLAIN statement to query the partition pruning result:
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 = 1;
The execution result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|T1 |1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2]), filter([T1.C1 = 1]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
11 rows in set
If the query condition includes a non-equality condition, partition pruning cannot be performed:
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 > 1;
The execution result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| ============================================================= |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------- |
| |0 |PX COORDINATOR | |1 |21 | |
| |1 |└─EXCHANGE OUT DISTR |:EX10000|1 |21 | |
| |2 | └─PX PARTITION ITERATOR| |1 |19 | |
| |3 | └─TABLE FULL SCAN |T1 |1 |19 | |
| ============================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(T1.C1, T1.C2)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(T1.C1, T1.C2)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([T1.C1], [T1.C2]), filter(nil), rowset=16 |
| force partition granule |
| 3 - output([T1.C1], [T1.C2]), filter([T1.C1 > 1]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p[0-4]) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
19 rows in set
RANGE partitioning
Partition pruning determines the partitions that need to be accessed by intersecting the partition key in the WHERE clause with the partition range defined in the table. Partition pruning can be performed regardless of whether the query condition is an equality condition or a non-equality condition.
For example, when the partition key of the following partitioned table is c1, and the query condition is a non-equality condition c1 < 150 and c1 > 110, partition pruning can be performed:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
obclient> SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
Execute the EXPLAIN statement to query the partition pruning result:
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
The execution result is as follows:
| Query Plan |
+------------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|T1 |1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2]), filter([T1.C1 < 150], [T1.C1 > 110]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------------+
11 rows in set
Principles of subpartition pruning
In subpartition pruning, the partitions to access are first determined based on the partitioning key, and the subpartitions to access are determined based on the subpartitioning key. Then, the results are combined to determine all physical partitions to access.
In the following example, p0 is the partition pruning result, and sp0 is the subpartition pruning result. Therefore, the final physical partition to access is p0sp0.
Notice
In this example, the physical partition identifier to access is p0sp0. This identifier is not the name of a subpartition. For a table with template-based subpartitions, the subpartition naming rule is ($part_name)s($subpart_name). For example, p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
obclient> SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
Execute the EXPLAIN statement to query the partition pruning result:
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
The result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL2_RR|1 |4 | |
| ================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter([TBL2_RR.COL2 > 101], [TBL2_RR.COL2 < 150], [TBL2_RR.COL1 = 1 OR TBL2_RR.COL1 = 2]), rowset=16 |
| access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false,false], |
| range_key([TBL2_RR.__pk_increment]), range(MIN ; MAX)always true |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set
In some cases, the result set of partition pruning may be large, but the optimizer can ensure that this set is a superset of the data to be accessed and no data is lost.