Partition pruning is a feature that can avoid accessing irrelevant partitions, significantly improving the execution efficiency of SQL statements. This topic describes the principles and applications of partition pruning.
When you access a partitioned table, you often need to access only some of its partitions. The process by which the optimizer avoids accessing irrelevant partitions is called partition pruning. As an important optimization method for partitioned tables, partition pruning can significantly improve the execution efficiency of SQL statements. You can leverage the partition pruning feature to add conditions that locate specific partitions, thereby avoiding irrelevant data access and optimizing query performance.
Partition pruning is a complex process. The optimizer needs to extract relevant partition information based on the partition details of the user's table and the conditions specified in the SQL statement. The complexity of the SQL conditions often increases the complexity of the extraction logic, which is handled by the Query Range sub-module in OceanBase Database.
When you access a partitioned table, all data where col1 is 1 is located in partition p1. Therefore, you only need to access p1, and there is no need to access partitions 0, 2, 3, or 4. The following code shows an example:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
By using EXPLAIN to view the execution plan, you can see the result of partition pruning.
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1;
The result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL1|1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL1.COL1], [TBL1.COL2]), filter([TBL1.COL1 = 1]), rowset=16 |
| access([TBL1.COL1], [TBL1.COL2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
11 rows in set
Basic principles of partition pruning
Hash or list partitioning
Partition pruning is based on calculating the values of the partitioning key using the conditions in the where clause and then determining which partitions need to be accessed based on the results. Partition pruning can only be performed when the query condition is an equality condition.
Create a partitioned table with the partitioning key c1 and an equality condition as the query condition. Example:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1) PARTITIONS 5;
obclient> SELECT * FROM t1 WHERE c1 = 1;
You can run the EXPLAIN statement to view the execution plan and see the partition pruning result.
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 = 1;
The execution result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|T1 |1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2]), filter([T1.C1 = 1]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
11 rows in set
If the query condition is not an equality condition, partition pruning cannot be performed.
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 > 1;
The execution result is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| ============================================================= |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------- |
| |0 |PX COORDINATOR | |1 |21 | |
| |1 |└─EXCHANGE OUT DISTR |:EX10000|1 |21 | |
| |2 | └─PX PARTITION ITERATOR| |1 |19 | |
| |3 | └─TABLE FULL SCAN |T1 |1 |19 | |
| ============================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(T1.C1, T1.C2)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(T1.C1, T1.C2)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([T1.C1], [T1.C2]), filter(nil), rowset=16 |
| force partition granule |
| 3 - output([T1.C1], [T1.C2]), filter([T1.C1 > 1]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p[0-4]) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
19 rows in set
Range partitioning
The partitions to be accessed are determined based on the intersection of the partitioning key range in the where clause and the partition range defined in the table. Partition pruning can be performed regardless of whether the query condition is an equality condition or a non-equality condition.
For example, the partitioning key of the following partitioned table is c1. When the query condition is a non-equality condition c1 < 150 and c1 > 100, partition pruning can be performed. Example:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
obclient> SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
You can run the EXPLAIN statement to view the partition pruning result.
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110;
The execution result is as follows:
+------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|T1 |1 |4 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2]), filter([T1.C1 < 150], [T1.C1 > 110]), rowset=16 |
| access([T1.C1], [T1.C2]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------------+
11 rows in set
Basic principles of subpartition pruning
For subpartition pruning, first determine the partitions to be accessed based on the primary partitioning key, and then determine the subpartitions to be accessed based on the subpartitioning key. Finally, compute the product of the two sets to find all the physical subpartitions to be accessed.
In the following example, after calculations, the result of primary partition pruning is p0, and the result of subpartition pruning is sp0, so the physical subpartition to be accessed is p0sp0.
Notice
In this example, the physical subpartition identifier to be accessed is p0sp0. This identifier is not the name of a subpartition. For a template-based subpartitioned table, the naming rule for subpartitions is ($part_name)s($subpart_name). For example, p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
obclient> SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
You can run the EXPLAIN statement to view the partition pruning result.
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150);
The execution result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------- |
| |0 |TABLE FULL SCAN|TBL2_RR|1 |4 | |
| ================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter([TBL2_RR.COL2 > 101], [TBL2_RR.COL2 < 150], [TBL2_RR.COL1 = 1 OR TBL2_RR.COL1 = 2]), rowset=16 |
| access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false,false,false], |
| range_key([TBL2_RR.__pk_increment]), range(MIN ; MAX)always true |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set
In some scenarios, partition pruning may result in a certain degree of amplification, but the optimizer can ensure that the pruning result is a superset of the data to be accessed, without any data loss.