The partition pruning feature can avoid accessing irrelevant partitions, significantly improving SQL execution efficiency. This topic describes the principles and applications of partition pruning.
When you access a partitioned table, you often need to access only some of its partitions. The process by which the optimizer avoids accessing irrelevant partitions is called partition pruning. Partition pruning is an important optimization technique for partitioned tables, significantly enhancing SQL execution efficiency. You can leverage the partition pruning feature to add conditions that locate specific partitions, avoiding irrelevant data access and optimizing query performance.
Partition pruning is a complex process. The optimizer must extract relevant partition information based on the partition details of the user's table and the conditions specified in the SQL statement. Due to the complexity of the conditions in the SQL statement, the extraction logic becomes more complex. This process is handled by the Query Range sub-module in OceanBase Database.
When you access a partitioned table, all data where col1 is 1 is located in partition p1. Therefore, you only need to access partition p1 and not partitions 0, 2, 3, or 4. Here is an example:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
You can use the EXPLAIN statement to view the execution plan and see the results of partition pruning:
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|TBL1|990 |383 |
===================================
Outputs & filters:
-------------------------------------
0 - output([TBL1.COL1], [TBL1.COL2]), filter(nil),
access([TBL1.COL1], [TBL1.COL2]), partitions(p1)
1 row in set
Principles of partition pruning
HASH and LIST partitioning
Partition pruning involves using the conditions in the WHERE clause to calculate the values of the partitioning columns and then determining which partitions need to be accessed based on the results. If the partitioning condition is an expression that appears as a whole in an equality condition, partition pruning can also be performed.
For example, if the partitioning condition is the expression c1 + c2 and this expression appears as a whole in an equality condition, partition pruning can be performed. Here is an example:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
RANGE partitioning
You can determine the partitions to access by finding the intersection of the range of the partitioning key in the WHERE clause and the range of the table definition's partitions.
For RANGE partitions:
When the partitioning condition is a column, partition pruning is supported whether the query condition is an equality condition or an inequality condition.
When the partitioning condition is an expression and the query condition is an equality condition, partition pruning is supported.
For example, if the partitioning condition is the expression c1 and the query condition is an inequality condition c1 < 150 and c1 > 100, partition pruning can be performed. Here is an example:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110\G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |46 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
access([t1.c1], [t1.c2]), partitions(p1)
1 row in set
Additionally, if the partitioning expression uses time-related functions such as YEAR(), TO_DAYS(), or TO_SECONDS(), partition pruning is supported based on the range specified in the query condition.
Here is an example:
obclient> CREATE TABLE tbl_r (log_id BIGINT NOT NULL,log_value VARCHAR(50),log_date datetime NOT NULL)
PARTITION BY RANGE(to_days(log_date))
(PARTITION M202001 VALUES LESS THAN(to_days('2020/02/01'))
, PARTITION M202002 VALUES LESS THAN(to_days('2020/03/01'))
, PARTITION M202003 VALUES LESS THAN(to_days('2020/04/01'))
, PARTITION M202004 VALUES LESS THAN(to_days('2020/05/01'))
, PARTITION M202005 VALUES LESS THAN(to_days('2020/06/01'))
, PARTITION M202006 VALUES LESS THAN(to_days('2020/07/01'))
, PARTITION M202007 VALUES LESS THAN(to_days('2020/08/01'))
, PARTITION M202008 VALUES LESS THAN(to_days('2020/09/01'))
, PARTITION M202009 VALUES LESS THAN(to_days('2020/10/01'))
, PARTITION M202010 VALUES LESS THAN(to_days('2020/11/01'))
, PARTITION M202011 VALUES LESS THAN(to_days('2020/12/01'))
, PARTITION M202012 VALUES LESS THAN(to_days('2021/01/01'))
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl_r WHERE log_date > '2020/07/15' and log_date <'2020/10/07'\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |PX COORDINATOR | |1 |183 |
|1 | EXCHANGE OUT DISTR |:EX10000|1 |183 |
|2 | PX PARTITION ITERATOR| |1 |183 |
|3 | TABLE SCAN |tbl_r |1 |183 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil)
1 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil), dop=1
2 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter(nil)
3 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter([tbl_r.log_date > ?], [tbl_r.log_date < ?]),
access([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), partitions(p[6-9])
1 row in set
Principles of sub-partition pruning
For sub-partitions, first determine the partitions to access based on the primary partitioning key, and then determine the sub-partitions to access based on the sub-partitioning key. Finally, calculate the product to determine all physical sub-partitions to access.
In the following example, after calculations, the primary partition pruning result is p0, and the sub-partition pruning result is sp0, so the physical partition to access is p0sp0.
Notice
In this example, the physical partition identifier to access is p0sp0, which is not the name of the sub-partition. For template-based sub-partitioned tables, the naming convention for sub-partitions is ($part_name)s($subpart_name), such as p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150) \G
*************************** 1. row ***************************
Query Plan: ======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------
|0 |TABLE SCAN|TBL2_RR|99 |53 |
======================================
Outputs & filters:
-------------------------------------
0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter(nil),
access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0)
1 row in set
In some scenarios, partition pruning may result in a slight amplification, but the optimizer ensures that the pruning result is a superset of the data to be accessed, without any data loss.