Partition pruning can avoid accessing irrelevant partitions, significantly improving SQL execution efficiency. This topic describes the principles and applications of partition pruning.
When you access a partitioned table, you often need to access only some of its partitions. The process of avoiding irrelevant partitions by the optimizer is called partition pruning. As an important optimization method for partitioned tables, partition pruning can significantly improve SQL execution efficiency. You can use the partition pruning feature to add conditions for locating partitions and avoid accessing irrelevant data to optimize query performance.
Partition pruning is a complex process. The optimizer needs to extract relevant partition information based on the partition information of the user table and the conditions specified in the SQL statement. Due to the complexity of the conditions in the SQL statement, the entire extraction logic becomes more complex, and this process is handled by the Query Range sub-module in OceanBase Database.
When you access a partitioned table, all data where col1 is 1 is in partition p1. Therefore, you only need to access partition p1, and you do not need to access partitions 0, 2, 3, or 4. The following code shows an example:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
You can run the EXPLAIN command to view the execution plan and see the result of partition pruning:
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|TBL1|990 |383 |
===================================
Outputs & filters:
-------------------------------------
0 - output([TBL1.COL1], [TBL1.COL2]), filter(nil),
access([TBL1.COL1], [TBL1.COL2]), partitions(p1)
1 row in set
Principles of partition pruning
HASH and LIST partitioning
Partition pruning is based on the conditions in the WHERE clause and calculates the values of the partitioning columns to determine which partitions need to be accessed. If the partitioning condition is an expression that appears as a whole in an equality condition, partition pruning can also be performed.
An example of partition pruning based on the partitioning condition c1 + c2, which appears as a whole in an equality condition, is as follows:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
RANGE partitioning
You can determine the partitions to be accessed based on the intersection of the partitioning key range in the WHERE clause and the partition range defined in the table.
For RANGE partitioning:
When the partitioning condition is a column, partition pruning is supported whether the query condition is an equality condition or an inequality condition.
When the partitioning condition is an expression and the query condition is an equality condition, partition pruning is supported.
For example, if the partitioning condition is an expression c1 and the query condition is an inequality condition c1 < 150 and c1 > 100, partition pruning is supported. An example is as follows:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110\G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |46 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
access([t1.c1], [t1.c2]), partitions(p1)
1 row in set
Additionally, if the partitioning condition's expression uses time functions such as YEAR(), TO_DAYS(), or TO_SECONDS(), partition pruning is also supported based on the range specified in the query condition.
An example is as follows:
obclient> CREATE TABLE tbl_r (log_id BIGINT NOT NULL,log_value VARCHAR(50),log_date datetime NOT NULL)
PARTITION BY RANGE(to_days(log_date))
(PARTITION M202001 VALUES LESS THAN(to_days('2020/02/01'))
, PARTITION M202002 VALUES LESS THAN(to_days('2020/03/01'))
, PARTITION M202003 VALUES LESS THAN(to_days('2020/04/01'))
, PARTITION M202004 VALUES LESS THAN(to_days('2020/05/01'))
, PARTITION M202005 VALUES LESS THAN(to_days('2020/06/01'))
, PARTITION M202006 VALUES LESS THAN(to_days('2020/07/01'))
, PARTITION M202007 VALUES LESS THAN(to_days('2020/08/01'))
, PARTITION M202008 VALUES LESS THAN(to_days('2020/09/01'))
, PARTITION M202009 VALUES LESS THAN(to_days('2020/10/01'))
, PARTITION M202010 VALUES LESS THAN(to_days('2020/11/01'))
, PARTITION M202011 VALUES LESS THAN(to_days('2020/12/01'))
, PARTITION M202012 VALUES LESS THAN(to_days('2021/01/01'))
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl_r WHERE log_date > '2020/07/15' and log_date <'2020/10/07'\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |PX COORDINATOR | |1 |183 |
|1 | EXCHANGE OUT DISTR |:EX10000|1 |183 |
|2 | PX PARTITION ITERATOR| |1 |183 |
|3 | TABLE SCAN |tbl_r |1 |183 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil)
1 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil), dop=1
2 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter(nil)
3 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter([tbl_r.log_date > ?], [tbl_r.log_date < ?]),
access([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), partitions(p[6-9])
1 row in set
Principles of sub-partition pruning
For sub-partition pruning, first determine the partitions to be accessed based on the primary partitioning key, and then determine the sub-partitions to be accessed based on the sub-partitioning key. Finally, determine all physical sub-partitions to be accessed by taking the product of the primary partitions and sub-partitions.
In the following example, after calculation, the primary partition pruning result is p0, and the sub-partition pruning result is sp0, so the physical partition to be accessed is p0sp0.
Notice
In this example, the physical partition identifier to be accessed is p0sp0, which is not the name of the sub-partition. For a templated sub-partitioned table, the naming convention for sub-partitions is ($part_name)s($subpart_name), such as p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150) \G
*************************** 1. row ***************************
Query Plan: ======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------
|0 |TABLE SCAN|TBL2_RR|99 |53 |
======================================
Outputs & filters:
-------------------------------------
0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter(nil),
access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0)
1 row in set
In some scenarios, partition pruning may result in a certain degree of amplification, but the optimizer ensures that the result is a superset of the data to be accessed, without any data loss.