The partition pruning feature can avoid accessing irrelevant partitions, significantly improving SQL execution efficiency. This topic describes the principles and applications of partition pruning.
When you access a partitioned table, you often need to access only some of its partitions. The process of avoiding irrelevant partitions through the optimizer is called partition pruning. Partition pruning is an important optimization method for partitioned tables, significantly improving SQL execution efficiency. You can use the partition pruning feature to add conditions for locating partitions and avoid accessing irrelevant data, optimizing query performance.
Partition pruning is a complex process. The optimizer needs to extract relevant partition information based on the partition information of the user table and the conditions specified in the SQL statement. The complexity of the SQL conditions increases the complexity of the entire extraction logic, which is completed by the Query Range sub-module in OceanBase Database.
When you access a partitioned table, all data where col1 is 1 is in partition p1. Therefore, you only need to access p1 and not partitions 0, 2, 3, or 4. The following code shows this:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
You can use EXPLAIN to view the execution plan and see the results of partition pruning:
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|TBL1|990 |383 |
===================================
Outputs & filters:
-------------------------------------
0 - output([TBL1.COL1], [TBL1.COL2]), filter(nil),
access([TBL1.COL1], [TBL1.COL2]), partitions(p1)
1 row in set
Principles of partition pruning
Hash and list partitioning
Partition pruning is based on the conditions in the where clause and calculates the values of the partitioning columns to determine which partitions need to be accessed. If the partitioning condition is an expression that appears as a whole in an equality condition, partition pruning can also be performed.
For example, if the partitioning condition is the expression c1 + c2, which appears as a whole in an equality condition, partition pruning can be performed. The following code shows this:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
Range partitioning
You can determine the partitions to access based on the intersection of the range of the partitioning key in the where clause and the range of the defined partitions.
For range-partitioned tables:
When the partitioning condition is a column, partition pruning is supported whether the query condition is an equality condition or an inequality condition.
When the partitioning condition is an expression and the query condition is an equality condition, partition pruning is supported.
For example, if the partitioning condition is the expression c1 and the query condition is an inequality condition c1 < 150 and c1 > 100, partition pruning can be performed. The following code shows this:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110\G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |46 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
access([t1.c1], [t1.c2]), partitions(p1)
1 row in set
Additionally, if the partitioning condition expression uses time-related functions such as YEAR(), TO_DAYS(), or TO_SECONDS(), partition pruning is also supported based on the range specified in the query condition.
The following code shows an example:
obclient> CREATE TABLE tbl_r (log_id BIGINT NOT NULL,log_value VARCHAR(50),log_date datetime NOT NULL)
PARTITION BY RANGE(to_days(log_date))
(PARTITION M202001 VALUES LESS THAN(to_days('2020/02/01'))
, PARTITION M202002 VALUES LESS THAN(to_days('2020/03/01'))
, PARTITION M202003 VALUES LESS THAN(to_days('2020/04/01'))
, PARTITION M202004 VALUES LESS THAN(to_days('2020/05/01'))
, PARTITION M202005 VALUES LESS THAN(to_days('2020/06/01'))
, PARTITION M202006 VALUES LESS THAN(to_days('2020/07/01'))
, PARTITION M202007 VALUES LESS THAN(to_days('2020/08/01'))
, PARTITION M202008 VALUES LESS THAN(to_days('2020/09/01'))
, PARTITION M202009 VALUES LESS THAN(to_days('2020/10/01'))
, PARTITION M202010 VALUES LESS THAN(to_days('2020/11/01'))
, PARTITION M202011 VALUES LESS THAN(to_days('2020/12/01'))
, PARTITION M202012 VALUES LESS THAN(to_days('2021/01/01'))
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl_r WHERE log_date > '2020/07/15' and log_date <'2020/10/07'\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |PX COORDINATOR | |1 |183 |
|1 | EXCHANGE OUT DISTR |:EX10000|1 |183 |
|2 | PX PARTITION ITERATOR| |1 |183 |
|3 | TABLE SCAN |tbl_r |1 |183 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil)
1 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil), dop=1
2 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter(nil)
3 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter([tbl_r.log_date > ?], [tbl_r.log_date < ?]),
access([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), partitions(p[6-9])
1 row in set
Principles of subpartition pruning
For subpartitions, first determine the partitions to access based on the primary partitioning key, and then determine the subpartitions to access based on the subpartitioning key. Finally, determine all the physical subpartitions to access based on the product of the partitions and subpartitions.
In the following example, after calculations, the primary partition pruning result is p0, and the subpartition pruning result is sp0. Therefore, the physical partition to access is p0sp0.
Notice
In this example, the physical partition identifier to access is p0sp0, which is not the name of the subpartition. For a templated subpartitioned table, the naming convention for subpartitions is ($part_name)s($subpart_name), such as p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150) \G
*************************** 1. row ***************************
Query Plan: ======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------
|0 |TABLE SCAN|TBL2_RR|99 |53 |
======================================
Outputs & filters:
-------------------------------------
0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter(nil),
access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0)
1 row in set
In some scenarios, partition pruning may result in a certain degree of amplification, but the optimizer ensures that the pruning result is a superset of the data to be accessed, without any data loss.