Partition pruning is a process that avoids accessing irrelevant partitions and significantly improves the execution efficiency of SQL statements. This topic describes the principles and applications of partition pruning.
When you access a partitioned table, you often need to access only some of its partitions. The process of avoiding accessing irrelevant partitions by the optimizer is called partition pruning. Partition pruning is an important optimization technique for partitioned tables. It significantly improves the execution efficiency of SQL statements. You can use the partition pruning feature to add conditions that locate specific partitions and avoid accessing irrelevant data, thereby optimizing query performance.
Partition pruning is a complex process. The optimizer needs to extract relevant partition information based on the partition information of the user table and the conditions specified in the SQL statement. The complexity of the SQL conditions increases the complexity of the entire extraction logic, which is completed by the Query Range sub-module in OceanBase Database.
When you access a partitioned table, all data where col1 is 1 is stored in partition p1. Therefore, you only need to access partition p1, and do not need to access partitions p0, p2, p3, or p4. The following code shows an example:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
You can run the EXPLAIN statement to view the execution plan and see the result of partition pruning:
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|TBL1|990 |383 |
===================================
Outputs & filters:
-------------------------------------
0 - output([TBL1.COL1], [TBL1.COL2]), filter(nil),
access([TBL1.COL1], [TBL1.COL2]), partitions(p1)
1 row in set
Principles of partition pruning
HASH or LIST partitioning
Partition pruning is based on the conditions in the WHERE clause and the calculated values of the partitioning columns. It determines which partitions need to be accessed based on the results. If the partitioning condition is an expression that appears as a whole in an equality condition, partition pruning is also supported.
The following example shows partition pruning when the partitioning condition is the expression c1 + c2, which appears as a whole in an equality condition:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
RANGE partitioning
You can determine the partitions to be accessed based on the intersection of the partitioning key range in the WHERE clause and the partition range defined in the table.
For RANGE partitioned tables:
If the partitioning condition is a column, partition pruning is supported regardless of whether the query condition is an equality condition or an inequality condition.
If the partitioning condition is an expression and the query condition is an equality condition, partition pruning is supported.
For example, if the partitioning condition is the expression c1 and the query condition is an inequality condition c1 < 150 and c1 > 100, partition pruning is supported. The following code shows an example:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110\G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |46 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
access([t1.c1], [t1.c2]), partitions(p1)
1 row in set
Additionally, if the partitioning condition expression uses time functions such as YEAR(), TO_DAYS(), or TO_SECONDS(), partition pruning is supported based on the range specified in the query condition.
The following code shows an example:
obclient> CREATE TABLE tbl_r (log_id BIGINT NOT NULL,log_value VARCHAR(50),log_date datetime NOT NULL)
PARTITION BY RANGE(to_days(log_date))
(PARTITION M202001 VALUES LESS THAN(to_days('2020/02/01'))
, PARTITION M202002 VALUES LESS THAN(to_days('2020/03/01'))
, PARTITION M202003 VALUES LESS THAN(to_days('2020/04/01'))
, PARTITION M202004 VALUES LESS THAN(to_days('2020/05/01'))
, PARTITION M202005 VALUES LESS THAN(to_days('2020/06/01'))
, PARTITION M202006 VALUES LESS THAN(to_days('2020/07/01'))
, PARTITION M202007 VALUES LESS THAN(to_days('2020/08/01'))
, PARTITION M202008 VALUES LESS THAN(to_days('2020/09/01'))
, PARTITION M202009 VALUES LESS THAN(to_days('2020/10/01'))
, PARTITION M202010 VALUES LESS THAN(to_days('2020/11/01'))
, PARTITION M202011 VALUES LESS THAN(to_days('2020/12/01'))
, PARTITION M202012 VALUES LESS THAN(to_days('2021/01/01'))
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl_r WHERE log_date > '2020/07/15' and log_date <'2020/10/07'\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |PX COORDINATOR | |1 |183 |
|1 | EXCHANGE OUT DISTR |:EX10000|1 |183 |
|2 | PX PARTITION ITERATOR| |1 |183 |
|3 | TABLE SCAN |tbl_r |1 |183 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil)
1 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil), dop=1
2 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter(nil)
3 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter([tbl_r.log_date > ?], [tbl_r.log_date < ?]),
access([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), partitions(p[6-9])
1 row in set
Principles of sub-partition pruning
For sub-partitions, first determine the partitions to be accessed based on the primary partitioning key, and then determine the sub-partitions to be accessed based on the sub-partitioning key. Finally, determine all physical sub-partitions to be accessed based on the product of the partitions and sub-partitions.
In the following example, the primary partition pruning result is p0, and the sub-partition pruning result is sp0. Therefore, the physical partition to be accessed is p0sp0.
Notice
In this example, the physical partition identifier to be accessed is p0sp0. This identifier is not the name of the sub-partition. For a templated sub-partitioned table, the naming rule for sub-partitions is ($part_name)s($subpart_name), such as p0ssp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150) \G
*************************** 1. row ***************************
Query Plan: ======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------
|0 |TABLE SCAN|TBL2_RR|99 |53 |
======================================
Outputs & filters:
-------------------------------------
0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter(nil),
access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0)
1 row in set
In some scenarios, partition pruning may result in a certain degree of amplification. However, the optimizer ensures that the pruning result is a superset of the data to be accessed and does not result in data loss.