The partition pruning feature helps avoid accessing irrelevant partitions, which significantly improves SQL execution efficiency. This topic describes the principles and application of partition pruning.
When you access a partitioned table, you often need to access only some of the partitions. The process in which an optimizer eliminates access to irrelevant partitions is called partition pruning. Partition pruning is an important optimization technique for partitioned tables. It can significantly improve SQL execution efficiency. You can use the characteristics of partition pruning to add conditions in your query to avoid accessing irrelevant data and optimize query performance.
Partition pruning is a complex process whereby the optimizer extracts relevant partition information from the partition information of a table and the conditions specified in the SQL statement. Usually, the conditions in an SQL statement are complex, making the extraction logic more complex. This procedure is performed by the Query Range module of OceanBase Database.
Assume that you want to access data with col1 set to 1 and all data that meets this condition is located in partition 1 (p1). In this case, you need to access only p1 and do not need to access p0, p2, p3, or p4. Sample statement:
obclient> CREATE TABLE tbl1(col1 INT,col2 INT) PARTITION BY HASH(col1) PARTITIONS 5;
obclient> SELECT * FROM tbl1 WHERE col1 = 1;
Execute the EXPLAIN statement to query the execution plan and view the partition pruning result.
obclient> EXPLAIN SELECT * FROM tbl1 WHERE col1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|TBL1|990 |383 |
===================================
Outputs & filters:
-------------------------------------
0 - output([TBL1.COL1], [TBL1.COL2]), filter(nil),
access([TBL1.COL1], [TBL1.COL2]), partitions(p1)
1 row in set
Principles of partition pruning
HASH or LIST partitioning
In partition pruning, the values in the columns of partitions are calculated based on conditions specified in the WHERE clause. The values are then used to determine which partitions to access. If the partitioning condition is an expression that can be used as a whole in an equation condition, partition pruning can also be performed.
In the following example of partition pruning, the partitioning condition is the expression c1 + c2 that is used as a whole in an equation condition.
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY HASH(c1 + c2) PARTITIONS 5;
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |5 |1303|
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
access([t1.c1], [t1.c2]), partitions(p1)
RANGE partitioning
For a RANGE-partitioned table, the partitions to access are the intersection of the range defined by the partitioning key in the WHERE clause and the partition range defined by the table.
For RANGE partitioning:
If the partitioning condition is a column, partition pruning is supported regardless of whether the query condition is an equation.
If the partitioning condition is an expression and the query condition is an equation, partition pruning is supported.
In the following example, partition pruning is supported because the partitioning condition is the expression c1 and the query condition is c1 < 150 and c1 > 100 instead of an equation. Sample code:
obclient> CREATE TABLE t1(c1 INT,c2 INT) PARTITION BY RANGE(c1)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110\G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |1 |46 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
access([t1.c1], [t1.c2]), partitions(p1)
1 row in set
If the partitioning condition is a time type function such as YEAR(), TO_DAYS(), or TO_SECONDS(), partition pruning based on the scope specified by the query condition is also supported.
Sample code:
obclient> CREATE TABLE tbl_r (log_id BIGINT NOT NULL,log_value VARCHAR(50),log_date datetime NOT NULL)
PARTITION BY RANGE(to_days(log_date))
(PARTITION M202001 VALUES LESS THAN(to_days('2020/02/01'))
, PARTITION M202002 VALUES LESS THAN(to_days('2020/03/01'))
, PARTITION M202003 VALUES LESS THAN(to_days('2020/04/01'))
, PARTITION M202004 VALUES LESS THAN(to_days('2020/05/01'))
, PARTITION M202005 VALUES LESS THAN(to_days('2020/06/01'))
, PARTITION M202006 VALUES LESS THAN(to_days('2020/07/01'))
, PARTITION M202007 VALUES LESS THAN(to_days('2020/08/01'))
, PARTITION M202008 VALUES LESS THAN(to_days('2020/09/01'))
, PARTITION M202009 VALUES LESS THAN(to_days('2020/10/01'))
, PARTITION M202010 VALUES LESS THAN(to_days('2020/11/01'))
, PARTITION M202011 VALUES LESS THAN(to_days('2020/12/01'))
, PARTITION M202012 VALUES LESS THAN(to_days('2021/01/01'))
);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl_r WHERE log_date > '2020/07/15' and log_date <'2020/10/07'\G
*************************** 1. row ***************************
Query Plan: ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |PX COORDINATOR | |1 |183 |
|1 | EXCHANGE OUT DISTR |:EX10000|1 |183 |
|2 | PX PARTITION ITERATOR| |1 |183 |
|3 | TABLE SCAN |tbl_r |1 |183 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil)
1 - output([INTERNAL_FUNCTION(tbl_r.log_id, tbl_r.log_value, tbl_r.log_date)]), filter(nil), dop=1
2 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter(nil)
3 - output([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), filter([tbl_r.log_date > ?], [tbl_r.log_date < ?]),
access([tbl_r.log_date], [tbl_r.log_id], [tbl_r.log_value]), partitions(p[6-9])
1 row in set
Principles of subpartition pruning
In subpartition pruning, the partitions to access are first determined based on the partitioning key, and the subpartitions to access are determined based on the subpartitioning key. Then, the results are combined to determine all physical partitions to access.
In the following example, p0 is the partition pruning result, and sp0 is the subpartition pruning result. Therefore, the final physical partition to access is p0sp0.
obclient> CREATE TABLE tbl2_rr(col1 INT,col2 INT)
PARTITION BY RANGE(col1)
SUBPARTITION BY RANGE(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION sp0 VALUES LESS THAN(1000),
SUBPARTITION sp1 VALUES LESS THAN(2000)
)
(PARTITION p0 VALUES LESS THAN(100),
PARTITION p1 VALUES LESS THAN(200)
) ;
Query OK, 0 rows affected
obclient> EXPLAIN SELECT * FROM tbl2_rr
WHERE (col1 = 1 or col1 = 2) and (col2 > 101 and col2 < 150) \G
*************************** 1. row ***************************
Query Plan: ======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------
|0 |TABLE SCAN|TBL2_RR|99 |53 |
======================================
Outputs & filters:
-------------------------------------
0 - output([TBL2_RR.COL1], [TBL2_RR.COL2]), filter(nil),
access([TBL2_RR.COL1], [TBL2_RR.COL2]), partitions(p0sp0)
1 row in set
In some cases, the result set of partition pruning may be large, but the optimizer can ensure that this set is a superset of the data to be accessed and no data is lost.