Partition pruning|V2.2.77|OceanBase Database| docs|Distributed Database

Partition pruning

Last Updated：2023-08-18 09:26:34 Updated

When you access a partitioned table, you often only need to access some of the partitions. The optimizer eliminates access to irrelevant partitions. This process is called partition pruning. Partition pruning is an important optimization technique for partitioned tables. It can vastly increase the execution efficiency of SQL statements. You can use the characteristics of partition pruning to add conditions in your query to avoid accessing irrelevant data and optimize query performance.

Partition pruning is a complex process whereby the optimizer extracts relevant partition information from the partition information of a table and the conditions specified in the SQL statement. Usually, the conditions in an SQL statement are complex, making the extraction logic more complex. This procedure is performed by the Query Range module of OceanBase Database.

In following example, all the data selected (c1=1) are in the first partition (p1). Therefore, only p1 must be accessed. The other partitions (p0, p2, p3, and p4) are not accessed.

obclient> CREATE TABLE t1
(
  c1 INT,
  c2 INT
)
PARTITION BY HASH(c1) partitions 5;

obclient> SELECT * FROM t1 WHERE c1 = 1;

To view the result of partition pruning, execute the EXPLAIN command to query the execution plan:

obclient> EXPLAIN SELECT * FROM t1 WHERE c1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |1303|
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 = 1]),
      access([t1.c1], [t1.c2]), partitions(p1)

Principles of partition pruning

HASH or LIST partitioning

In partition pruning, the values in the columns of partitions are calculated based on conditions specified in the WHERE clause. The values are then used to determine which partitions to access. If a partitioning key is an expression that can be used as a whole in an equation in the WHERE clause, partition pruning can also be performed.

Example:

obclient> CREATE TABLE t1
(
  c1 INT,
  c2 INT
)
PARTITION BY HASH(c1 + c2) partitions 5;

obclient> EXPLAIN SELECT * FROM t1 WHERE c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |5        |1303|
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
      access([t1.c1], [t1.c2]), partitions(p1)

RANGE partitioning

For a RANGE-partitioned table, the partitions to access are the intersection of the range defined by the partitioning key in the WHERE clause and the partition range defined by the table. For RANGE partitioning, because of the monotonicity of functions, if the partitioning expression is a function and the WHERE clause specifies a range, then partition pruning is not supported.

In the following example, partition pruning is not supported because the partitioning expression is a function and the condition that the WHERE clause specifies is not an equation (c1 < 150 and c1 > 100).

obclient> CREATE TABLE t1
(
  c1 INT,
  c2 INT
)
PARTITION BY RANGE(c1 + 1)
(
  PARTITION p0 VALUES less than (100),
  PARTITION p1 VALUES less than (200)
);

obclient> EXPLAIN SELECT * FROM t1 WHERE c1 < 150 and c1 > 110 \G
*************************** 1. row ***************************
Query Plan: ============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST|
--------------------------------------------
|0 |EXCHANGE IN DISTR  |    |19       |1410|
|1 | EXCHANGE OUT DISTR|    |19       |1303|
|2 |  TABLE SCAN       |t1  |19       |1303|
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
      access([t1.c1], [t1.c2]), partitions(p[0-1])

If the condition that the WHERE clause specifies is an equation, then partition pruning is supported. Example:

obclient> EXPLAIN SELECT * FROM t1 WHERE c1 = 150  \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |1303|
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 = 150]),
      access([t1.c1], [t1.c2]), partitions(p1)

Principles of subpartition pruning

In subpartition pruning, the partitions to access is first determined based on the partitioning key, and the subpartitions to access are determined based on the subpartitioning key. Then, the results are combined to determine all the physical partitions to access.

In the following example, p1 and p2 are the result of partition pruning, sp1 is the result of subpartition pruning. Therefore, the final physical partition to access is p1sp1 and p2sp1.

obclient> CREATE TABLE t1
(
  c1 INT ,
  c2 INT
)
PARTITION BY hash(c1)
SUBPARTITION BY RANGE(c2)
SUBPARTITION template
(
  SUBPARTITION sp0 VALUES less than(100),
  SUBPARTITION sp1 VALUES less than(200)
) partitions 5
SELECT * FROM T1
WHERE
(c1 = 1 OR c1 = 2) AND
(c2 > 101 AND c2 < 150)

obclient> EXPLAIN SELECT * FROM t1 WHERE (c1 = 1 or c1 = 2) and (c2 > 101 and c2 < 150) \G
*************************** 1. row ***************************
Query Plan: ============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST|
--------------------------------------------
|0 |EXCHANGE IN DISTR  |     |1        |1403|
|1 |EXCHANGE OUT DISTR|    |1        |1303|
|2 |         TABLE SCAN       |t1  |1        |1303|
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2]), filter([t1.c1 = 1 OR t1.c1 = 2], [t1.c2 > 101], [t1.c2 < 150]),
      access([t1.c1], [t1.c2]), partitions(p1sp1, p2sp1)

In some cases, the result set of partition pruning may be large, but the optimizer can ensure that this set is a superset of the data to be accessed and no data is lost.

OceanBase

Customer Stories

Documentation

Partition pruning

Principles of partition pruning

Principles of subpartition pruning