Partitioning strategies |V3.2.4|OceanBase Database| docs|Distributed Database

Partitioning strategies

Last Updated：2023-10-24 09:23:04 Updated

Overview

OceanBase Database provides multiple partitioning strategies that control how to distribute data of a database into different partitions.

The basic partitioning strategies are RANGE partitioning, LIST partitioning, and HASH partitioning.

A partition supports only one data partitioning strategy. For example, you can choose only LIST partitioning or RANGE partitioning.

If subpartitions are required, a table is first partitioned based on one partitioning strategy. Each partition is then further partitioned into subpartitions based on another partitioning strategy. For example, a table contains the create_time column and user_id column. You can perform RANGE partitioning on the create_time column and then perform HASH partitioning on the user_id column to obtain subpartitions.

RANGE partitioning

RANGE partitioning is the most common partitioning strategy, which is often used when date columns are involved. The database maps rows to partitions based on the value range of the partitioning key during RANGE partitioning.

You can use only one column of the INT type as the partitioning key for RANGE partitioning.

If you want to use multiple columns or a column of other data types as the partitioning key, you can choose RANGE COLUMNS partitioning.

RANGE COLUMNS partitioning is similar to RANGE partitioning, but the two differ in the following aspects:

Columns for RANGE COLUMNS partitioning are not restricted to integer columns. Instead, any types of columns are supported.
Expressions are not accepted for RANGE COLUMNS partitioning.
Multiple columns or column vectors are accepted for RANGE COLUMNS partitioning.

HASH partitioning and KEY partitioning

HASH partitioning

During HASH partitioning, the database maps rows to partitions based on the hash algorithm that is applied to the specified partitioning key.

The destination partition to which a row is mapped is determined by the number of hash partitions and the hash value that is calculated by the internal hash function. If the number of partitions is a power of 2, the rows are evenly mapped to all partitions based on the hash algorithm.

Therefore, the partitions are almost the same size.

HASH partitioning is an ideal method to evenly distribute data among nodes. HASH partitioning is also a simple alternative to RANGE partitioning, especially when you want to partition a table that contains non-historical data or does not have an obvious partitioning key.

HASH partitioning is especially suitable for online transaction processing (OLTP) systems in which update conflicts may occur on a regular basis. This is because HASH partitioning partitions a table and the modifications to the table are separately performed in different partitions instead of the entire table.

The values returned by the expression of a HASH partitioning key must be of the INT type.

KEY partitioning

KEY partitioning is similar to HASH partitioning. The two partitioning strategies differ in the following aspects:

The partitioning key for HASH partitioning can be a custom expression, whereas the partitioning key for KEY partitioning must be a column or left empty.
The partitioning key for KEY partitioning is not limited to the INT type.

During KEY partitioning, you can specify one or multiple columns as the partitioning key or specify no partitioning key. If the table where KEY partitioning is performed has the primary key, the one or more columns that are specified as the partitioning key must be part of or the entire primary key. If no partitioning key is specified during KEY partitioning, the primary key of the table functions as the partitioning key. If the table has no primary key but a unique key, the unique key is used as the partitioning key.

LIST partitioning

During LIST partitioning, the database uses a list of discrete values as the partitioning key of each partition. A partitioning key consists of one or more columns.

You can perform LIST partitioning to control how a single row is mapped to a specified partition. If the partitioning key is not suitable for sorting data, you can perform LIST partitioning to group and manage data.

LIST partitioning supports only a single partitioning key that may be a column or an expression. The partitioning key must be of the INT type.

If you want to use multiple columns or a column of other data types as the partitioning key, you can choose LIST COLUMNS partitioning.

LIST COLUMNS partitioning is a variant of LIST partitioning. During LIST COLUMNS partitioning, you can specify multiple partitioning keys. The partitioning keys can be of the following data types: INT, DATE, and DATETIME.

Composite partitioning

RANGE partitioning, LIST partitioning, and HASH partitioning can be used as subpartitioning strategies for composite partitioned tables.

During composite partitioning, a table is first partitioned based on one partitioning strategy. Then, each partition is further partitioned into subpartitions based on another partitioning strategy. Therefore, composite partitioning is a combination of basic data partitioning strategies. All subpartitions of a specified partition are logical subsets of data.

Composite partitioning provides the following benefits:

You can implement partition pruning along one or two dimensions based on SQL statements. This may improve the performance.
You can use full partition-wise joins or partial partition-wise joins on any dimension.
You can back up and restore a single table in parallel.
More partitions are obtained after composite partitioning than after single-level partitioning, which may facilitate parallel operations.
You can implement a rolling window to support historical data. If a lot of statements can benefit from partition pruning or partition-wise joins, you can still perform partitioning on another dimension.
You can store data in different ways based on the identifier of a partitioning key. For example, you may decide to store data for a specific product type in a read-only, compressed format, and store data for other product types in an uncompressed format.

The following figure shows composite RANGE-HASH partitioning and composite RANGE-LIST partitioning.

Partitioning strategies