Partitioning strategies
OceanBase Database provides multiple partitioning strategies that specify how data is distributed across partitions.
The basic partitioning strategies supported by OceanBase Database are RANGE, LIST, and HASH.
A single-level partitioning strategy uses only one data distribution method. For example, during single-level partitioning, you can use LIST partitioning or RANGE partitioning, but you cannot use them at the same time.
In composite partitioning, a table is partitioned by using a data distribution method and then subpartitioned by using another data distribution method. Assume that you have a table that contains the create_time and user_id columns. You can use RANGE partitioning on the create_time column and use HASH partitioning on the user_id column to create subpartitions.
RANGE partitioning
RANGE partitioning is the most commonly used partitioning method. It is often used when you want to query tables with dates. In RANGE partitioning, the database maps rows to partitions based on ranges of partition key values.
You can use only one column of the INT type as the partitioning key for RANGE partitioning.
If you want to use multiple columns or a column of other data types as the partitioning key, you can choose RANGE COLUMNS partitioning.
RANGE COLUMNS partitioning is similar to RANGE partitioning, but the two differ in the following aspects:
In RANGE COLUMNS partitioning, the partitioning keys support the
DOUBLE,FLOAT,DECIMAL, andTIMESTAMPdata types. Specific data types are listed as follows:DECIMALandDECIMAL[(M[,D])]DEC,NUMERIC, andFIXEDFLOAT[(M,D)]andFLOAT(p)DOUBLEandDOUBLE[(M,D)]DOUBLE PRECISIONandREALTIMESTAMP
Multiple columns or column vectors are accepted for RANGE COLUMNS partitioning.
Columns for RANGE COLUMNS partitioning are not restricted to integer columns. Instead, any types of columns are supported.
Expressions cannot be defined as the partitioning key for RANGE COLUMNS partitioning.
HASH partitioning and KEY partitioning
HASH partitioning
In HASH partitioning, the database maps rows to partitions based on a hashing algorithm that the database applies to the user-specified partitioning key.
The destination of a row is determined based on the number of HASH partitions calculated by using the internal hash function applied to the row. The hashing algorithm is designed to evenly distribute rows across partitions when the number of partitions is a power of two.
This algorithm ensures that each partition contains almost the same number of rows.
If you want to evenly distribute data across nodes, we recommend that you use the HASH partitioning method. HASH partitioning is also a simple alternative to RANGE partitioning, especially when you want to partition a table that contains non-historical data or does not have an obvious partitioning key.
You can also use HASH partitioning in online transaction processing (OLTP) systems with high update contention. HASH partitioning allows you to divide modifications to a single table into modifications to different partitions.
The values returned by the expression of a HASH partitioning key must be of the INT type.
KEY partitioning
KEY partitioning is similar to HASH partitioning. The two partitioning strategies differ in the following aspects:
The partitioning key for HASH partitioning can be a custom expression, whereas the partitioning key for KEY partitioning must be a column or left empty.
The partitioning key for KEY partitioning is not limited to the
INTtype.
During KEY partitioning, you can specify one or multiple columns as the partitioning key or specify no partitioning key. If the table where KEY partitioning is performed has the primary key, the one or more columns that are specified as the partitioning key must be part of or the entire primary key. If no partitioning key is specified during KEY partitioning, the primary key of the table functions as the partitioning key. If the table has no primary key but a unique key, the unique key is used as the partitioning key.
LIST partitioning
In LIST partitioning, the database uses a list of discrete values as the partitioning key for each partition. The partitioning key consists of one or more columns.
You can perform LIST partitioning to control how a single row is mapped to a specified partition. LIST partitioning allows you to group and organize related sets of data when the key used to identify them is not conveniently ordered.
LIST partitioning supports only a single partitioning key that may be a column or an expression. The partitioning key must be of the INT type.
If you want to use multiple columns or a column of other data types as the partitioning key, you can choose LIST COLUMNS partitioning.
LIST COLUMNS partitioning is a variant of LIST partitioning. During LIST COLUMNS partitioning, you can specify multiple partitioning keys. The partitioning keys can be of the following data types: INT, DATE, and DATETIME.
Composite partitioning
You can use the RANGE, LIST, and HASH partitioning methods to divide a partition into subpartitions.
In composite partitioning, a table is partitioned by using a data distribution method and then subpartitioned by using another data distribution method. Composite partitioning is a combination of two basic partitioning methods. All subpartitions of a specified partition represent a logical subset of data in the partition.
Composite partitioning provides the following benefits:
You can perform partition pruning on one or two dimensions based on the SQL statement to improve query performance.
You can perform partial partition-wise and full partition-wise joins for queries on either dimension.
You can perform parallel backup and restore on a table.
Compared with single-level partitioning, composite partitioning generates more partitions. This can be beneficial for the efficient parallel execution of queries.
You can implement a rolling window to support historical data and still partition on another dimension if many statements can benefit from partition pruning or partition-wise joins.
You can store data based on the partitioning key. For example, you can store data for a specific product type in a read-only and compressed format, and keep data of other product types uncompressed.
The following figure shows a graphical view of the range-hash and range-list composite partitioning methods.
