Overview|V4.1.0| docs|Distributed Database

Overview

Last Updated：2023-07-28 02:55:42 Updated

This topic describes the basic concepts and types of partitioning in OceanBase Database.

Basic concepts

OceanBase Database can split the data of a table into different groups based on some rules. Data in the same group is stored in the same physical area. This area is called a partition. A table split in this way is called a partitioned table. In MySQL mode of OceanBase Database, you can create at most 8,192 partitions in a single table.

Each partition can be divided into subpartitions based on some rules. A table that contains both partitions and subpartitions is called a subpartitioned table.

In a table row, a column determines which partition the row belongs to. The set of data in this column is called the partitioning key. The partitioning key must be a subset of a primary key or a unique key. An expression contains the partitioning key and determines which partitions the rows belong to. This expression is called a partitioning expression.

Partitioning types

OceanBase Database in MySQL mode supports the following partitioning types:

RANGE partitioning
RANGE COLUMNS partitioning
LIST partitioning
LIST COLUMNS partitioning
HASH partitioning
KEY partitioning
Composite partitioning

RANGE partitioning

RANGE partitioning maps data to partitions based on ranges of partitioning key values that you set up for each partition when you define the partitioned table. It is the most common partitioning type and is often used with dates. For example, you can partition business log tables by day, week, or month.

RANGE COLUMNS partitioning

RANGE COLUMNS partitioning is similar to RANGE partitioning, but they are different in the following aspects:

The results of RANGE COLUMNS partitioning do not have to be integer columns and all column types are accepted.
You cannot use an expression as the partitioning key for RANGE COLUMNS partitioning.
The partitioning key for RANGE COLUMNS partitioning can contain multiple columns (column vectors).

LIST partitioning

Unlike RANGE partitioning and HASH partitioning, LIST partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition. The advantage of LIST partitioning is that you can partition unordered and unrelated data.

LIST COLUMNS partitioning

LIST COLUMNS partitioning is similar to LIST partitioning, but they are different in the following aspects:

The partitioning key for LIST COLUMNS partitioning does not have to be of the integer type. It can be of any data type.
The partitioning key for LIST COLUMNS partitioning can contain multiple columns (column vectors).

HASH partitioning

HASH partitioning applies to scenarios where RANGE partitioning or LIST partitioning cannot be used. HASH partitioning enables easy partitioning of data by distributing records over partitions based on a HASH function on the partitioning key. HASH partitioning is a better choice in the following cases:

You cannot identify an obvious partitioning key for the data.
The sizes of range partitions differ substantially or are difficult to balance manually.
RANGE partitioning can cause the data to be undesirably clustered.
Performance features such as parallel DML, partition pruning, and partition-wise joins are important.

KEY partitioning

KEY partitioning is similar to HASH partitioning. They both use the modulus operation to determine which partition the requested data belongs to. Their difference is that in KEY partitioning, the system applies an internal default function on the partitioning key before the modulus operation.

KEY partitioning has the following characteristics:

The partitioning key does not have to be of the integer type. It can be of any type.
You cannot use an expression as the partitioning key.
You can use a vector as the partitioning key.

If you do not specify any column as the partitioning key, the primary key is used as the partitioning key.

Sample code:

obclient> CREATE TABLE tbl1 (col1 INT PRIMARY KEY, col2 INT)
       PARTITION BY KEY()
       PARTITIONS 5;
Query OK, 0 rows affected

Composite partitioning

Composite partitioning partitions a table using one partitioning strategy and partitions each partition using a different partitioning strategy. It is suitable for business tables containing large amounts of data. Composite partitioning gives full play to the advantages of the two partitioning strategies that you use in combination.

Considerations

Note the following items if you use auto-increment columns as the partitioning key in MySQL mode:

In OceanBase Database, each value in an auto-increment column is globally unique but may not be incremental within a partition.
For a partitioned table that uses auto-increment columns as the partitioning key, insert operations may initiate cross-server transactions due to the lack of efficient routing. This degrades the performance.

Partition naming conventions

For a LIST- or RANGE-partitioned table, the partition names are specified when the table is created.

For a HASH- or KEY-partitioned table, if no partition names are specified when the table is created, the system generates the names based on the following naming conventions: The specific process is as follows:

If HASH or KEY is used as the partitioning strategy, the partitions are named as p0, p1, ..., and pn.
If HASH or KEY is used as the subpartitioning strategy, the subpartitions are named as sp0, sp1, ..., and spn.

For a template-based subpartitioned table, after you define subpartitions, the system names the subpartitions in the ($part_name)s($subpart_name) format. For a non-template-based subpartitioned table, the subpartitions are named by yourself.