Data distribution|V2.2.77|OceanBase Database| docs|Distributed Database

Data distribution

Last Updated：2023-08-18 09:26:34 Updated

In terms of data distribution, OceanBase Database is the implementation of partitioned database tables in a distributed system with syntax and usage that are compatible with conventional partitioned tables. When the storage capacity and service capabilities become insufficient, run an OceanBase administrative command to add more partitions.

Distributed table partitions

Common partitioning techniques supported by conventional database systems include hash partitioning, range partitioning, and list partitioning. They also support sub-partitioning using a combination of techniques. OceanBase Database also employs table partitioning, with the difference that partitions may be evenly distributed across many nodes in a database system. Based on a distributed model for partitioned tables, OceanBase Database provides horizontal scalability in terms of data storage and processing capacity. You can use this distributed database without being aware of the backend distributed architecture, as if it is a standalone database with a complete global index and global constraints.

Partition types

OceanBase Database supports a variety of partitioning policies when splitting data into multiple partitions based on application requirements. It supports hash partitioning, range partitioning, and list partitioning. OceanBase Database also supports both single-layered partitioning and sub-partitioning. In single-layered partitioning, users may specify a set of partitioning keys based on the partitioning strategy to split data based on a single dimension. An additional partitioning strategy and set of corresponding partitioning keys may be chosen to split the data on each layer one partition again to perform double-layered partitioning.

If a primary key is specified for a table, OceanBase Database demands that the partitioning keys must be the primary key columns. This is not required if the primary key is not set.

Hash partitioning

Users need to specify the partitioning keys and the number of partitions. A hash partitioning expression is applied on each data row to obtain an integer result. This result is then divided by the number of partitions. The remainder of the division determines which partition the data row belongs to. Hash partitioning is useful for distinct value queries on the partitioning keys, such as partitioning based on user ID. Hash partitioning may be used to eliminate hotspots in queries.

The following example splits table t1 into 5 partitions using hash partitioning, with column c1 being the partitioning key.

create table t1  (c1 int, c2 int) partition by hash(c1) partitions 5

Range partitioning

Range partitioning uses the range in the partitioning expressions. It is useful when you need to perform range queries on the partitioning keys. For example, range partitioning may be done based on a time field or price ranges.

The following example creates a table t2 and performs range partitioning using column c1 as the key. The table is split into 3 partitions p0, p1, and p2, with (min, 100), [100, 500), [500, max) being their respective range.

CREATE TABLE t2  (c1 NUMBER, c2 NUMBER) PARTITION BY RANGE(c1) (
PARTITION p0 VALUES LESS THAN(100), 
PARTITION p1 VALUES LESS THAN(500), 
PARTITION p2 VALUES LESS THAN(MAXVALUE)
);

List partitioning

List partitioning is done based on the values of an enumerated type. It is useful for enumerated types.

The following example creates a table t3 and performs list partitioning using column c1 as the key. If a row's c1 = 1 or c1 = 2 or c1 = 3, it belongs to partition p0. If a row's c1 = 5 or c1 = 6, it belongs to partition p1. All other rows belong to p2.

CREATE TABLE t3  (c1 NUMBER, c2 NUMBER) PARTITION BY RANGE(c1) (
PARTITION p0 VALUES IN (1,2,3), 
PARTITION p1 VALUES IN (5, 6), 
PARTITION p2 VALUES IN (DEFAULT)
);

Subpartitioning

Subpartitioning splits data based on two dimensions.

The most common example is user bills. Hash partitioning is first performed using user_id, followed by range partitioning on bill creation time. Possible combinations in sub-partitioning are hash + hash, hash + range, hash + list, range + hash, range + range, range + list, list + hash, list + range, and list + list.