Overview
Analytical workloads typically involve massive amounts of data, which places high demands on a database's query and data management capabilities. OceanBase Database partitions a table into multiple data subsets based on the partitioning key, thereby improving query efficiency and data manageability:
- Improved query efficiency: Partition pruning reduces the amount of irrelevant data that needs to be scanned.
- Data maintenance: Supports data management at the partition level, such as data archiving and cleanup.
- Data distribution: Distributes data across multiple nodes based on the partitioning key, enabling good scalability.
Role of partitions in OceanBase Database
In OceanBase Database, partitions are the basic units of horizontal sharding and the smallest physical units for data distribution, load balancing, and parallel operations. A large table is logically divided into multiple smaller, more manageable blocks. Each partition (and even different replicas of a partition) can be stored on different OBServer nodes in the cluster.
This design provides fundamental advantages for analytical business: when the storage or computing capacity of a single node reaches its limit, you can simply add more nodes and redistribute the partitions to achieve nearly linear horizontal scaling, easily handling petabyte-scale data.
Partition pruning improves query efficiency
After partitions are created, when you specify a partitioning column for a query, partition pruning can filter out partitions that do not meet the query conditions, so that the query does not need to scan those partitions.
Here is an example. Create a hash-partitioned table t1 with four partitions, where the partitioning key is c2. With the query condition c2 = 1, partition pruning reduces the scan to a single partition, p1.
-- Create a table t1 with four hash partitions, where the partitioning key is c2.
CREATE TABLE t1(c1 INT, c2 INT) PARTITION BY HASH(c2) PARTITIONS 4;
-- With the condition c2 = 1, partition pruning reduces the scan to partition p1 only.
EXPLAIN SELECT * FROM t1 WHERE c2 = 1;
The execution plan output is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|t1 |1 |3 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([t1.c1], [t1.c2]), filter([t1.c2 = 1]), rowset=16 |
| access([t1.c2], [t1.c1]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
Partition pruning filters out unnecessary data. However, too many partitions may also cause other issues, such as excessive metadata and reduced pruning efficiency. Therefore, in OceanBase Database's columnar tables, we recommend that each partition contains at least 1 million rows.
Partitions as data maintenance units
In database operations and maintenance, treating partitions as the basic units of data maintenance can significantly simplify daily management processes, such as data cleanup and partition-level statistics collection.
Take the data cleanup scenario as an example. When data is partitioned by time, you no longer need to delete rows one by one to clean up expired data. Instead, you can simply delete the entire historical partition. This operation only requires modifying metadata and can completely release disk space, avoiding the performance overhead of traditional DML operations.
By naturally classifying data based on partitioning keys (such as time), maintenance operations shift from row-by-row scanning to batch processing, greatly enhancing management efficiency and reducing operational complexity.
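As a minimal sketch of the cleanup pattern described above (assuming a hypothetical table logs that is RANGE-partitioned by month on log_time, with a partition p202301 holding January 2023 data):

```sql
-- Instead of scanning and deleting expired rows one by one:
--   DELETE FROM logs WHERE log_time < '2023-02-01';  -- slow, row-by-row DML
-- drop the expired partition in a single metadata-only operation:
ALTER TABLE logs DROP PARTITION p202301;
```

Dropping the partition releases its disk space outright, with none of the undo/redo overhead that a large DELETE would incur.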
Partitions as data distribution units
In OceanBase Database, partitions serve as the data distribution units. Each partition's replicas can be placed on different OBServer nodes to achieve storage and computing scalability.
Storage scalability: When creating a partitioned table, OceanBase Database automatically schedules the partitions and their replicas to different physical nodes based on the cluster's resources. This means that a single table's capacity is no longer limited by a single machine's disk space but by the entire cluster's storage capacity. When the cluster's storage space is insufficient, you can simply add more nodes to expand the storage.
Parallel computing: This is one of the key factors in achieving high performance for analytical workloads. When a query (especially one involving full table scans or large-scale aggregations) is executed, OceanBase Database's optimizer identifies the partitions involved in the query. The query task can be decomposed into multiple subtasks and pushed down to the nodes where the data partitions are located for parallel execution. For example, a SUM() operation calculates subtotals in each partition and then aggregates the intermediate results to obtain the final total. This fully utilizes the computing power of multiple nodes, significantly accelerating the query.
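The pushed-down aggregation described above can also be requested explicitly with a parallelism hint. This is a minimal sketch; the orders table and amount column are illustrative names, not objects defined elsewhere in this document:

```sql
-- Hypothetical table `orders`, hash-partitioned across multiple nodes.
-- The PARALLEL hint requests a degree of parallelism of 8: each partition
-- computes its own partial SUM, and the partial sums are then merged.
SELECT /*+ PARALLEL(8) */ SUM(amount) FROM orders;
```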
Partitioning in OceanBase Database
OceanBase Database supports three types of partitioning methods: Hash/Key, Range/Range Columns, and List/List Columns. Each is suited to different usage scenarios.
Hash/Key partitioning
The hash partitioning method is suitable for scenarios where the partitioning columns have many distinct values and specific ranges are difficult to define. Its advantage is that it evenly distributes data across partitions even when the data follows no particular pattern; its disadvantage is that range queries on the partitioning key cannot benefit from partition pruning.
Scenario example:
- No clear range-based query pattern, and data must be evenly distributed across multiple nodes (e.g., by user ID or transaction ID).
Key design considerations:
Partition key selection:
- The NDV (number of distinct values) should be significantly larger than the number of partitions (e.g., the NDV of user IDs should be much greater than the number of partitions).
- Preferentially select non-skewed (or minimally skewed) integer or timestamp columns, such as user_id, order_time, or an auto-increment column.
- Prefer high-frequency query fields (e.g., user_id as a join key).
Partition number recommendation:
- Make sure the number of partitions is aligned with the number of nodes in the cluster; otherwise, resources may be allocated unevenly.
Example:
-- Hash partitioning by evenly distributing user_id
CREATE TABLE customer (
user_id BIGINT NOT NULL,
login_time TIMESTAMP NOT NULL,
customer_name VARCHAR(100) NOT NULL,
phone_num BIGINT NOT NULL,
city_name VARCHAR(50) NOT NULL,
sex INT NOT NULL,
id_number VARCHAR(18) NOT NULL,
home_address VARCHAR(255) NOT NULL,
office_address VARCHAR(255) NOT NULL,
age INT NOT NULL
)
PARTITION BY HASH(user_id) PARTITIONS 128;
Range/Range Columns partitioning
Range partitioning is generally applicable when the partitioning key can be divided into distinct ranges. For example, a large table of time-series records can be RANGE-partitioned on a column that indicates when each record was generated.
Scenario:
- Data grows over time or along numeric ranges (such as order_time or price).
- Historical data needs to be quickly pruned (e.g., queries touch only the last month).
Considerations:
**Partition key selection:**
- A time field (e.g., order_time) or a numeric field.
- Align partition boundaries with business queries (e.g., by day or month).
Recommended number of partitions:
- Partition data as it grows, for example, by month.
Example scenario:
-- Create a system logs table and create a range partition on the log time column to enable faster queries and data archiving.
CREATE TABLE system_logs (
log_id BIGINT,
log_date TIMESTAMP NOT NULL,
log_level VARCHAR(10),
source_system VARCHAR(50),
user_id BIGINT,
log_message TEXT,
client_ip VARCHAR(15)
)
-- Primary partition: Range partitioned by month, specifying partition boundaries directly by date.
PARTITION BY RANGE COLUMNS(log_date) (
PARTITION p_202001 VALUES LESS THAN ('2020-02-01'),
PARTITION p_202002 VALUES LESS THAN ('2020-03-01'),
PARTITION p_202003 VALUES LESS THAN ('2020-04-01'),
PARTITION p_202004 VALUES LESS THAN ('2020-05-01'),
PARTITION p_202005 VALUES LESS THAN ('2020-06-01'),
PARTITION p_202006 VALUES LESS THAN ('2020-07-01'),
PARTITION p_202007 VALUES LESS THAN ('2020-08-01'),
PARTITION p_202008 VALUES LESS THAN ('2020-09-01'),
PARTITION p_202009 VALUES LESS THAN ('2020-10-01'),
PARTITION p_202010 VALUES LESS THAN ('2020-11-01'),
PARTITION p_202011 VALUES LESS THAN ('2020-12-01'),
PARTITION p_202012 VALUES LESS THAN ('2021-01-01'),
-- Catch-all partition for future-dated or out-of-range records
PARTITION p_future VALUES LESS THAN (MAXVALUE)
);
List/List Columns partitioning
It is suitable when you want to explicitly control how rows are mapped to a specific partition. It allows you to perform precise partitioning of unordered or unrelated datasets, but makes it difficult to perform partition pruning for range queries.
Example use case:
- Discrete fields (for example, regions, channel types).
- Query data by fixed categories (for example, query users in the east China region).
Key considerations:
**Partition key selection:**
- The values are discrete and limited in number (e.g., the region field takes only the values ['east','west','south','north']).
- The partitions must cover all possible values, so that none are missed.
**Partition count:**
- The maximum number of partitions is specified by the MAX_PARTITIONS parameter.
- Configure the number of partitions for a table based on the business logic.
Example scenario:
CREATE TABLE orders_by_region (
order_id BIGINT COMMENT 'Order unique identifier',
region_code INT NOT NULL COMMENT 'Region code (1 = north, 2 = east, 3 = south, 4 = west of China)',
customer_id BIGINT COMMENT 'Customer ID',
order_time DATETIME COMMENT 'Order creation time',
product_category VARCHAR(50) COMMENT 'product category',
order_amount DECIMAL(18,2) COMMENT 'Order amount',
payment_status VARCHAR(20) COMMENT 'Payment Status (e.g., PAID, UNPAID)'
)
PARTITION BY LIST(region_code) -- The partitioning key is an integer region code
(
PARTITION p_north VALUES IN (1), -- Region code 1 corresponds to north China
PARTITION p_east VALUES IN (2),
PARTITION p_south VALUES IN (3),
PARTITION p_west VALUES IN (4),
PARTITION p_other VALUES IN (DEFAULT) -- Default partition handles unknown regions
);
Flexible partition management
OceanBase Database has flexible partition management capabilities. From the perspective of data management, it has both data maintenance and data distribution capabilities. From the perspective of usage, it has both manual and automatic management modes. From the perspective of partition levels, it supports the combination of primary and subpartitions. By combining different partitioning strategies, OceanBase Database can meet various data management needs.
This section will explore the data maintenance and data distribution capabilities, considering both usage modes and partition level combinations.
Data maintenance
In business applications, partitions are typically managed based on the time dimension to facilitate operations such as data archiving and cleanup. We will describe the manual partition management capabilities in the context of the complete data lifecycle of a business:
- Business table creation: Create a table partitioned by time and pre-create partitions for the next few days.
- Data import: Import data.
- Business operation: As time progresses, if the pre-created partitions are insufficient, continue to pre-create partitions for the next few days.
- Regular data cleanup: After data has been accumulated for a certain period, some data may no longer be needed. In such cases, the unnecessary partitions can be deleted.
Example:
-- 1. Create a partitioned table (partitioned by day, pre-create partitions for the next 7 days)
CREATE TABLE business_data (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time)
) PARTITION BY RANGE COLUMNS(event_time) (
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'),
PARTITION p20231026 VALUES LESS THAN ('2023-10-27'),
PARTITION p20231027 VALUES LESS THAN ('2023-10-28'),
PARTITION p20231028 VALUES LESS THAN ('2023-10-29'),
PARTITION p20231029 VALUES LESS THAN ('2023-10-30'),
PARTITION p20231030 VALUES LESS THAN ('2023-10-31'),
PARTITION p20231031 VALUES LESS THAN ('2023-11-01') -- Pre-create partitions for the next 7 days
);
-- 2. Import data (omitted for brevity)
-- 3. Pre-create partitions for the next 7 days
ALTER TABLE business_data ADD PARTITION (
PARTITION p20231101 VALUES LESS THAN ('2023-11-02'),
PARTITION p20231102 VALUES LESS THAN ('2023-11-03'),
PARTITION p20231103 VALUES LESS THAN ('2023-11-04'),
PARTITION p20231104 VALUES LESS THAN ('2023-11-05'),
PARTITION p20231105 VALUES LESS THAN ('2023-11-06'),
PARTITION p20231106 VALUES LESS THAN ('2023-11-07'),
PARTITION p20231107 VALUES LESS THAN ('2023-11-08')
);
-- 4. Regular data cleanup, for example, delete data older than 7 days after it has expired
ALTER TABLE business_data DROP PARTITION p20231025, p20231026, p20231027, p20231028, p20231029, p20231030, p20231031;
Since data is continuously written, manually maintaining pre-created partitions and regularly cleaning up partitions can be quite cumbersome. To simplify this process, OceanBase Database provides the dynamic partition feature, which supports partitioning by fixed time intervals, pre-creating partitions for a specified duration, and retaining historical partitions for a specified period.
For the example above, if we need to retain 30 days of data and pre-create 7 days of partitions each time, we can use the following syntax:
-- 1. Create a partitioned table and set the dynamic partition policy
CREATE TABLE t1 (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time))
DYNAMIC_PARTITION_POLICY
(
ENABLE = true,
TIME_UNIT = 'day',
PRECREATE_TIME = '7day',
EXPIRE_TIME = '30day'
)
PARTITION BY RANGE COLUMNS (event_time)(
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'));
In addition to the RANGE partitioning mode, businesses can also choose other basic partitioning methods based on their specific needs.
For more information about dynamic partitioning, see Dynamic partitioning, Create a dynamic partition table in MySQL mode, and Create a dynamic partition table in Oracle mode.
Data distribution
Partitions can also serve as units for data distribution management. Typically, to achieve data dispersion, the HASH partitioning method is used, which has the following advantages:
- It can effectively achieve uniform data distribution and supports precise partition pruning based on the partitioning key.
- For multiple tables that need to be joined, if they are all HASH partitioned using the same join key and have the same number of partitions, and if the OceanBase Database Table Group feature is used, partitions with the same hash value can be bound to the same node group. This enables Partition-Wise Join during execution, avoiding data shuffling across nodes and significantly improving query performance.
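A sketch of the Table Group setup described above. Names are illustrative, and the SHARDING attribute shown here is an assumption that may vary by OceanBase Database version; check the documentation for your release:

```sql
-- Create a table group; member tables must share the partitioning method and
-- partition count, so partitions with the same hash value land on the same node.
CREATE TABLEGROUP tg_order SHARDING = 'PARTITION';

-- Both tables are hash-partitioned on the join key with the same partition count.
CREATE TABLE orders_pw (order_id BIGINT, user_id BIGINT, amount DECIMAL(10,2))
    TABLEGROUP = tg_order PARTITION BY HASH(user_id) PARTITIONS 16;
CREATE TABLE users_pw (user_id BIGINT, user_name VARCHAR(100))
    TABLEGROUP = tg_order PARTITION BY HASH(user_id) PARTITIONS 16;

-- A join on user_id can now execute as a partition-wise join, partition pair
-- by partition pair, without shuffling rows across nodes.
SELECT u.user_name, SUM(o.amount)
FROM orders_pw o JOIN users_pw u ON o.user_id = u.user_id
GROUP BY u.user_name;
```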
However, HASH partitioning also has limitations:
- Once the number of partitions in a HASH partitioned table is set, modifying it is a heavy operation that involves rewriting all the data in the table. Therefore, the number of partitions in a HASH partitioned table is usually set once and rarely changed, making it difficult to scale.
- For range queries on the partitioning key, partitions cannot be pruned, and all partitions need to be accessed, which may lead to read amplification.
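Using the table t1 from the pruning example earlier (hash-partitioned on c2 into 4 partitions), the read amplification can be observed by comparing the partitions listed in the two plans. The expected partition lists below follow from the pruning behavior described in this document:

```sql
-- An equality predicate maps to one hash bucket, so one partition is scanned:
EXPLAIN SELECT * FROM t1 WHERE c2 = 1;    -- plan shows partitions(p1)
-- A range predicate cannot be mapped to hash buckets, so all partitions are scanned:
EXPLAIN SELECT * FROM t1 WHERE c2 > 100;  -- plan shows partitions(p0-p3)
```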
To address the scalability and range query limitations of HASH partitioning, OceanBase Database now supports automatic partition splitting for rowstore tables.
Hybrid data maintenance and distribution management
We can also use subpartitioning to meet both data maintenance and distribution requirements. Typically, the primary partition is used for data maintenance and the subpartitions are used for data distribution, with each level managed using the manual or automatic methods described above.
Typical manual partitioning management approach
- Primary partition:
- Type selection: Use RANGE or LIST partitioning to match frequent query conditions (such as time range or region)
- Recommended number of partitions: Set a reasonable range based on the distribution of query conditions and data maintenance needs (e.g., partition by month and retain data for 12 months, or divide into 4 LIST partitions by region)
- Subpartition:
- Type selection: Use HASH partitioning to ensure data is evenly distributed
- Recommended number of subpartitions:
- If writes target one primary partition at a time, the number of subpartitions within a primary partition should be large enough to spread the write load across the cluster's resources
- If writes target multiple primary partitions simultaneously, the total number of subpartitions across those partitions should be large enough to spread the write load
Here are two scenarios using RANGE + HASH and LIST + HASH:
RANGE + HASH
Choose RANGE partitioning for the primary partition and specify order_date. This allows quick filtering of partitions that do not need to be scanned and enables efficient data maintenance through partition management operations. For the subpartition, choose HASH partitioning to distribute writes or reads for the current month across 8 partitions, avoiding hotspots.
CREATE TABLE orders (
user_id BIGINT NOT NULL COMMENT 'User ID (subpartition key)',
order_date DATE NOT NULL COMMENT 'Order date (primary partition key)',
amount DECIMAL(10,2) NOT NULL COMMENT 'Order amount',
status TINYINT NOT NULL COMMENT 'Status: 0-Canceled, 1-Pending, 2-Paid, 3-Shipped, 4-Completed',
region_code CHAR(6) NOT NULL COMMENT 'Region code (first 2 digits for province)',
product_id INT NOT NULL COMMENT 'Product ID',
payment_method VARCHAR(20) COMMENT 'Payment method',
created_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT 'Record creation time')
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(user_id) SUBPARTITIONS 8
(
PARTITION p202501 VALUES LESS THAN ('2025-02-01'),
PARTITION p202502 VALUES LESS THAN ('2025-03-01'),
...
PARTITION p202601 VALUES LESS THAN ('2026-02-01')
);
LIST + HASH
Choose LIST partitioning for the primary partition and specify the province to filter to the corresponding partition. This also allows data maintenance by province. For the subpartition, choose HASH or KEY partitioning to distribute the read and write traffic of each province across multiple partitions, achieving load balancing.
-- Primary partition: LIST partitioning by province (31 provincial administrative regions)
CREATE TABLE social_insurance_records (
record_id BIGINT,
province_code INT NOT NULL, -- Provincial code (e.g., 11 for Beijing, 31 for Shanghai)
payment_date DATE NOT NULL,
user_id VARCHAR(32) NOT NULL,
amount DECIMAL(10,2)
) PARTITION BY LIST (province_code) -- Primary LIST partition
SUBPARTITION BY KEY(user_id) SUBPARTITIONS 16 -- Subpartitions: KEY partitioning
(
PARTITION p_beijing VALUES IN (11),
PARTITION p_shanghai VALUES IN (31),
PARTITION p_tianjin VALUES IN (12),
...
PARTITION p_xizang VALUES IN (54)
);
Typical automatic partitioning management approach
- Primary partition: Choose dynamic partitioning, configure fixed-time partitioning, and set parameters such as how far in advance to pre-create partitions and how long to retain historical partitions.
- Subpartition: Choose automatic RANGE partition splitting, which allows automatic splitting without configuring the number of partitions or splitting rules.