Overview
Analytical workloads typically scan large volumes of data, which places high demands on a database's query and data management capabilities. OceanBase Database uses partitioning to horizontally split the data of a table into multiple data subsets based on the partitioning key. This approach enhances query efficiency and data management capabilities:
- Improved query efficiency: Partition pruning reduces the amount of irrelevant data scanned.
- Data maintenance: Supports data management at the partition level, such as archiving and cleaning.
- Data distribution: Distributes data across multiple nodes at the partition level, enabling good scalability.
Role of partitions in OceanBase Database
In OceanBase Database, partitions are the basic units of horizontal sharding and the smallest physical units for data distribution, load balancing, and parallel operations. A large table is logically divided into multiple smaller, more manageable independent blocks. Each partition (and even different replicas of a partition) can be stored on different OBServer nodes in the cluster.
This design provides fundamental advantages for analytical workloads: when the storage or computing capacity of a single node reaches its bottleneck, you can simply add nodes and redistribute the partitions to achieve nearly linear horizontal scalability, easily handling data volumes in the petabyte range.
Partition pruning improves query efficiency
After a table is partitioned, a query whose conditions reference the partitioning column can prune the partitions that cannot match those conditions, so they are never scanned.
Here is an example: create a hash-partitioned table t1 with four partitions, using c2 as the partitioning key. When you specify the query condition c2 = 1, the query will only scan partition p1.
-- Create a table t1 with four hash partitions, using c2 as the partitioning key.
CREATE TABLE t1(c1 INT, c2 INT) PARTITION BY HASH(c2) PARTITIONS 4;
-- With the query condition c2 = 1, the query is pruned to partition p1 only.
EXPLAIN SELECT * FROM t1 WHERE c2 = 1;
The execution plan output is as follows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|t1 |1 |3 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([t1.c1], [t1.c2]), filter([t1.c2 = 1]), rowset=16 |
| access([t1.c2], [t1.c1]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
Partition pruning can filter out unnecessary data, but having too many partitions can also lead to other issues, such as excessive metadata and reduced pruning efficiency. Therefore, in OceanBase Database's columnar tables, it is recommended that each partition contain at least 100,000 rows.
Partitions as data maintenance units
In database operations and maintenance, treating partitions as the basic units for data maintenance can significantly simplify daily management processes, such as data cleanup and partition-level statistics collection.
For example, in the data cleanup scenario, when data is partitioned by time, you no longer need to delete rows one by one. Instead, you can simply delete the entire historical partition. This operation only requires modifying the metadata and can completely release disk space, avoiding the performance overhead of traditional DML operations.
By naturally classifying data using the partitioning key (such as time), maintenance operations shift from "row-by-row scanning" to "batch processing," greatly improving management efficiency and reducing the complexity of operations and maintenance.
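As a minimal sketch (the table name logs and the partition name p202301, holding January 2023 data, are assumptions for illustration), an entire month can be removed in a single metadata-level operation:
-- Drop a whole historical partition instead of deleting rows one by one.
-- logs and p202301 are hypothetical names.
ALTER TABLE logs DROP PARTITION p202301;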
Partitions as data distribution units
Partitions are the data distribution units in OceanBase Database: each partition's replicas can be placed on different OBServer nodes, providing both storage and computing scalability.
Storage scalability: When creating a partitioned table, OceanBase Database automatically schedules the partitions and their replicas to different physical nodes based on the cluster's resources. This means the capacity of a single table is no longer limited by a single disk, but by the entire cluster's storage capacity. When the cluster's storage space is insufficient, you can simply add nodes to expand the storage.
Parallel computing: This is one of the keys to high performance for analytical workloads. When a query is executed (especially one involving full-table scans or large-scale aggregations), OceanBase Database's optimizer identifies the partitions involved, decomposes the query into multiple subtasks, and pushes them down to the nodes that host those partitions for parallel execution. For example, a SUM() operation computes a subtotal within each partition and then aggregates the intermediate results into the final total. This fully leverages the computing power of multiple nodes and significantly accelerates queries.
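As an illustrative sketch (the orders table and the degree of parallelism are assumptions), such a parallel aggregation can be requested explicitly with OceanBase Database's PARALLEL hint:
-- Hypothetical example: each node computes subtotals over its local partitions,
-- and the coordinator merges them into the final SUM. A DOP of 8 is assumed.
SELECT /*+ PARALLEL(8) */ SUM(amount) FROM orders;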
Basic partitioning methods in OceanBase Database
OceanBase Database supports three basic partitioning methods: HASH/KEY, RANGE/RANGE COLUMNS, and LIST/LIST COLUMNS. Each method has its own use cases.
HASH/KEY partitioning
This method is suitable for scenarios where the partitioning column has a high number of distinct values (NDV) and it is difficult to define clear ranges. The advantage is that it evenly distributes data across partitions even when the data follows no natural ordering; the disadvantage is that range queries on the partitioning key cannot be pruned.
Use cases:
- When there is no clear query pattern and data needs to be evenly distributed across multiple nodes (e.g., user ID, transaction ID).
Design considerations:
Partition key selection:
- Choose a column whose NDV is much larger than the number of partitions (e.g., a user ID).
- Prefer integer or timestamp columns with minimal skew (e.g., user_id, order_time, or auto-increment columns).
- Include frequently queried fields (e.g., user_id as a join key).
Recommended number of partitions:
- Align the number of partitions with the number of nodes in the cluster (typically a multiple of the node count) to avoid resource imbalance.
Example scenario:
-- Hash partitioning by user_id to ensure even distribution
CREATE TABLE customer (
user_id BIGINT NOT NULL,
login_time TIMESTAMP NOT NULL,
customer_name VARCHAR(100) NOT NULL,
phone_num BIGINT NOT NULL,
city_name VARCHAR(50) NOT NULL,
sex INT NOT NULL,
id_number VARCHAR(18) NOT NULL,
home_address VARCHAR(255) NOT NULL,
office_address VARCHAR(255) NOT NULL,
age INT NOT NULL
)
PARTITION BY HASH(user_id) PARTITIONS 128;
RANGE/RANGE COLUMNS partitioning
This method is suitable when the partitioning column can be easily divided into clear ranges. For example, a large table storing transaction records can be RANGE-partitioned by a timestamp column to organize data by time.
Use cases:
- When data grows over time or within specific numerical ranges (e.g., order_time, price).
- When you need to quickly prune historical data (e.g., query only the last month of data).
Design considerations:
Partition key selection:
- Choose a timestamp column or a continuous numeric column.
- Align partition boundaries with business query conditions (e.g., partition by day or month).
Recommended number of partitions:
- Set partitions based on data growth, such as by month.
Example scenario:
-- Create a system logs table and partition it by month based on log_date
CREATE TABLE system_logs (
log_id BIGINT,
log_date TIMESTAMP NOT NULL,
log_level VARCHAR(10),
source_system VARCHAR(50),
user_id BIGINT,
log_message TEXT,
client_ip VARCHAR(15)
)
-- RANGE partitioning by month using log_date as the partition key
PARTITION BY RANGE COLUMNS(log_date) (
PARTITION p_202001 VALUES LESS THAN ('2020-02-01'),
PARTITION p_202002 VALUES LESS THAN ('2020-03-01'),
PARTITION p_202003 VALUES LESS THAN ('2020-04-01'),
PARTITION p_202004 VALUES LESS THAN ('2020-05-01'),
PARTITION p_202005 VALUES LESS THAN ('2020-06-01'),
PARTITION p_202006 VALUES LESS THAN ('2020-07-01'),
PARTITION p_202007 VALUES LESS THAN ('2020-08-01'),
PARTITION p_202008 VALUES LESS THAN ('2020-09-01'),
PARTITION p_202009 VALUES LESS THAN ('2020-10-01'),
PARTITION p_202010 VALUES LESS THAN ('2020-11-01'),
PARTITION p_202011 VALUES LESS THAN ('2020-12-01'),
PARTITION p_202012 VALUES LESS THAN ('2021-01-01'),
-- Catch-all partition for future data or out-of-range timestamps
PARTITION p_future VALUES LESS THAN (MAXVALUE)
);
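To verify that pruning behaves as expected with the table above, you can inspect the plan of a one-month query; the partitions(...) line in the plan's access section should list only p_202003:
-- A filter covering March 2020 should be pruned to partition p_202003.
EXPLAIN SELECT COUNT(*) FROM system_logs
  WHERE log_date >= '2020-03-01' AND log_date < '2020-04-01';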
LIST/LIST COLUMNS partitioning
This method is suitable when you need to explicitly control how rows map to specific partitions. The advantage is precise placement of unordered or unrelated data sets; the disadvantage is that range queries on the partitioning key cannot be pruned effectively.
Use cases:
- For discrete fields such as region or channel type.
- When you need to quickly prune data based on fixed categories (e.g., query users in the East China region).
Design considerations:
Partition key selection:
- Choose a discrete column with a limited number of values (e.g., a region field with the values 'east', 'west', 'south', and 'north').
- Ensure all possible values are covered (for example, with a DEFAULT partition) so that no row is left without a partition.
Recommended number of partitions:
- Let the business categories determine the number of partitions.
Example scenario:
CREATE TABLE orders_by_region (
order_id BIGINT NOT NULL COMMENT 'Order unique identifier',
region_code INT NOT NULL COMMENT 'Region code (1=north, 2=east, 3=south, 4=west)',
customer_id BIGINT COMMENT 'Customer ID',
order_time DATETIME COMMENT 'Order creation time',
product_category VARCHAR(50) COMMENT 'Product category',
order_amount DECIMAL(18,2) COMMENT 'Order amount',
payment_status VARCHAR(20) COMMENT 'Payment status (e.g., PAID, UNPAID)',
PRIMARY KEY (order_id, region_code) -- The partitioning key must be part of the primary key
)
PARTITION BY LIST(region_code) -- Integer partition key
(
PARTITION p_north VALUES IN (1), -- Region code 1 corresponds to north
PARTITION p_east VALUES IN (2),
PARTITION p_south VALUES IN (3),
PARTITION p_west VALUES IN (4),
PARTITION p_other VALUES IN (DEFAULT) -- Default partition for unknown regions
);
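Equality predicates on region_code then prune directly to the matching partition; for example, the following query should access only p_east:
-- The plan's partitions(...) line should show p_east only.
EXPLAIN SELECT SUM(order_amount) FROM orders_by_region WHERE region_code = 2;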
Flexible partition management
OceanBase Database provides flexible partition management capabilities. From the data management perspective, it supports both data maintenance and data distribution. From the usage perspective, partitions can be managed manually or automatically. From the partition-level perspective, primary partitions and subpartitions can be combined. By combining different partitioning methods, OceanBase Database meets a wide range of data management requirements.
This topic describes the data maintenance and data distribution capabilities of OceanBase Database.
Data maintenance
In business scenarios, partitions are usually managed along the time dimension for easy data archiving and cleanup. The following walks through manual partition management across the full data lifecycle:
- Create a table: Create a table partitioned by time and create partitions for the next few days in advance.
- Import data: Import data.
- Run the business: As time passes, the pre-created partitions run out, so you periodically create partitions for the next few days in advance.
- Clean up data periodically: Once accumulated data is no longer needed, drop the corresponding partitions to delete it.
Example:
-- 1. Create a partitioned table (partitioned by day, with 7 partitions created in advance)
CREATE TABLE business_data (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time)
) PARTITION BY RANGE COLUMNS(event_time) (
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'),
PARTITION p20231026 VALUES LESS THAN ('2023-10-27'),
PARTITION p20231027 VALUES LESS THAN ('2023-10-28'),
PARTITION p20231028 VALUES LESS THAN ('2023-10-29'),
PARTITION p20231029 VALUES LESS THAN ('2023-10-30'),
PARTITION p20231030 VALUES LESS THAN ('2023-10-31'),
PARTITION p20231031 VALUES LESS THAN ('2023-11-01') -- Create 7 partitions in advance
);
-- 2. Import data. This step is omitted.
-- 3. As time passes, add 7 more partitions in advance
ALTER TABLE business_data ADD PARTITION (
PARTITION p20231101 VALUES LESS THAN ('2023-11-02'),
PARTITION p20231102 VALUES LESS THAN ('2023-11-03'),
PARTITION p20231103 VALUES LESS THAN ('2023-11-04'),
PARTITION p20231104 VALUES LESS THAN ('2023-11-05'),
PARTITION p20231105 VALUES LESS THAN ('2023-11-06'),
PARTITION p20231106 VALUES LESS THAN ('2023-11-07'),
PARTITION p20231107 VALUES LESS THAN ('2023-11-08')
);
-- 4. Clean up data periodically. For example, once the data expires, drop the 7 oldest partitions.
ALTER TABLE business_data DROP PARTITION p20231025, p20231026, p20231027, p20231028, p20231029, p20231030, p20231031;
Because data is continuously written, manually maintaining pre-created partitions and periodically cleaning up partitions is quite cumbersome. To simplify this process, OceanBase Database provides the dynamic partition feature, which supports partitioning by fixed time, pre-creating partitions for a specified period, and retaining historical partitions for a specified period.
For the example above, if you want to retain 30 days of data and pre-create 7 days of partitions each time, you can use the following syntax:
-- 1. Create a partitioned table and set the dynamic partition policy
CREATE TABLE t1 (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time))
DYNAMIC_PARTITION_POLICY
(
ENABLE = true,
TIME_UNIT = 'day',
PRECREATE_TIME = '7day',
EXPIRE_TIME = '30day'
)
PARTITION BY RANGE COLUMNS (event_time)(
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'));
In addition to RANGE partitioning, you can also use other partitioning methods based on your business requirements.
For more information about dynamic partitioning, see Dynamic partitioning, Create a dynamic partitioned table in MySQL mode, and Create a dynamic partitioned table in Oracle mode.
Data distribution
Partitions can also be used as units for data distribution management. Generally, HASH partitioning is used to disperse data, and it has the following advantages:
- It distributes data uniformly and supports precise partition pruning for equality predicates on the partitioning key.
- For tables that are frequently joined, if they are all HASH partitioned by the same join key with the same number of partitions and placed in the same table group (via OceanBase Database's Table Group feature), partitions with the same hash value are bound to the same node. Joins can then execute as partition-wise joins, avoiding cross-node data shuffling and significantly improving query performance, as sketched below.
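A minimal sketch of that setup (the table names, partition count, and column definitions are assumptions; SHARDING = 'PARTITION' requires member tables to share the same partitioning method):
-- Create a table group; member tables must use identical partitioning.
CREATE TABLEGROUP tg_pwj SHARDING = 'PARTITION';
-- Both tables are HASH partitioned on the join key with the same partition count.
CREATE TABLE orders_pw (
  order_id BIGINT,
  user_id BIGINT NOT NULL,
  amount DECIMAL(10,2)
) TABLEGROUP = tg_pwj PARTITION BY HASH(user_id) PARTITIONS 16;
CREATE TABLE users_pw (
  user_id BIGINT NOT NULL,
  user_name VARCHAR(100)
) TABLEGROUP = tg_pwj PARTITION BY HASH(user_id) PARTITIONS 16;
-- The join on user_id can now run partition by partition without cross-node shuffling.
SELECT u.user_name, SUM(o.amount)
FROM orders_pw o JOIN users_pw u ON o.user_id = u.user_id
GROUP BY u.user_name;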
Hash partitioning also has limitations:
- Once the number of partitions is set for HASH partitioning, modifying it is a heavy operation that involves rewriting all data in the table. Therefore, the number of partitions is usually not changed after it is set.
- For range queries on the partitioning key, partition pruning is not supported, and all partitions need to be accessed, which may result in read amplification.
To address the scalability and range query limitations of HASH partitioning, OceanBase Database supports automatic partition splitting for rowstore tables.
Hybrid data maintenance and distribution management
Subpartitioning lets a single table address both data maintenance and distribution management: typically, primary partitions handle data maintenance and subpartitions handle data distribution, with each requirement served by the partitioning method that suits it.
Typical manual partitioning management
- Primary partitions:
- Type selection: Use RANGE or LIST partitions to match high-frequency query conditions (e.g., time range, region)
- Recommended number of partitions: Set a reasonable range based on the distribution of query conditions and data maintenance needs (e.g., partition by month and retain data for 12 months, or divide into 4 LIST partitions by region)
- Subpartitions:
- Type selection: Use HASH partitions to ensure data is evenly distributed
- Recommended number of subpartitions:
- If writes land in a single primary partition (e.g., the current time range), that partition's subpartition count alone should be enough to spread the load evenly across the cluster's resources
- If writes span multiple primary partitions, the combined subpartition count across those partitions should be enough to spread the load evenly
The following shows two typical combinations: Range + Hash and List + Hash.
Range + Hash
Use RANGE partitioning on order_date for the primary partitions: date predicates quickly filter out partitions that do not need to be scanned, and data can be maintained efficiently through partition management operations. Use HASH subpartitioning on user_id to spread reads and writes for the current month across 8 subpartitions, avoiding hotspots.
CREATE TABLE orders (
user_id BIGINT NOT NULL COMMENT 'User ID (subpartition key)',
order_date DATE NOT NULL COMMENT 'Order date (primary partition key)',
amount DECIMAL(10,2) NOT NULL COMMENT 'Order amount',
status TINYINT NOT NULL COMMENT 'Status: 0-Canceled, 1-Pending Payment, 2-Paid, 3-Shipped, 4-Completed',
region_code CHAR(6) NOT NULL COMMENT 'Region code (first 2 digits for province code)',
product_id INT NOT NULL COMMENT 'Product ID',
payment_method VARCHAR(20) COMMENT 'Payment method',
created_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT 'Record creation time')
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(user_id) SUBPARTITIONS 8
(
PARTITION p202501 VALUES LESS THAN ('2025-02-01'),
PARTITION p202502 VALUES LESS THAN ('2025-03-01'),
...
PARTITION p202601 VALUES LESS THAN ('2026-02-01')
);
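With this layout, a query that specifies both keys can be pruned down to a single subpartition (the literal values below are assumptions):
-- Pruning should narrow the scan to one HASH subpartition of p202501.
EXPLAIN SELECT * FROM orders
  WHERE order_date = '2025-01-15' AND user_id = 1001;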
List + Hash
Use LIST partitioning on the province for the primary partitions: queries can be pruned to the matching partitions, and data can be maintained at the provincial level. Use HASH or KEY subpartitioning to spread read and write traffic across multiple subpartitions for load balancing.
-- Primary partition: LIST by province (31 provincial administrative regions)
CREATE TABLE social_insurance_records (
record_id BIGINT,
province_code INT NOT NULL, -- Provincial code (e.g., 11 for Beijing, 31 for Shanghai)
payment_date DATE NOT NULL,
user_id VARCHAR(32) NOT NULL,
amount DECIMAL(10,2)
) PARTITION BY LIST (province_code) -- Primary LIST partition
SUBPARTITION BY KEY(user_id) SUBPARTITIONS 16 -- KEY subpartitions for even distribution
(
PARTITION p_beijing VALUES IN (11),
PARTITION p_shanghai VALUES IN (31),
PARTITION p_tianjin VALUES IN (12),
...
PARTITION p_xizang VALUES IN (54)
);
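Provincial-level maintenance then maps directly onto primary partitions; for example, assuming partition-level TRUNCATE is available as in MySQL mode, clearing one province's records touches only that partition:
-- Remove all rows for Beijing without scanning the rest of the table.
ALTER TABLE social_insurance_records TRUNCATE PARTITION p_beijing;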
Typical automatic partitioning management
- Primary partitions: Choose dynamic partitions and configure parameters such as fixed time-based partitioning, how far in advance to pre-create partitions, and how long to retain historical partitions.
- Subpartitions: Choose automatic RANGE partition splitting, which allows for automatic splitting without the need to configure the number of partitions or partitioning rules.
