Overview
Analytical workloads typically involve analysis and computation over large volumes of data, which places high demands on a database's query performance and data management capabilities. OceanBase Database partitions a table into multiple data subsets based on the partitioning key, which improves query efficiency and data manageability:
- Improved query efficiency: Partition pruning reduces the scanning of irrelevant data.
- Data maintenance: Supports data management at the partition level, such as archiving and cleaning.
- Data distribution: Distributing data at the partition level allows it to be spread across multiple nodes, offering good scalability.
Role of partitions in OceanBase Database
In OceanBase Database, partitions are the basic units of horizontal sharding and the smallest physical units for data distribution, load balancing, and parallel operations. A large table is logically divided into multiple smaller, more manageable blocks. Each partition (and even different replicas of a partition) can be stored on different OBServer nodes within the cluster.
This design provides fundamental advantages for analytical workloads: when a single node's storage or computing capacity reaches its limit, you can simply add more nodes and redistribute the partitions to achieve nearly linear horizontal scaling, easily handling petabyte-scale data.
Partition pruning improves query efficiency
After a table is partitioned, when a query specifies a condition on the partitioning column, partition pruning identifies and scans only the partitions that can satisfy the condition, eliminating the need to scan the rest.
Here is an example: create a table t1 with four hash partitions on the c2 column. For a query with c2 = 1, partition pruning limits the scan to partition p1.
-- Create a table t1 with four hash partitions based on the c2 column.
CREATE TABLE t1(c1 INT, c2 INT) PARTITION BY HASH(c2) PARTITIONS 4;
-- Query for c2 = 1, pruning to only the p1 partition.
EXPLAIN SELECT * FROM t1 WHERE c2 = 1;
The execution plan output shows:
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------- |
| |0 |TABLE FULL SCAN|t1 |1 |3 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([t1.c1], [t1.c2]), filter([t1.c2 = 1]), rowset=16 |
| access([t1.c2], [t1.c1]), partitions(p1) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
Partition pruning filters out unnecessary data, but too many partitions can cause other issues, such as excessive metadata and reduced pruning efficiency. Therefore, for columnar tables in OceanBase Database, it is recommended that each partition contain at least 100,000 rows.
Partitions as data maintenance units
In database maintenance, treating partitions as the basic units for data management simplifies daily management processes, such as data cleanup and partition-level statistics collection.
Take the data cleanup scenario as an example. When data is partitioned by time, instead of deleting rows individually, you can simply delete the entire historical partition. This operation only requires modifying the metadata and can completely release disk space, avoiding the performance overhead of traditional DML operations.
By naturally categorizing data based on the partitioning key (such as time), maintenance operations shift from "row-by-row scanning" to "batch processing," significantly improving management efficiency and reducing operational complexity.
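As a sketch of this difference (the table and partition names here are hypothetical), compare row-by-row deletion with partition-level cleanup:

```sql
-- Hypothetical table orders_history, RANGE-partitioned by month on order_date.

-- Row-by-row cleanup: scans the table, deletes millions of rows, and leaves
-- space to be reclaimed later by compaction.
DELETE FROM orders_history WHERE order_date < '2024-01-01';

-- Partition-level cleanup: a metadata operation that drops the whole month
-- and releases its disk space without scanning rows.
ALTER TABLE orders_history DROP PARTITION p202312;
```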
Partitions as data distribution units
In OceanBase Database, partitions serve as the data distribution units. Each partition's replicas can be placed on different OBServer nodes to achieve storage and computing scalability.
Storage scalability: When creating a partitioned table, OceanBase Database automatically schedules the partitions and their replicas to different physical nodes based on cluster resources. This means the capacity of a single table is no longer limited by a single server's disk space but by the entire cluster's storage capacity. If the cluster's storage space is insufficient, you can simply add more nodes to expand.
Parallel computing: This is one of the key factors in achieving high performance for analytical workloads. When a query (especially one involving full-table scans or large-scale aggregations) is executed, OceanBase Database's optimizer identifies the partitions involved in the query. The query task can be decomposed into multiple subtasks and pushed down to the nodes where the data partitions are located for parallel execution. For example, a SUM() operation calculates subtotals in each partition and then aggregates the intermediate results to obtain the final total, fully utilizing the computing power of multiple nodes and significantly accelerating the query.
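As an illustration, a degree of parallelism can be requested explicitly with a hint; the table, column, and DOP value below are illustrative:

```sql
-- Hypothetical example: compute per-partition subtotals on the nodes that
-- host each partition, then merge them. DOP 8 is an arbitrary value.
SELECT /*+ PARALLEL(8) */ SUM(amount) FROM orders;
```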
Partitioning methods of OceanBase Database
OceanBase Database supports three families of partitioning: hash/key, range/range columns, and list/list columns. Each suits different scenarios.
Hash or key partitioning
This method is suitable when the partitioning column has many distinct values and clear ranges are difficult to define. The advantage is that data without an obvious pattern can be evenly distributed across partitions. The disadvantage is that partition pruning is difficult for range queries on the partitioning key.
Example scenario:
- There is no clear query pattern, and data needs to be evenly distributed across multiple nodes (for example, hashing by user ID or transaction ID).
Design considerations:
Partitioning key selection:
- The number of distinct values (NDV) should be significantly higher than the number of partitions (for example, the NDV of user IDs is much higher than the number of partitions).
- Prefer non-skewed (or minimally skewed) integer or time columns, such as user_id, order_time, or auto-incrementing columns.
- Prefer high-frequency query condition fields, such as user_id as a join key.
Recommended number of partitions:
- Set the number of partitions to an integer multiple of the number of cluster nodes to avoid resource allocation imbalance.
Example:
-- Hash partitioning, evenly distributed by user_id
CREATE TABLE customer (
user_id BIGINT NOT NULL,
login_time TIMESTAMP NOT NULL,
customer_name VARCHAR(100) NOT NULL,
phone_num BIGINT NOT NULL,
city_name VARCHAR(50) NOT NULL,
sex INT NOT NULL,
id_number VARCHAR(18) NOT NULL,
home_address VARCHAR(255) NOT NULL,
office_address VARCHAR(255) NOT NULL,
age INT NOT NULL
)
PARTITION BY HASH(user_id) PARTITIONS 128;
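With this table, you can compare pruning behavior using EXPLAIN; the partitions(...) line of the plan shows which partitions are accessed (the user_id values are illustrative):

```sql
-- Equality on the hash key: prunes to the single partition user_id 10001 maps to.
EXPLAIN SELECT * FROM customer WHERE user_id = 10001;

-- Range on the hash key: cannot be pruned, so all 128 partitions are accessed.
EXPLAIN SELECT * FROM customer WHERE user_id BETWEEN 10000 AND 20000;
```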
Range or range columns partitioning
Range partitioning is applicable when the partitioning key can clearly split the data into ranges. For example, a large table that records time-ordered data, such as logs, can be partitioned by its time column.
Example scenarios:
- Data grows over time or along a numeric range (for example, order_time or price).
- You want to quickly prune historical data (for example, query only data from the last 30 days).
Design considerations:
Partitioning key selection:
- A time field (such as order_time) or a continuous numeric field.
- Align partition boundaries with business query conditions (for example, daily or monthly partitions).
Recommended number of partitions:
- Decide based on data growth; for example, partition by month.
Example scenario:
-- Create a system log table and partition it by month using the RANGE partitioning method based on log time, which facilitates quick queries and data archiving.
CREATE TABLE system_logs (
log_id BIGINT,
log_date TIMESTAMP NOT NULL,
log_level VARCHAR(10),
source_system VARCHAR(50),
user_id BIGINT,
log_message TEXT,
client_ip VARCHAR(15)
)
-- Primary partitioning: RANGE partitioning by month, using the date as the boundary
PARTITION BY RANGE COLUMNS(log_date) (
PARTITION p_202001 VALUES LESS THAN ('2020-02-01'),
PARTITION p_202002 VALUES LESS THAN ('2020-03-01'),
PARTITION p_202003 VALUES LESS THAN ('2020-04-01'),
PARTITION p_202004 VALUES LESS THAN ('2020-05-01'),
PARTITION p_202005 VALUES LESS THAN ('2020-06-01'),
PARTITION p_202006 VALUES LESS THAN ('2020-07-01'),
PARTITION p_202007 VALUES LESS THAN ('2020-08-01'),
PARTITION p_202008 VALUES LESS THAN ('2020-09-01'),
PARTITION p_202009 VALUES LESS THAN ('2020-10-01'),
PARTITION p_202010 VALUES LESS THAN ('2020-11-01'),
PARTITION p_202011 VALUES LESS THAN ('2020-12-01'),
PARTITION p_202012 VALUES LESS THAN ('2021-01-01'),
-- A default partition handles records with future dates or time format errors.
PARTITION p_future VALUES LESS THAN (MAXVALUE)
);
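A time-bounded query on this table is then pruned to the matching monthly partition, for example:

```sql
-- Both bounds fall inside March 2020, so only partition p_202003 is scanned.
SELECT log_id, log_level, log_message
FROM system_logs
WHERE log_date >= '2020-03-01' AND log_date < '2020-04-01';
```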
List or list columns partitioning
This approach is suitable for situations requiring explicit control over how each row is mapped to a specific partition. The advantage is the ability to accurately partition unordered or unrelated datasets. However, it is challenging to perform partition pruning during range queries.
Scenario:
- Discrete fields such as regions and channel types.
- Quick filtering by fixed categories, such as querying users in the East China region.
Key design considerations:
Partitioning key selection:
- Discrete values with a finite number of options (for example, a region field whose only values are 'east', 'west', 'south', and 'north').
- The partition definitions must cover all possible values to avoid omissions.
Recommended number of partitions:
- Configure the number of partitions based on your business logic.
Example:
CREATE TABLE orders_by_region (
order_id BIGINT NOT NULL COMMENT 'Order unique identifier',
region_code INT NOT NULL COMMENT 'Region code (1=north, 2=east, 3=south, 4=west)',
customer_id BIGINT COMMENT 'Customer ID',
order_time DATETIME COMMENT 'The time when the order was created',
product_category VARCHAR(50) COMMENT 'Product category',
order_amount DECIMAL(18,2) COMMENT 'Order amount',
payment_status VARCHAR(20) COMMENT 'Payment status (e.g., PAID, UNPAID)',
PRIMARY KEY (order_id, region_code) -- The partitioning key must be included in the primary key.
)
PARTITION BY LIST(region_code) -- LIST partitioning on the integer region code
(
PARTITION p_north VALUES IN (1), -- Region code 1 corresponds to the north region
PARTITION p_east VALUES IN (2),
PARTITION p_south VALUES IN (3),
PARTITION p_west VALUES IN (4),
PARTITION p_other VALUES IN (DEFAULT) -- Default partition for unspecified regions
);
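A query that filters on the region code is then pruned to a single partition:

```sql
-- Only partition p_east (region code 2) is scanned.
SELECT order_id, order_amount
FROM orders_by_region
WHERE region_code = 2;
```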
Flexible partition management
OceanBase Database provides flexible partition management capabilities. From the perspective of data management, it supports data maintenance and data distribution. From the perspective of usage, it supports manual and automatic management. From the perspective of partition levels, it supports the combination of primary and subpartitions. By combining different partitioning strategies, OceanBase Database can meet the diverse data management needs of users.
This section describes partition management from the data maintenance and data distribution perspectives, covering manual and automatic management as well as combinations of primary and subpartitions.
Data maintenance
At the business level, partitions are typically managed based on the time dimension to facilitate operations such as data archiving and cleanup. We will describe the manual partition management capabilities in the context of the complete data lifecycle of a business:
- Business table creation: Create a table partitioned by time and pre-create partitions for the next few days.
- Data import: Import data.
- Business operation: As time progresses, if the pre-created partitions are insufficient, continue to pre-create partitions for the next few days.
- Regular data cleanup: After data has accumulated for a certain period, some data may no longer be needed. In this case, you can delete the unnecessary partitions.
Example:
-- 1. Create a partitioned table (partitioned by day, pre-create partitions for the next 7 days)
CREATE TABLE business_data (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time)
) PARTITION BY RANGE COLUMNS(event_time) (
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'),
PARTITION p20231026 VALUES LESS THAN ('2023-10-27'),
PARTITION p20231027 VALUES LESS THAN ('2023-10-28'),
PARTITION p20231028 VALUES LESS THAN ('2023-10-29'),
PARTITION p20231029 VALUES LESS THAN ('2023-10-30'),
PARTITION p20231030 VALUES LESS THAN ('2023-10-31'),
PARTITION p20231031 VALUES LESS THAN ('2023-11-01') -- Pre-create partitions for the next 7 days
);
-- 2. Import data (omitted for brevity)
-- 3. Pre-create partitions for the next 7 days
ALTER TABLE business_data ADD PARTITION (
PARTITION p20231101 VALUES LESS THAN ('2023-11-02'),
PARTITION p20231102 VALUES LESS THAN ('2023-11-03'),
PARTITION p20231103 VALUES LESS THAN ('2023-11-04'),
PARTITION p20231104 VALUES LESS THAN ('2023-11-05'),
PARTITION p20231105 VALUES LESS THAN ('2023-11-06'),
PARTITION p20231106 VALUES LESS THAN ('2023-11-07'),
PARTITION p20231107 VALUES LESS THAN ('2023-11-08')
);
-- 4. Regular data cleanup, for example, delete data that is no longer needed after 7 days
ALTER TABLE business_data DROP PARTITION p20231025, p20231026, p20231027, p20231028, p20231029, p20231030, p20231031;
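When scripting these maintenance steps, it helps to check which partitions currently exist; in MySQL mode this can be done through the standard information_schema view:

```sql
-- List the current partitions of business_data with their upper boundaries.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'business_data'
ORDER BY PARTITION_ORDINAL_POSITION;
```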
Since data is continuously written, manually maintaining pre-created partitions and regularly cleaning up partitions can be quite cumbersome. To simplify this process, OceanBase Database provides the dynamic partition feature, which supports partitioning based on fixed time intervals, pre-creating partitions for a specified duration, and retaining historical partitions for a specified period.
For the example above, if we need to retain 30 days of data and pre-create 7 days of partitions each time, we can use the following syntax:
-- 1. Create a partitioned table and set the dynamic partition strategy
CREATE TABLE t1 (
id BIGINT NOT NULL AUTO_INCREMENT,
event_time DATETIME NOT NULL,
metric_value DECIMAL(10,2),
PRIMARY KEY (id, event_time))
DYNAMIC_PARTITION_POLICY
(
ENABLE = true,
TIME_UNIT = 'day',
PRECREATE_TIME = '7day',
EXPIRE_TIME = '30day'
)
PARTITION BY RANGE COLUMNS (event_time)(
PARTITION p20231025 VALUES LESS THAN ('2023-10-26'));
In addition to the RANGE partitioning method, businesses can also choose other basic partitioning methods based on their specific needs.
For more information about dynamic partitioning, see Dynamic partitioning, Create a dynamic partition table in MySQL mode, and Create a dynamic partition table in Oracle mode.
Data distribution
Partitions can also serve as units for data distribution management. Typically, to achieve data dispersion, the HASH partitioning method is used. It has the following advantages:
- It can effectively achieve uniform data distribution and supports precise partition pruning based on the partitioning key.
- For multiple tables that need to perform JOIN operations, if they are all HASH partitioned using the same JOIN key and have the same number of partitions, and if the Table Group feature of OceanBase Database is used, partitions with the same hash value can be bound to the same group of nodes. This enables the use of Partition-Wise Join during JOIN operations, avoiding data shuffling across nodes and significantly improving query performance.
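A minimal sketch of this setup (the table group name and table definitions are illustrative): both tables share the same JOIN key, the same number of partitions, and the same table group, so a join on user_id can run partition-wise:

```sql
-- Hypothetical example: co-locate two tables for a partition-wise join.
CREATE TABLEGROUP tg_orders;

CREATE TABLE users_pwj (
user_id BIGINT PRIMARY KEY,
name VARCHAR(100)
) TABLEGROUP = tg_orders
PARTITION BY HASH(user_id) PARTITIONS 16;

CREATE TABLE orders_pwj (
order_id BIGINT,
user_id BIGINT,
amount DECIMAL(10,2)
) TABLEGROUP = tg_orders
PARTITION BY HASH(user_id) PARTITIONS 16;

-- Partitions with the same hash value are co-located, so this join avoids
-- shuffling data across nodes.
SELECT u.name, SUM(o.amount)
FROM users_pwj u JOIN orders_pwj o ON u.user_id = o.user_id
GROUP BY u.name;
```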
Hash partitioning also has limitations:
- Once the number of partitions for a HASH partitioned table is set, modifying it is a resource-intensive operation that involves rewriting the entire table. Therefore, the number of partitions for a HASH partitioned table is usually set once and rarely changed, making it difficult to scale.
- For range queries on the partitioning key, partition pruning is not supported, and all partitions must be accessed, which may result in read amplification.
To address the scalability issues of HASH partitioning and the limitations of range queries, OceanBase Database now supports automatic partition splitting for rowstore tables.
Hybrid data maintenance and data distribution management
Subpartitioning lets us meet data maintenance and data distribution needs at the same time. Typically, the primary partition handles data maintenance and the subpartition handles data distribution, with each level using whichever partitioning method suits that need.
Typical manual partitioning management methods
- Primary partitioning:
  - Type selection: Use Range or List partitioning to match high-frequency query conditions (such as time ranges or regions)
  - Number of partitions: Set a reasonable range based on the distribution of query conditions and data maintenance needs (e.g., partition by month and retain data for 12 months, or partition by region into 4 List partitions)
- Subpartitioning:
  - Type selection: Use Hash partitioning to ensure data dispersion
  - Number of subpartitions:
    - If writes target only one primary partition at a time (for example, only the current day's partition), the number of subpartitions in that partition should be enough to spread the write load across cluster resources
    - If writes target multiple primary partitions, the combined number of subpartitions across those partitions should be enough to spread the write load
Here are two scenarios using Range + Hash and List + Hash:
Range + Hash
Choose Range partitioning for the primary partition and specify order_date to quickly filter out partitions that do not need to be scanned and enable quick data maintenance through partition management operations. Choose Hash partitioning for the subpartition to disperse writes or reads for the current month across 8 partitions, avoiding hotspots.
CREATE TABLE orders (
user_id BIGINT NOT NULL COMMENT 'User ID (subpartition key)',
order_date DATE NOT NULL COMMENT 'Order date (primary partition key)',
amount DECIMAL(10,2) NOT NULL COMMENT 'Order amount',
status TINYINT NOT NULL COMMENT 'Status: 0-Canceled, 1-Pending, 2-Paid, 3-Shipped, 4-Completed',
region_code CHAR(6) NOT NULL COMMENT 'Region code (first 2 digits for province code)',
product_id INT NOT NULL COMMENT 'Product ID',
payment_method VARCHAR(20) COMMENT 'Payment method',
created_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT 'Record creation time')
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(user_id) SUBPARTITIONS 8
(
PARTITION p202501 VALUES LESS THAN ('2025-02-01'),
PARTITION p202502 VALUES LESS THAN ('2025-03-01'),
...
PARTITION p202601 VALUES LESS THAN ('2026-02-01')
);
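A query that constrains both keys is then pruned at both levels:

```sql
-- Prunes to primary partition p202501 via order_date, then to the single
-- hash subpartition that user_id 42 maps to.
SELECT order_date, amount, status
FROM orders
WHERE order_date >= '2025-01-01' AND order_date < '2025-02-01'
AND user_id = 42;
```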
List + Hash
Choose List partitioning for the primary partition and specify provinces to prune to the corresponding partitions, enabling data maintenance at the provincial level. Choose Hash or Key partitioning for the subpartition to disperse read/write traffic across multiple partitions for load balancing.
-- Primary partitioning: LIST partitioning by province (31 provincial administrative regions)
CREATE TABLE social_insurance_records (
record_id BIGINT,
province_code INT NOT NULL, -- Provincial code (e.g., 11 for Beijing, 31 for Shanghai)
payment_date DATE NOT NULL,
user_id VARCHAR(32) NOT NULL,
amount DECIMAL(10,2)
) PARTITION BY LIST (province_code) -- Primary LIST partitioning
SUBPARTITION BY KEY(user_id) SUBPARTITIONS 16 -- Subpartitioning using KEY partitioning
(
PARTITION p_beijing VALUES IN (11),
PARTITION p_shanghai VALUES IN (31),
PARTITION p_tianjin VALUES IN (12),
...
PARTITION p_xizang VALUES IN (54)
);
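Province-level queries and maintenance then operate on a single primary partition; the TRUNCATE PARTITION statement below follows MySQL-mode syntax:

```sql
-- Prunes to the p_beijing primary partition (province code 11).
SELECT COUNT(*) FROM social_insurance_records WHERE province_code = 11;

-- Province-level maintenance: clear Beijing's data in one partition operation.
ALTER TABLE social_insurance_records TRUNCATE PARTITION p_beijing;
```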
Typical automatic partitioning management methods
- Primary partitioning: Choose dynamic partitioning, configure fixed-time partitioning, and set parameters such as how far ahead to pre-create partitions and how long to retain historical partitions.
- Subpartitioning: Choose automatic Range partition splitting, which allows automatic splitting without configuring the number of partitions or partitioning rules.
