OceanBase Database

SQL - V4.3.5


    Table partitioning design in OLAP scenarios

    Last Updated: 2026-04-23 09:49:18
    What is on this page
    Overview
    Role of partitions in OceanBase Database
    Partition pruning improves query efficiency
    Partitions as data maintenance units
    Partitions as data distribution units
    Partitioning in OceanBase Database
    Hash or key partitioning
    Partition by range/range columns
    Partition by list/list columns
    Flexible partition management
    Data maintenance
    Data distribution
    Mixed data maintenance and data distribution management
    Typical manual partitioning management methods
    Typical automatic partitioning management methods
    References


    Overview

    Analytical workloads often require computation over massive datasets, which places high demands on a database's query and data management capabilities. OceanBase Database divides a table's data into multiple subsets based on the partitioning key, which improves both query efficiency and manageability:

    • Improved query efficiency: Partition pruning reduces the amount of irrelevant data scanned.
    • Data maintenance: Supports data management at the partition level, such as archiving and cleaning.
    • Data distribution: Distributes data across multiple nodes at the partition level, enabling good scalability.

    Role of partitions in OceanBase Database

    In OceanBase Database, partitions are the basic units of horizontal sharding and the smallest physical units for data distribution, load balancing, and parallel operations. A large table is logically divided into smaller, more manageable blocks. Each partition (or even different replicas of a partition) can be stored on different OBServer nodes in the cluster.

    This design provides fundamental advantages for analytical business: when the storage or computing capacity of a single node reaches its bottleneck, you can simply add more nodes and redistribute the partitions to achieve nearly linear horizontal scalability, easily handling petabyte-scale data.

    Partition pruning improves query efficiency

    After partitioning, when you specify a partition column for a query, partition pruning can identify and query only the partitions that meet the query conditions, avoiding unnecessary scans of partitions that do not meet the conditions.

    Here is an example. Assume that you create a hash-partitioned table t1 with four partitions and specify c2 = 1 as the query condition. In this case, only partition p1 needs to be queried.

    -- Create a table t1 with four hash partitions and c2 as the partitioning key.
    CREATE TABLE t1(c1 INT, c2 INT) PARTITION BY HASH(c2) PARTITIONS 4;
    
    -- With c2 = 1 as the query condition, all partitions other than p1 are pruned.
    EXPLAIN SELECT * FROM t1 WHERE c2 = 1;
    

    The execution plan output is as follows:

    +------------------------------------------------------------------------------------+
    | Query Plan                                                                         |
    +------------------------------------------------------------------------------------+
    | ===============================================                                    |
    | |ID|OPERATOR       |NAME|EST.ROWS|EST.TIME(us)|                                    |
    | -----------------------------------------------                                    |
    | |0 |TABLE FULL SCAN|t1  |1       |3           |                                    |
    | ===============================================                                    |
    | Outputs & filters:                                                                 |
    | -------------------------------------                                              |
    |   0 - output([t1.c1], [t1.c2]), filter([t1.c2 = 1]), rowset=16                     |
    |       access([t1.c2], [t1.c1]), partitions(p1)                                     |
    |       is_index_back=false, is_global_index=false, filter_before_indexback[false],  |
    |       range_key([t1.__pk_increment]), range(MIN ; MAX)always true                  |
    +------------------------------------------------------------------------------------+
    

    Partition pruning filters out unnecessary data, but too many partitions can also cause other issues, such as excessive metadata and reduced pruning efficiency. Therefore, we recommend that the number of rows in each partition in OceanBase Database columnstore tables be at least 1 million.
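For intuition, HASH partitioning of an integer key in MySQL mode essentially routes each row by the key value modulo the partition count, which is why c2 = 1 lands in p1 above. A minimal sketch of this simplified model (the helper name is illustrative):

```python
def hash_partition(value: int, num_partitions: int) -> int:
    """Route an integer partitioning-key value to a partition index,
    mimicking MySQL-style HASH partitioning, i.e. MOD(value, N)."""
    return value % num_partitions

# With 4 partitions, the predicate c2 = 1 maps to partition p1,
# so p0, p2, and p3 are pruned from the scan.
print(f"p{hash_partition(1, 4)}")  # prints: p1
```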

    Partitions as data maintenance units

    In database maintenance, treating partitions as the basic units of data maintenance greatly simplifies daily management processes, such as data cleanup and partition-level statistics collection.

    Take the data cleanup scenario as an example. When data is partitioned by time, you no longer need to delete rows individually. Instead, you can simply delete the entire historical partition. This operation only requires modifying metadata and can fully release disk space, avoiding the performance overhead of traditional DML operations.

    By naturally categorizing data based on the partitioning key (such as time), maintenance operations transition from "row-by-row scanning" to "batch processing," significantly improving management efficiency and reducing operational complexity.

    Partitions as data distribution units

    In OceanBase Database, partitions serve as the units for data distribution. Each partition's replicas can be placed on different OBServer nodes to achieve storage and computing scalability.

    • Storage scalability: When creating a partitioned table, OceanBase Database automatically schedules the partitions and their replicas to different physical nodes based on the cluster's resources. This means that the capacity of a single table is no longer limited by the disk space of a single node but by the total storage capacity of the entire cluster. When the cluster's storage space is insufficient, you can simply add more nodes to expand the storage capacity.

    • Parallel computing: This is one of the key factors in achieving high performance for analytical business. When a query (especially one involving full table scans or large-scale aggregations) is executed, OceanBase Database's optimizer identifies the partitions involved in the query. The query task is then decomposed into multiple subtasks and pushed down to the nodes where the data partitions are located for parallel execution. For example, a SUM() operation calculates the subtotal for each partition locally and then aggregates the intermediate results to obtain the final total. This fully leverages the computing power of multiple nodes, significantly accelerating the query.
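The two-phase aggregation just described can be sketched as follows, where each sublist stands in for one partition stored on a different node (a conceptual model, not OceanBase internals):

```python
from concurrent.futures import ThreadPoolExecutor

# Each sublist represents one partition stored on a different node.
partitions = [[10, 20], [5, 15], [30], [40, 10]]

# Phase 1: each node computes a local subtotal for its own partition.
with ThreadPoolExecutor() as pool:
    subtotals = list(pool.map(sum, partitions))

# Phase 2: the coordinator merges the intermediate results.
total = sum(subtotals)
print(total)  # prints: 130
```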

    Partitioning in OceanBase Database

    OceanBase Database supports hash/key, range/range columns, and list/list columns partitions. Each type has specific usage scenarios.

    Hash or key partitioning

    Typically applicable when the number of distinct values (NDV) for the partitioning column is large and it's challenging to define clear ranges. The advantage is that it allows data without specific rules to be evenly distributed across different partitions, while the disadvantage is that it can be difficult to prune partitions during range queries.

    Example use case:

    • Queries have no dominant access pattern, and data should be spread evenly across multiple nodes (e.g., keyed by user ID or transaction ID).

    Key design:

    • Partition key selection:

      • The NDV (number of distinct values) of the partitioning column should significantly exceed the number of partitions; for example, the NDV of user IDs should be far higher than the partition count.
      • Prefer integer or timestamp columns with little or no skew (e.g. user_id, order_time, and auto-incrementing columns).
      • Fields that are frequently queried as join conditions, such as user_id.
    • Recommended number of partitions:

      • Match the number of partitions to the cluster size (for example, an integer multiple of the node count) so that resources are allocated evenly across nodes.

    Scenario:

    -- Hash partitioned based on user_id
    CREATE TABLE customer (
      user_id BIGINT NOT NULL,
      login_time TIMESTAMP NOT NULL,
      customer_name VARCHAR(100) NOT NULL,
      phone_num BIGINT NOT NULL,
      city_name VARCHAR(50) NOT NULL,
      sex INT NOT NULL,
      id_number VARCHAR(18) NOT NULL,
      home_address VARCHAR(255) NOT NULL,
      office_address VARCHAR(255) NOT NULL,
      age INT NOT NULL
    )
    PARTITION BY HASH(user_id) PARTITIONS 128;
    

    Partition by range/range columns

    This approach is suitable when the partitioning key can clearly define ranges, such as partitioning a large record-keeping table by a time column.

    Scenario examples:

    • Data grows over time/numerical ranges (such as order_time, price).
    • Quickly prune or archive historical data (e.g., queries touch only the last month).

    Considerations:

    • Partition key selection:

      • A time field (such as order_time) or a numeric field.
      • Align the partition boundaries with business query conditions (e.g., partitioning by day or month).
    • Recommended number of partitions:

      • Partition your data as needed, for example by month.

    Example scenario:

    -- Create a system log table partitioned by month based on log time, to support fast querying and data archiving.
    CREATE TABLE system_logs (
        log_id BIGINT,
        log_date TIMESTAMP NOT NULL,
        log_level VARCHAR(10),
        source_system VARCHAR(50),
        user_id BIGINT,
        log_message TEXT,
        client_ip VARCHAR(15)
    )
    -- RANGE partitioned by months; partitions defined using literal dates.
    PARTITION BY RANGE COLUMNS(log_date) (
        PARTITION p_202001 VALUES LESS THAN ('2020-02-01'),
        PARTITION p_202002 VALUES LESS THAN ('2020-03-01'),
        PARTITION p_202003 VALUES LESS THAN ('2020-04-01'),
        PARTITION p_202004 VALUES LESS THAN ('2020-05-01'),
        PARTITION p_202005 VALUES LESS THAN ('2020-06-01'),
        PARTITION p_202006 VALUES LESS THAN ('2020-07-01'),
        PARTITION p_202007 VALUES LESS THAN ('2020-08-01'),
        PARTITION p_202008 VALUES LESS THAN ('2020-09-01'),
        PARTITION p_202009 VALUES LESS THAN ('2020-10-01'),
        PARTITION p_202010 VALUES LESS THAN ('2020-11-01'),
        PARTITION p_202011 VALUES LESS THAN ('2020-12-01'),
        PARTITION p_202012 VALUES LESS THAN ('2021-01-01'),
        -- Default partitioning handles future dates or records with time format errors
        PARTITION p_future VALUES LESS THAN (MAXVALUE)
    );
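Pruning for such a table can be modeled with the sorted upper bounds of the partitions: a range predicate on log_date maps to one contiguous slice of partitions. A sketch under that simplification:

```python
import bisect

# Exclusive upper bounds and names of the partitions defined above.
bounds = [f"2020-{m:02d}-01" for m in range(2, 13)] + ["2021-01-01"]
names = [f"p_2020{m:02d}" for m in range(1, 13)] + ["p_future"]

def partitions_for_range(lo: str, hi: str) -> list:
    """Partitions a query with WHERE log_date BETWEEN lo AND hi must scan."""
    first = bisect.bisect_right(bounds, lo)  # first partition whose bound > lo
    last = bisect.bisect_right(bounds, hi)
    return names[first:last + 1]

# A two-month range touches only three of the thirteen partitions.
print(partitions_for_range("2020-03-15", "2020-05-10"))
```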
    

    Partition by list/list columns

    This is generally used when you need explicit control over how rows are mapped to specific partitions, allowing for precise partitioning of unordered or unrelated datasets. However, this approach can make partition pruning challenging during range queries.

    Example use case:

    • Discrete fields (such as region and channel type).
    • Need to quickly query users in a specific region (e.g., the East China region) based on a fixed category.

    Design considerations:

    • Partition key selection:

      • Discrete values with a limited number of possible values (for example, the region field only contains ['east','west','north','south']).
      • The partitioning value should cover all possible values and avoid omissions.
    • Partition count limits:

      • Configure the number of partitions based on your business logic.

    Scenario:

    CREATE TABLE orders_by_region (
        order_id BIGINT COMMENT 'Order unique identifier',
        region_code INT NOT NULL COMMENT 'Region code (1=North China, 2=East China, 3=South China, 4=West China)',
        customer_id BIGINT COMMENT 'Customer ID',
        order_time DATETIME COMMENT 'Order creation time',
        product_category VARCHAR(50) COMMENT 'Product category',
        order_amount DECIMAL(18,2) COMMENT 'Order amount',
        payment_status VARCHAR(20) COMMENT 'Payment status (e.g. PAID, UNPAID)',
        PRIMARY KEY (order_id, region_code)  -- The primary key must include the partitioning column
    )
    PARTITION BY LIST(region_code)  -- The partitioning column is an integer region code
    (
        PARTITION p_north VALUES IN (1),  -- North China, code 1
        PARTITION p_east VALUES IN (2),
        PARTITION p_south VALUES IN (3),
        PARTITION p_west VALUES IN (4),
        PARTITION p_other VALUES IN (DEFAULT)  -- Default partition for unmapped region codes
    );
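Routing for a LIST-partitioned table is a direct value-to-partition lookup; the sketch below mirrors the definition above, with the DEFAULT partition as the fallback:

```python
# Static mapping taken from the LIST partition definition above.
LIST_MAP = {1: "p_north", 2: "p_east", 3: "p_south", 4: "p_west"}

def route_region(region_code: int) -> str:
    """Route a row to its LIST partition; unmapped values fall to DEFAULT."""
    return LIST_MAP.get(region_code, "p_other")

print(route_region(2), route_region(99))  # prints: p_east p_other
```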
    

    Flexible partition management

    OceanBase Database has flexible partition management capabilities. From the perspective of data management, it supports both data maintenance and data distribution. From the perspective of usage, it supports both manual and automatic management. From the perspective of partition levels, it supports the combination of primary and subpartitions. Different combinations can meet different data management needs.

    This section describes the data maintenance and data distribution capabilities in turn, covering both manual and automatic management methods as well as combinations of primary partitions and subpartitions.

    Data maintenance

    In business scenarios, partitions are usually managed by time dimension for convenient data archiving and cleanup. We will describe the manual partition management capabilities based on the complete data lifecycle process of a business:

    • Business table creation: Create a table partitioned by time and pre-create partitions for the upcoming period.
    • Data import: Import data.
    • Business operation: As time passes, the pre-created partitions may be insufficient, so continue to pre-create partitions for the upcoming period.
    • Regular data cleanup: After a certain period of data accumulation, the data from earlier periods may no longer be needed. In this case, you can delete the unnecessary partitions.

    Example:

    -- 1. Create a partitioned table (partitioned by day, pre-create partitions for the next 7 days)
    CREATE TABLE business_data (
        id BIGINT NOT NULL AUTO_INCREMENT,
        event_time DATETIME NOT NULL,
        metric_value DECIMAL(10,2),
        PRIMARY KEY (id, event_time)
    ) PARTITION BY RANGE COLUMNS(event_time) (
        PARTITION p20231025 VALUES LESS THAN ('2023-10-26'),
        PARTITION p20231026 VALUES LESS THAN ('2023-10-27'),
        PARTITION p20231027 VALUES LESS THAN ('2023-10-28'),
        PARTITION p20231028 VALUES LESS THAN ('2023-10-29'),
        PARTITION p20231029 VALUES LESS THAN ('2023-10-30'),
        PARTITION p20231030 VALUES LESS THAN ('2023-10-31'),
        PARTITION p20231031 VALUES LESS THAN ('2023-11-01')  -- Pre-create partitions for the next 7 days
    );
    -- 2. Import data (omitted)
    -- 3. Pre-create partitions for the next 7 days
    ALTER TABLE business_data ADD PARTITION (
      PARTITION p20231101 VALUES LESS THAN ('2023-11-02'),
      PARTITION p20231102 VALUES LESS THAN ('2023-11-03'),
      PARTITION p20231103 VALUES LESS THAN ('2023-11-04'),
      PARTITION p20231104 VALUES LESS THAN ('2023-11-05'),
      PARTITION p20231105 VALUES LESS THAN ('2023-11-06'),
      PARTITION p20231106 VALUES LESS THAN ('2023-11-07'),
      PARTITION p20231107 VALUES LESS THAN ('2023-11-08')
    );
    
    -- 4. Regular data cleanup, for example, delete data that is no longer needed after 7 days
    ALTER TABLE business_data DROP PARTITION p20231025, p20231026, p20231027, p20231028, p20231029, p20231030, p20231031;
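Pre-creating the window is easy to script; this sketch generates the same ADD PARTITION statement as step 3 above (the helper name is hypothetical):

```python
from datetime import date, timedelta

def precreate_ddl(table: str, today: date, days: int = 7) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement that pre-creates
    one daily partition (pYYYYMMDD) for each of the next `days` days."""
    parts = []
    for i in range(1, days + 1):
        day = today + timedelta(days=i)
        upper = day + timedelta(days=1)  # exclusive upper bound
        parts.append(f"  PARTITION p{day:%Y%m%d} VALUES LESS THAN ('{upper:%Y-%m-%d}')")
    return f"ALTER TABLE {table} ADD PARTITION (\n" + ",\n".join(parts) + "\n);"

print(precreate_ddl("business_data", date(2023, 10, 31)))
```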
    

    Since data is continuously written, manually maintaining pre-created partitions and regularly cleaning up partitions can be quite cumbersome. To simplify this process, OceanBase Database provides the dynamic partition feature, which supports fixed-time partitioning, pre-creation of partitions for a specified duration, and retention of historical partitions for a specified period.

    For the above example, if we need to retain 30 days of data and pre-create partitions for the next 7 days each time, we can use the following syntax:

    -- 1. Create a partitioned table and set the dynamic partition policy
    CREATE TABLE t1 (
        id BIGINT NOT NULL AUTO_INCREMENT,
        event_time DATETIME NOT NULL,
        metric_value DECIMAL(10,2),
        PRIMARY KEY (id, event_time))
    DYNAMIC_PARTITION_POLICY
    (
      ENABLE = true,
      TIME_UNIT = 'day',
      PRECREATE_TIME = '7day',
      EXPIRE_TIME = '30day'
    )
    PARTITION BY RANGE COLUMNS (event_time)(
      PARTITION p20231025 VALUES LESS THAN ('2023-10-26'));
    

    In addition to the RANGE partitioning mode, businesses can also choose other basic partitioning methods based on their needs.

    For more information about dynamic partitioning, see Dynamic partitioning, Create a dynamic partition table in MySQL mode, and Create a dynamic partition table in Oracle mode.

    Data distribution

    Partitions can also serve as units for data distribution management. Generally, to disperse data, the HASH partitioning method is used, which has the following advantages:

    • It can effectively achieve uniform data distribution and supports precise partition pruning based on the partitioning key.
    • For multiple tables that need to be joined, if they are all HASH-partitioned using the same JOIN key and have the same number of partitions, and if you use the Table Group feature of OceanBase Database, you can bind partitions with the same hash value to the same group of nodes. This enables Partition-Wise Join during execution, avoiding data shuffling across nodes and significantly improving query performance.
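The partition-wise join idea can be sketched as follows: when both tables are hash-partitioned on the join key with the same partition count, matching keys can only meet in same-numbered partitions, so each pair of partitions joins locally (a conceptual model; the table contents are made up):

```python
from collections import defaultdict

N = 4  # identical partition count for both tables

def hash_distribute(rows, key_idx):
    """Distribute rows into hash partitions keyed by the join column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_idx] % N].append(row)
    return parts

orders = [(1, "o1"), (2, "o2"), (5, "o3")]        # (user_id, order_no)
users = [(1, "alice"), (5, "bob"), (3, "carol")]  # (user_id, name)

po, pu = hash_distribute(orders, 0), hash_distribute(users, 0)

# Each same-numbered partition pair joins locally: a given user_id hashes
# to the same partition in both tables, so no cross-node shuffle is needed.
joined = []
for p in range(N):
    for uid, order_no in po.get(p, []):
        for uid2, name in pu.get(p, []):
            if uid == uid2:
                joined.append((uid, order_no, name))
print(sorted(joined))  # prints: [(1, 'o1', 'alice'), (5, 'o3', 'bob')]
```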

    Hash partitioning also has limitations:

    1. Once the number of partitions for a HASH-partitioned table is set, changing it is a resource-intensive operation that involves rewriting all the data in the table. Therefore, the number of partitions for a HASH-partitioned table is usually set once and rarely changed, making it difficult to scale.
    2. For range queries on the partitioning key, partition pruning is not supported, and all partitions need to be accessed, which may lead to read amplification.

    To address the scalability issues of HASH partitioning and the limitations of range queries, OceanBase Database now supports automatic partition splitting for rowstore tables.

    Mixed data maintenance and data distribution management

    We can also use subpartitioning to meet both data maintenance and data distribution requirements. Typically, the primary partitioning is used for data maintenance, and the subpartitioning is used for data distribution. Each requirement can be addressed using the corresponding method.

    Typical manual partitioning management methods

    • Primary partitioning:
      • Type selection: Use Range or List partitioning to match frequent query conditions (such as time range or region)
      • Number of partitions: Set a reasonable range based on the distribution of query conditions and data maintenance needs (e.g., partition by month and retain data for 12 months, or partition by region into 4 List partitions)
    • Subpartitioning:
      • Type selection: Use Hash partitioning to ensure data is evenly distributed
      • Number of subpartitions:
        • If writes target a single primary partition at a time, that partition's subpartition count alone should be large enough to spread the write load across the cluster's resources
        • If writes span multiple primary partitions, the combined subpartition count of those partitions should be large enough to spread the write load

    Here are two scenarios: Range + Hash and List + Hash:

    Range + Hash

    Choose Range partitioning for the primary partition and specify order_date to quickly filter out partitions that do not need to be scanned and perform data maintenance operations. Choose Hash partitioning for the subpartitioning to distribute writes or reads for the current month across 8 partitions, avoiding hotspots.

    CREATE TABLE orders (
        user_id BIGINT NOT NULL COMMENT 'User ID (subpartition key)',
        order_date DATE NOT NULL COMMENT 'Order date (primary partition key)',
        amount DECIMAL(10,2) NOT NULL COMMENT 'Order amount',
        status TINYINT NOT NULL COMMENT 'Status: 0-Canceled, 1-Pending, 2-Paid, 3-Shipped, 4-Completed',
        region_code CHAR(6) NOT NULL COMMENT 'Region code (first 2 digits for province code)',
        product_id INT NOT NULL COMMENT 'Product ID',
        payment_method VARCHAR(20) COMMENT 'Payment method',
        created_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) COMMENT 'Record creation time')
    PARTITION BY RANGE COLUMNS(order_date)
    SUBPARTITION BY HASH(user_id) SUBPARTITIONS 8
    (
      PARTITION p202501 VALUES LESS THAN ('2025-02-01'),
      PARTITION p202502 VALUES LESS THAN ('2025-03-01'),
      ...
      PARTITION p202601 VALUES LESS THAN ('2026-02-01')
    );
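Routing in this two-level layout composes the two schemes: first locate the Range primary partition by order_date, then pick the Hash subpartition by user_id modulo 8 (an illustrative sketch covering only three of the monthly partitions; the subpartition naming is made up):

```python
import bisect
from datetime import date

# Exclusive upper bounds of three of the monthly primary partitions above.
BOUNDS = [date(2025, 2, 1), date(2025, 3, 1), date(2026, 2, 1)]
PRIMARIES = ["p202501", "p202502", "p202601"]
SUBPARTS = 8

def route(order_date: date, user_id: int) -> str:
    """Locate the (primary, sub) partition pair that stores a row."""
    p = bisect.bisect_right(BOUNDS, order_date)  # first bound > order_date
    if p >= len(PRIMARIES):
        raise ValueError("no primary partition covers this date")
    return f"{PRIMARIES[p]}sp{user_id % SUBPARTS}"

print(route(date(2025, 2, 14), user_id=42))  # prints: p202502sp2
```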
    

    List + Hash

    Choose List partitioning for the primary partition and specify the province to prune to the corresponding partition. You can also perform data maintenance at the provincial level. Choose Hash or Key partitioning for the subpartitioning to distribute the read and write traffic of each province across multiple partitions for load balancing.

    -- Primary partitioning: LIST partitioning by province (31 provincial administrative regions)
    CREATE TABLE social_insurance_records (
        record_id BIGINT,
        province_code INT NOT NULL,  -- Province code (e.g., 11 for Beijing, 31 for Shanghai)
        payment_date DATE NOT NULL,
        user_id VARCHAR(32) NOT NULL,
        amount DECIMAL(10,2)
    ) PARTITION BY LIST (province_code)  -- Primary LIST partitioning
    SUBPARTITION BY KEY(user_id) SUBPARTITIONS 16  -- Subpartitioning using KEY partitioning
    (
      PARTITION p_beijing VALUES IN (11),
      PARTITION p_shanghai VALUES IN (31),
      PARTITION p_tianjin VALUES IN (12),
      ...
      PARTITION p_xizang VALUES IN (54)
    );
    

    Typical automatic partitioning management methods

    1. Primary partitioning: Choose dynamic partitioning and configure parameters such as fixed time partitioning, how long to pre-create partitions, and how long to retain historical partitions.
    2. Subpartitioning: Choose automatic Range partition splitting to automatically split partitions without configuring the number of partitions or partition rules.

    References

    • MySQL mode:

      • Overview of dynamic partitions

      • Create a dynamic partition table

      • View a dynamic partition table

      • Modify a dynamic partition table

    • Oracle mode:

      • Overview of dynamic partitions

      • Create a dynamic partition table

      • View a dynamic partition table

      • Modify a dynamic partition table
