The partition exchange feature of OceanBase Database can be combined with direct load technology to provide an efficient solution for large-scale incremental data loading. You can use full direct load to quickly load data to a newly created non-partitioned table and then use partition exchange to atomically migrate the data to the target partitioned table, thereby improving the performance of incremental data loading for existing partitioned tables.
Why combine partition exchange with full bypass load
In big data scenarios, enterprises often face the need to quickly import large volumes of incremental data. Traditional data import methods have performance bottlenecks, while combining partition exchange with bypass load addresses both import efficiency and query performance.
Comparison of full bypass load and incremental bypass load
Incremental bypass load
- Data storage format: Incremental bypass load generates dump files, which are stored in rowstore format by default.
- Performance impact: In columnstore scenarios, rowstore dump files significantly degrade query performance.
- Applicable scenarios: Suitable for small batches of frequent data updates.
Full bypass load
- Data storage format: Full bypass load rewrites all data, generating columnstore files (if the table is configured as columnstore).
- Performance advantage: All data is baseline data, ensuring optimal query performance.
- Applicable scenarios: Suitable for one-time import of large volumes of data.
Benefits of combining both approaches
By combining full bypass load with partition exchange, the performance bottleneck of storing incremental data in dump format can be effectively addressed. The core advantages of this solution are:
Data format optimization: Converts incremental data into baseline data format, ensuring all data is stored in columnstore format and eliminating the negative impact of rowstore dump files on query performance.
Performance improvement:
- Query performance: All data is unified to baseline format, achieving optimal query performance.
- Import performance: Utilizes the efficient mechanism of bypass load to significantly enhance data loading speed.
Architecture consistency: Through partition exchange, incremental data and existing data are unified in format, avoiding performance losses from mixed storage formats.
This technical solution maintains the flexibility of incremental import while leveraging the performance advantages of full import, providing optimal performance for large-scale data processing scenarios.
Scenario analysis
Empty table initialization
Scenario: The target table is empty and requires initial data import.
- Recommended solution: Direct full bypass load
- Advantages: All data is columnstore baseline, ensuring optimal query performance
Incremental import to an existing table
Scenario: The target table already contains historical data and requires new incremental data.
Traditional incremental bypass load
- Advantages: Fast import speed with no impact on existing data.
- Disadvantages: New data is stored as rowstore dump files, resulting in poor query performance.
Traditional full bypass load
- Advantages: All data is rewritten to columnstore, ensuring optimal query performance.
- Disadvantages: Requires rewriting all historical data, resulting in long import times and high resource consumption.
Full bypass load + Partition exchange (Recommended solution)
Advantages:
- New data is generated as columnstore baseline through full bypass load, ensuring optimal query performance
- Only incremental data needs to be imported, without affecting historical data
- Partition exchange operations are atomic, ensuring zero business interruption
- Both import and query performance are optimized
Simple performance comparison example
Assume a customer has a 1TB orders table with 100GB of new data added daily:
Traditional incremental bypass load
- Historical data: 1TB (columnstore, fast query)
- New data: 100GB (rowstore dump, slow query)
- Query performance: Mixed storage format results in overall performance degradation
Traditional full bypass load
- Rewritten data: 1.1TB (all columnstore, fast query)
- Import time: Needs to rewrite 1TB of historical data + 100GB of new data
- Resource consumption: High
Full bypass load + Partition exchange
- Historical data: 1TB (columnstore, unchanged)
- New data: 100GB (columnstore baseline, fast query)
- Import time: Only needs to import 100GB of new data
- Query performance: All columnstore, optimal performance
Basic syntax of partition exchange
Basic syntax of partition exchange
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
WITHOUT VALIDATION;
Syntax of partition exchange with indexes
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
INCLUDING INDEXES
WITHOUT VALIDATION;
Limitations
- The
target_partition_table_namemust be a subpartitioned table, and the partition to be exchanged must be a partition of this table. - The
origin_partition_table_namemust be a partitioned table, and the partition type of its partitions must be consistent with the partition type of the subpartitions under the partition to be exchanged. - The partition type of the partitions in the target table must be
RANGE (COLUMNS)orLIST (COLUMNS). - After the primary and subpartitions are exchanged, the statistics of the related tables become invalid and must be collected again.
Example of using direct load and partition exchange (based on a partitioned table)
- Fast data loading: Direct load bypasses SQL and Memtable, enabling massive data loading in seconds.
- Zero business interruption: Partition exchange operations are completed atomically in the background, with no lock table impact on business queries.
- Partition-level data management: Partition exchange allows precise control over data migration scope, supporting flexible strategies based on time or range.
The specific steps are as follows:
- Create a new empty partition P in the partitioned table A.
- Create a non-partitioned table B with the same structure as table A, and use full direct load to import data into table B.
- Use the partition exchange feature to swap the new partition P in table A with table B.
Step 1: Create the target partitioned table (orders table)
Assume the orders table is partitioned by the order_date field and has a reserved empty partition p_new for data exchange:
-- Create partitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT
)
PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a non-partitioned table
Create or download the incremental data file.
Assume the incremental data file is
orders_increment.csvand contains the following fields:order_id,order_date,customer_id,amount,product_id 1001,"2023-05-01",101,199.99,1001 1002,"2023-06-15",102,299.99,1002 1003,"2023-05-10",103,350.00,1003 1004,"2023-05-20",104,49.99,1004 1005,"2023-06-01",105,450.50,1005 1006,"2023-05-25",106,200.00,1006 1007,"2023-06-10",107,150.75,1007 1008,"2023-05-15",108,300.00,1008 1009,"2023-06-20",109,499.99,1009 1010,"2023-05-30",101,99.99,1010Create a non-partitioned table
orders_bypass.Create a non-partitioned table
orders_bypasswith the same structure as the target partitioned table (orders table):CREATE TABLE orders_bypass ( order_id BIGINT, order_date TIMESTAMP NOT NULL, customer_id INT, amount DECIMAL(10,2), product_id INT ); -- Create a temporary table CREATE TABLE temp_staging_table LIKE orders_bypass;
Step 3: Direct load data into the non-partitioned table orders_bypass
Use full direct load to quickly import data into the non-partitioned table orders_bypass:
-- Example only. In practice, use tools like OceanBase's ObLoader.
-- Use LOAD DATA or INSERT /*+ APPEND */ to import data
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Or use INSERT for batch import (with APPEND hint)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table; -- The temporary table must have the same structure as orders_bypass
Step 4: Perform partition exchange (Exchange Partition)
Exchange the data in the non-partitioned table orders_bypass with the empty partition p_new in the partitioned table orders:
-- Perform partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION; -- Optional: If table B contains indexes, synchronize the exchange
Step 5: Verify the data exchange results
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify if table orders_bypass is empty (data will be cleared after exchange)
SELECT COUNT(*) FROM orders_bypass;
Example return results:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set (0.04 sec)
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.01 sec)
Step 6: Clean up data and perform subsequent operations
-- Optional: Drop the empty table orders_bypass
DROP TABLE orders_bypass;
-- Optional: Rename partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
Example of bypass load and partition swap (with subpartitions)
- Fast data loading: Bypass load technology directly bypasses SQL and Memtable, enabling the loading of massive amounts of data within seconds.
- Zero business interruption: Partition swap operations are completed in the background in an atomic manner, without locking tables and thus without affecting business queries.
- Partition-level data management: Partition swap allows precise control over the scope of data migration, supporting flexible strategies based on time or range partitioning.
- Incremental data import for subpartitioned tables: This is suitable for composite partitioned tables where the primary partition is range/list and the subpartition is hash.
The specific steps are as follows:
- Create a new empty primary partition P in the subpartitioned table A (primary range/list + subpartition hash).
- Create a new primary hash-partitioned table B, whose subpartition types must be completely consistent with those of table A. Then, perform a full bypass load to import the data into table B.
- Use the partition swap feature to swap the new primary partition P in table A with table B.
Step 1: Create the target subpartitioned table (orders)
Assume that the orders table orders is partitioned by the order_date field at the primary partition level and subpartitioned by the order_id field using hash partitioning, with an empty partition p_new reserved for data exchange:
-- Create subpartitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT,
PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(order_id) SUBPARTITIONS 5
(
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a primary hash partitioned table
Create or download the incremental data file.
Assume that the incremental data file is named
orders_increment.csvand contains the following fields:order_id,order_date,customer_id,amount,product_id 1001,"2023-05-01",101,199.99,1001 1002,"2023-06-15",102,299.99,1002 1003,"2023-05-10",103,350.00,1003 1004,"2023-05-20",104,49.99,1004 1005,"2023-06-01",105,450.50,1005 1006,"2023-05-25",106,200.00,1006 1007,"2023-06-10",107,150.75,1007 1008,"2023-05-15",108,300.00,1008 1009,"2023-06-20",109,499.99,1009 1010,"2023-05-30",101,99.99,1010Create a primary hash partitioned table
orders_bypass.Create a primary hash partitioned table
orders_bypasswith the same structure as the target subpartitioned table (orders tableorders). The partition type must be hash partitioning, which is the same as the subpartition type of the target table.-- Create primary hash partitioned table B, with the same number of partitions as the subpartitions of the target table (5) CREATE TABLE orders_bypass ( order_id BIGINT, order_date TIMESTAMP NOT NULL, customer_id INT, amount DECIMAL(10,2), product_id INT, PRIMARY KEY (order_id, order_date) ) PARTITION BY HASH(order_id) PARTITIONS 5;
Note
The partition type of the primary hash partitioned table orders_bypass must be the same as the subpartition type of the target subpartitioned table orders. In other words, if the subpartition type of the target table is SUBPARTITION BY HASH(order_id), the partition type of the source table must be PARTITION BY HASH(order_id).
Step 3: Import data to the primary hash partitioned table orders_bypass
Use full bypass import to quickly import data to the primary hash partitioned table orders_bypass:
-- Create a temporary table for data loading
CREATE TABLE temp_staging_table LIKE orders_bypass;
-- Use LOAD DATA for bypass import
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Use INSERT for batch import (with APPEND hint for bypass import)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table;
-- Clean up the temporary table
DROP TABLE temp_staging_table;
Note
In practice, you can also use tools such as OceanBase's ObLoader for bypass import to achieve more efficient data loading.
Step 4: Exchange partitions
Exchange the data in the primary hash partitioned table orders_bypass with the empty primary partition p_new of the target subpartitioned table (orders table orders):
-- Execute the partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;
Note
After the partition exchange is successful, the data in the orders_bypass table will be cleared, and the original data will be exchanged to the target partition p_new. If the source table contains indexes, you can use the INCLUDING INDEXES option to synchronize the exchange of indexes.
Step 5: Verify the data exchange result
Verify whether the partition exchange is successful:
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify whether the orders_bypass table is empty (the data in the orders_bypass table will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;
Example of the verification result:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set
Step 6: Clean up data and perform subsequent operations
After the data exchange is completed, you can perform the following cleanup and subsequent operations:
-- Optional: Drop the empty table orders_bypass (if no longer needed)
DROP TABLE orders_bypass;
-- Optional: Rename the partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
-- Recollect statistics (statistics will become invalid after the exchange)
ANALYZE TABLE orders;
Unsupported scenarios
The following scenarios are not supported for partition exchange:
- Exchange between primary partitioned tables: Not supported.
- Exchange between subpartitioned tables: Not supported.
- Exchange where the primary partition of the subpartitioned table is a hash partition: Not supported. The primary partition of the target table must be of the range or list type.
