The partition exchange feature in OceanBase Database, when combined with direct load technology, provides an efficient solution for large-scale incremental data loading. By using full direct load, data can be quickly loaded into a newly created non-partitioned temporary table. Then, through partition exchange, the data is atomically migrated to the target partitioned table, significantly improving the performance of incremental data import for existing partitioned tables.
Why combine partition exchange with full direct load
In big data scenarios, enterprises often face the need to quickly import large volumes of incremental data. Traditional data import methods have performance bottlenecks, while combining partition exchange with direct load offers a solution that balances import efficiency and query performance.
Comparison of full direct load and incremental direct load
Characteristics and limitations of incremental direct load
- Data storage format: Incremental direct load generates dump files, which are stored in rowstore format by default.
- Performance impact: In columnar storage scenarios, rowstore dump files significantly degrade query performance.
- Applicable scenarios: Suitable for small-batch, frequent data updates.
Characteristics and advantages of full direct load
- Data storage format: Full direct load rewrites all data, generating columnar storage files (if the table is configured for columnar storage).
- Performance advantage: All data is baseline data, achieving optimal query performance.
- Applicable scenarios: Suitable for one-time bulk data import.
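Columnar storage is opt-in at table creation time. As a minimal sketch, assuming the column group syntax introduced in OceanBase Database V4.3 (verify against the documentation for your version), a table can be configured for columnar storage as follows:

```sql
-- Illustrative only: create a columnstore table so that baseline data
-- produced by full direct load is stored in columnar format.
-- The WITH COLUMN GROUP clause is assumed from OceanBase V4.3+ syntax.
CREATE TABLE orders_columnar (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2)
) WITH COLUMN GROUP (each column);
```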
Advantages of combining full direct load with partition exchange
By combining full direct load with partition exchange, we can effectively address the performance bottleneck caused by storing incremental data in dump format. The core advantages of this approach are as follows:
Data format optimization: Convert incremental data to baseline data format, ensuring all data is stored in columnar format and eliminating the negative impact of rowstore dump files on query performance.
Performance improvement:
- Query performance: All data is unified to baseline format, achieving optimal query performance.
- Import performance: Leverage the efficient mechanism of direct load to significantly enhance data loading speed.
Architecture uniformity: Through partition exchange, unify the format of incremental and existing data, avoiding performance degradation from mixed storage formats.
This technical approach maintains the flexibility of incremental import while gaining the performance advantages of full load import, delivering optimal performance for large-scale data processing scenarios.
Scenario analysis
Empty table initialization
Scenario: The target table is empty, and initial data needs to be imported.
- Recommended solution: Use full direct load directly
- Advantage: All data is columnar baseline data, achieving optimal query performance
Incremental import of existing data
Scenario: The target table already contains historical data, and new incremental data needs to be imported.
Traditional incremental direct load
- Advantage: Fast import speed with no impact on existing data.
- Disadvantage: Newly imported data is stored in rowstore dump format, resulting in poor query performance.
Traditional full direct load
- Advantage: All data is rewritten to columnar format, achieving optimal query performance.
- Disadvantage: Requires rewriting all historical data, resulting in long import times and high resource consumption.
Full direct load + partition exchange (recommended solution)
Advantages:
- New data is imported as columnar baseline data through full load direct load, achieving optimal query performance
- Only incremental data needs to be imported, without affecting historical data
- Partition exchange operations are atomic, ensuring zero business interruption
- Both import and query performance are optimized
Simple performance comparison example
Assume a customer has a 1TB orders table with 100GB of new data added daily:
Traditional incremental direct load
- Historical data: 1TB (columnar, fast queries)
- New data: 100GB (rowstore dump, slow queries)
- Query performance: Mixed storage format results in overall performance degradation
Traditional full direct load
- Rewritten data: 1.1TB (all columnar, fast queries)
- Import time: Needs to rewrite 1TB of historical data + 100GB of new data
- Resource consumption: High
Full direct load + partition exchange
- Historical data: 1TB (columnar, unchanged)
- New data: 100GB (columnar baseline, fast queries)
- Import time: Only needs to import 100GB of new data
- Query performance: All data is columnar, achieving optimal performance
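The arithmetic behind this comparison can be made explicit. An illustrative calculation (taking 1 TB as 1024 GB):

```sql
-- Data written per day by each approach, given 1 TB of history and 100 GB/day of new data
SELECT 1024 + 100         AS full_reload_gb,  -- full reload rewrites history plus new data
       100                AS exchange_gb,     -- exchange writes only the new partition
       (1024 + 100) / 100 AS write_ratio;     -- roughly 11x less data written per day
```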
Partition exchange syntax
Basic syntax of partition exchange
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
WITHOUT VALIDATION;
Syntax of partition exchange with indexes
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
INCLUDING INDEXES
WITHOUT VALIDATION;
Limitations
- The target_partition_table_name must be a subpartitioned table, and the partition to be exchanged must be the name of a partition of this table.
- The origin_partition_table_name must be a partitioned table, and its partition type must be consistent with the type of the subpartitions under the primary partition to be exchanged.
- The primary partition type of the target table must be RANGE (COLUMNS) or LIST (COLUMNS).
- After a primary partition and a subpartition are exchanged, the statistics of the related tables become invalid and need to be collected again.
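To make these constraints concrete, here is a minimal sketch of an eligible pair of tables (table and column names are illustrative; the shape mirrors the subpartition example later in this topic):

```sql
-- Target: primary partitions are RANGE COLUMNS, subpartitions are HASH
CREATE TABLE t_target (
    c1 BIGINT,
    dt DATE NOT NULL,
    PRIMARY KEY (c1, dt)
)
PARTITION BY RANGE COLUMNS(dt)
SUBPARTITION BY HASH(c1) SUBPARTITIONS 4
(
    PARTITION p0 VALUES LESS THAN ('2024-01-01'),
    PARTITION p1 VALUES LESS THAN ('2024-07-01')
);

-- Source: HASH-partitioned with the same partition key and count
-- as the target's subpartitions
CREATE TABLE t_source (
    c1 BIGINT,
    dt DATE NOT NULL,
    PRIMARY KEY (c1, dt)
)
PARTITION BY HASH(c1) PARTITIONS 4;

-- The primary partition p1 can now be exchanged with t_source
ALTER TABLE t_target EXCHANGE PARTITION p1 WITH TABLE t_source WITHOUT VALIDATION;
```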
Example of using direct load and partition exchange (based on a partitioned table)
- Fast data loading: Direct load bypasses SQL and Memtable, enabling rapid loading of large volumes of data.
- Zero business interruption: Partition exchange is performed atomically in the background, with no impact on business queries.
- Partition-level data management: Partition exchange allows precise control over the scope of data migration, supporting flexible strategies based on time or range.
The specific steps are as follows:
- Create an empty new partition P in partitioned table A.
- Create a non-partitioned table B with the same structure as table A, and import data into table B using full direct load.
- Use partition exchange to swap the new partition P in table A with table B.
Step 1: Create the target partitioned table (orders table)
Assume that the orders table is partitioned by the order_date field and has an empty partition p_new reserved for data exchange:
-- Create partitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT
)
PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a non-partitioned table
Create or download the incremental data file.
Assume the incremental data file is orders_increment.csv, containing the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a non-partitioned table orders_bypass with the same structure as the target partitioned table orders:

CREATE TABLE orders_bypass (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    product_id INT
);

-- Create a temporary staging table with the same structure
CREATE TABLE temp_staging_table LIKE orders_bypass;
Step 3: Import data to the non-partitioned table orders_bypass
Import data to the non-partitioned table orders_bypass using full direct load:
-- Example only. In practice, use tools like OceanBase's ObLoader.
-- Use LOAD DATA or INSERT /*+ APPEND */ to import data
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Then batch-load the staged data into orders_bypass (the APPEND hint triggers direct load)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table; -- The temporary table must have the same structure as orders_bypass
Step 4: Execute partition exchange (Exchange Partition)
Exchange the data in the non-partitioned table orders_bypass with the empty partition p_new in the target partitioned table orders:
-- Execute partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;
-- If orders_bypass contains indexes, add INCLUDING INDEXES before WITHOUT VALIDATION to exchange them as well
Step 5: Verify the data exchange results
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify if table orders_bypass is empty (the data in table orders_bypass will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;
Example return results:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set (0.04 sec)
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.01 sec)
Step 6: Clean up data and perform subsequent operations
-- Optional: drop the empty table orders_bypass
DROP TABLE orders_bypass;
-- Optional: rename partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
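As noted in the limitations above, statistics on the tables involved become invalid after a partition exchange, so it is advisable to regather them here as well (using the MySQL-compatible statement):

```sql
-- Regather statistics on orders after the partition exchange
ANALYZE TABLE orders;
```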
Example of using direct load and partition exchange (based on subpartitions)
- Fast data loading: Direct load bypasses SQL and Memtable, enabling rapid loading of large volumes of data.
- Zero business interruption: Partition exchange is completed atomically in the background, without locking tables or impacting business queries.
- Partition-level data management: Partition exchange allows precise control over the scope of data migration, supporting flexible strategies based on time or range.
- Incremental data import for subpartitioned tables: Suitable for composite partitioned tables whose primary partitions are range or list and whose subpartitions are hash.
The specific steps are as follows:
- Create a new empty primary partition P in the subpartitioned table A (primary partitions are range or list, subpartitions are hash).
- Create a hash-partitioned table B whose partitions exactly match the subpartitions of table A, and import data into table B using full direct load.
- Use the partition exchange feature to swap the new primary partition P in table A with table B.
Step 1: Create the target subpartitioned table (orders)
Assume that the orders table is partitioned by the order_date field at the primary partition level and subpartitioned by the order_id field. An empty partition p_new is reserved for data exchange.
-- Create subpartitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT,
PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(order_id) SUBPARTITIONS 5
(
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a primary hash-partitioned table
Create or download the incremental data file.
Assume that the incremental data file is named orders_increment.csv and contains the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a primary hash-partitioned table orders_bypass with the same structure as the target subpartitioned table orders. Its partition type must be consistent with the subpartition type of the target table (that is, hash partitioning):

-- Create primary hash-partitioned table B with the same number of partitions as the subpartitions of the target table (5 partitions)
CREATE TABLE orders_bypass (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    product_id INT,
    PRIMARY KEY (order_id, order_date)
)
PARTITION BY HASH(order_id) PARTITIONS 5;
Note
The partition type of the primary hash-partitioned table orders_bypass must be consistent with the subpartition type of the target subpartitioned table orders. In other words, if the subpartitions of the target table are SUBPARTITION BY HASH(order_id), the partitions of the source table must be PARTITION BY HASH(order_id).
Step 3: Import data to the primary hash-partitioned table orders_bypass
Use full direct load to quickly import data to the primary hash-partitioned table orders_bypass:
-- Create a temporary table for data loading
CREATE TABLE temp_staging_table LIKE orders_bypass;
-- Use LOAD DATA to load the file into the staging table
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Use INSERT for batch import (the APPEND hint triggers direct load)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table;
-- Clean up the temporary table
DROP TABLE temp_staging_table;
Note
In practice, you can also use tools such as ObLoader of OceanBase Database to perform direct load for more efficient data loading.
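As a hedged sketch, an ObLoader invocation for this scenario might look like the following. All flag names, the port, and the connection details here are assumptions; they vary by ObLoader version, so check obloader --help for your release before use:

```
# Hypothetical ObLoader direct-load invocation (verify flags for your version)
obloader -h 127.0.0.1 -P 2883 \
  -u loader_user@test_tenant -p '******' \
  -D test_db --table orders_bypass \
  --csv -f /tmp/orders_increment.csv \
  --direct
```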
Step 4: Exchange partitions
Exchange the data of the primary hash-partitioned table orders_bypass with the empty primary partition p_new of the target subpartitioned table orders:
-- Exchange partitions
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;
Note
After the partition exchange is successful, the data in the orders_bypass table will be cleared, and the original data will be exchanged to the target partition p_new. If the source table contains indexes, you can use the INCLUDING INDEXES option to synchronize the exchange of indexes.
Step 5: Verify the data exchange result
Verify whether the partition exchange is successful:
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify whether the table orders_bypass is empty (the data in the table orders_bypass will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;
Example verification result:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set
Step 6: Clean up data and perform subsequent operations
After the data exchange is completed, you can perform the following cleanup and subsequent operations:
-- Optional: Drop the empty table orders_bypass if it is no longer needed
DROP TABLE orders_bypass;
-- Optional: Rename the partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
-- Recollect statistics (statistics will be invalidated after the exchange)
ANALYZE TABLE orders;
Unsupported scenarios
The following scenarios are not supported for partition exchange:
- Exchange between primary partitioned tables: Not supported.
- Exchange between subpartitioned tables: Not supported.
- Exchange where the primary partition of the subpartitioned table is a hash partition: Not supported. The primary partition of the target table must be of the range or list type.