The partition exchange feature in OceanBase Database, when combined with direct load technology, provides an efficient solution for large-scale incremental data loading. By using full direct load, data can be quickly loaded into a newly created non-partitioned temporary table. Then, through partition exchange, the data is atomically migrated to the target partitioned table, significantly improving the performance of incremental data import for existing partitioned tables.
Why combine partition exchange with full direct load
In big data scenarios, enterprises often face the need to quickly import large volumes of incremental data. Traditional data import methods have performance bottlenecks, while combining partition exchange with direct load offers a solution that balances import efficiency and query performance.
Comparison of full direct load and incremental direct load
Characteristics and limitations of incremental direct load
- Data storage format: Incremental direct load generates dump files, which are stored in rowstore format by default.
- Performance impact: In columnar storage scenarios, rowstore dump files significantly degrade query performance.
- Applicable scenarios: Suitable for small-batch, frequent data updates.
Characteristics and advantages of full direct load
- Data storage format: Full direct load rewrites all data, generating columnar storage files (if the table is configured for columnar storage).
- Performance advantage: All data is baseline data, achieving optimal query performance.
- Applicable scenarios: Suitable for one-time bulk data import.
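Columnar storage is opt-in at table creation time. As a minimal sketch, assuming the column group syntax introduced in OceanBase Database V4.3 (verify against the documentation for your version), a table can be configured for columnar storage as follows:

```sql
-- Illustrative only: create a columnstore table so that baseline data
-- produced by full direct load is stored in columnar format.
-- The WITH COLUMN GROUP clause is assumed from OceanBase V4.3+ syntax.
CREATE TABLE orders_columnar (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2)
) WITH COLUMN GROUP (each column);
```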
Advantages of combining full direct load with partition exchange
By combining full direct load with partition exchange, we can effectively address the performance bottleneck caused by storing incremental data in dump format. The core advantages of this approach are as follows:
Data format optimization: Convert incremental data to baseline data format, ensuring all data is stored in columnar format and eliminating the negative impact of rowstore dump files on query performance.
Performance improvement:
- Query performance: All data is unified to baseline format, achieving optimal query performance.
- Import performance: Leverage the efficient mechanism of direct load to significantly enhance data loading speed.
Architecture uniformity: Through partition exchange, unify the format of incremental and existing data, avoiding performance degradation from mixed storage formats.
This technical approach maintains the flexibility of incremental import while gaining the performance advantages of full load import, delivering optimal performance for large-scale data processing scenarios.
Scenario analysis
Empty table initialization
Scenario: The target table is empty, and initial data needs to be imported.
- Recommended solution: Use full direct load directly
- Advantage: All data is columnar baseline data, achieving optimal query performance
Incremental import of existing data
Scenario: The target table already contains historical data, and new incremental data needs to be imported.
Traditional incremental direct load
- Advantage: Fast import speed with no impact on existing data.
- Disadvantage: Newly imported data is stored in rowstore dump format, resulting in poor query performance.
Traditional full direct load
- Advantage: All data is rewritten to columnar format, achieving optimal query performance.
- Disadvantage: Requires rewriting all historical data, resulting in long import times and high resource consumption.
Full direct load + partition exchange (recommended solution)
Advantages:
- New data is imported as columnar baseline data through full load direct load, achieving optimal query performance
- Only incremental data needs to be imported, without affecting historical data
- Partition exchange operations are atomic, ensuring zero business interruption
- Both import and query performance are optimized
Simple performance comparison example
Assume a customer has a 1TB orders table with 100GB of new data added daily:
Traditional incremental direct load
- Historical data: 1TB (columnar, fast queries)
- New data: 100GB (rowstore dump, slow queries)
- Query performance: Mixed storage format results in overall performance degradation
Traditional full direct load
- Rewritten data: 1.1TB (all columnar, fast queries)
- Import time: Needs to rewrite 1TB of historical data + 100GB of new data
- Resource consumption: High
Full direct load + partition exchange
- Historical data: 1TB (columnar, unchanged)
- New data: 100GB (columnar baseline, fast queries)
- Import time: Only needs to import 100GB of new data
- Query performance: All data is columnar, achieving optimal performance
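The arithmetic behind this comparison can be made explicit. An illustrative calculation (taking 1 TB as 1024 GB):

```sql
-- Data written per day by each approach, given 1 TB of history and 100 GB/day of new data
SELECT 1024 + 100         AS full_reload_gb,  -- full reload rewrites history plus new data
       100                AS exchange_gb,     -- exchange writes only the new partition
       (1024 + 100) / 100 AS write_ratio;     -- roughly 11x less data written per day
```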
Partition exchange syntax
Basic syntax of partition exchange
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
WITHOUT VALIDATION;
Syntax of partition exchange with indexes
ALTER TABLE target_partition_table_name
EXCHANGE PARTITION first_level_partition_name
WITH TABLE origin_partition_table_name
INCLUDING INDEXES
WITHOUT VALIDATION;
Limitations
- The target_partition_table_name must be a subpartitioned table, and the partition to be exchanged must be the name of a partition of this table.
- The origin_partition_table_name must be a partitioned table, and its partition type must be consistent with the type of the subpartitions under the primary partition to be exchanged.
- The primary partition type of the target table must be RANGE (COLUMNS) or LIST (COLUMNS).
- After a primary partition and a subpartition are exchanged, the statistics of the related tables become invalid and need to be collected again.
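To make these constraints concrete, here is a minimal sketch of an eligible pair of tables (table and column names are illustrative; the shape mirrors the subpartition example later in this topic):

```sql
-- Target: primary partitions are RANGE COLUMNS, subpartitions are HASH
CREATE TABLE t_target (
    c1 BIGINT,
    dt DATE NOT NULL,
    PRIMARY KEY (c1, dt)
)
PARTITION BY RANGE COLUMNS(dt)
SUBPARTITION BY HASH(c1) SUBPARTITIONS 4
(
    PARTITION p0 VALUES LESS THAN ('2024-01-01'),
    PARTITION p1 VALUES LESS THAN ('2024-07-01')
);

-- Source: HASH-partitioned with the same partition key and count
-- as the target's subpartitions
CREATE TABLE t_source (
    c1 BIGINT,
    dt DATE NOT NULL,
    PRIMARY KEY (c1, dt)
)
PARTITION BY HASH(c1) PARTITIONS 4;

-- The primary partition p1 can now be exchanged with t_source
ALTER TABLE t_target EXCHANGE PARTITION p1 WITH TABLE t_source WITHOUT VALIDATION;
```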
Example of using direct load and partition exchange (based on a partitioned table)
- Fast data loading: Direct load bypasses SQL and Memtable, enabling rapid loading of large volumes of data.
- Zero business interruption: Partition exchange is performed atomically in the background, with no impact on business queries.
- Partition-level data management: Partition exchange allows precise control over the scope of data migration, supporting flexible strategies based on time or range.
The specific steps are as follows:
- Create an empty new partition P in partitioned table A.
- Create a non-partitioned table B with the same structure as table A, and import data into table B using full direct load.
- Use partition exchange to swap the new partition P in table A with table B.
Step 1: Create the target partitioned table (orders table)
Assume that the orders table is partitioned by the order_date field and has an empty partition p_new reserved for data exchange:
-- Create partitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT
)
PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a non-partitioned table
Create or download the incremental data file.
Assume the incremental data file is orders_increment.csv, containing the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a non-partitioned table orders_bypass with the same structure as the target partitioned table orders:

CREATE TABLE orders_bypass (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    product_id INT
);

-- Create a temporary staging table with the same structure
CREATE TABLE temp_staging_table LIKE orders_bypass;
Step 3: Import data to the non-partitioned table orders_bypass
Import data to the non-partitioned table orders_bypass using full direct load:
-- Example only. In practice, use tools like OceanBase's ObLoader.
-- Use LOAD DATA or INSERT /*+ APPEND */ to import data
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Then batch-load the staged data into orders_bypass (the APPEND hint triggers direct load)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table; -- The temporary table must have the same structure as orders_bypass
Step 4: Execute partition exchange (Exchange Partition)
Exchange the data in the non-partitioned table orders_bypass with the empty partition p_new in the target partitioned table orders:
-- Execute partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;
-- If orders_bypass contains indexes, add INCLUDING INDEXES before WITHOUT VALIDATION to exchange them as well
Step 5: Verify the data exchange results
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify if table orders_bypass is empty (the data in table orders_bypass will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;
Example return results:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set (0.04 sec)
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.01 sec)
Step 6: Clean up data and perform subsequent operations
-- Optional: drop the empty table orders_bypass
DROP TABLE orders_bypass;
-- Optional: rename partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
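As noted in the limitations above, statistics on the tables involved become invalid after a partition exchange, so it is advisable to regather them here as well (using the MySQL-compatible statement):

```sql
-- Regather statistics on orders after the partition exchange
ANALYZE TABLE orders;
```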
Example of using direct load and partition exchange (based on subpartitions)
- Fast data loading: Direct load bypasses SQL and Memtable, enabling rapid loading of large volumes of data.
- Zero business interruption: Partition exchange is completed atomically in the background, without locking tables or impacting business queries.
- Partition-level data management: Partition exchange allows precise control over the scope of data migration, supporting flexible strategies based on time or range.
- Incremental data import for subpartitioned tables: Suitable for composite partitioned tables whose primary partitions are range or list and whose subpartitions are hash.
The specific steps are as follows:
- Create a new empty primary partition P in the subpartitioned table A (primary partitions are range or list, subpartitions are hash).
- Create a hash-partitioned table B whose partitions exactly match the subpartitions of table A, and import data into table B using full direct load.
- Use the partition exchange feature to swap the new primary partition P in table A with table B.
Step 1: Create the target subpartitioned table (orders)
Assume that the orders table is partitioned by the order_date field at the primary partition level and subpartitioned by the order_id field. An empty partition p_new is reserved for data exchange.
-- Create subpartitioned table A (example: orders table)
CREATE TABLE orders (
order_id BIGINT,
order_date TIMESTAMP NOT NULL,
customer_id INT,
amount DECIMAL(10,2),
product_id INT,
PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(order_id) SUBPARTITIONS 5
(
PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
-- Add an empty partition p_new for subsequent data exchange
PARTITION p_new VALUES LESS THAN ('2023-07-01')
);
Step 2: Prepare incremental data and create a primary hash-partitioned table
Create or download the incremental data file.
Assume that the incremental data file is named orders_increment.csv and contains the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a primary hash-partitioned table orders_bypass with the same structure as the target subpartitioned table orders. Its partition type must be consistent with the subpartition type of the target table (that is, hash partitioning):

-- Create primary hash-partitioned table B with the same number of partitions as the subpartitions of the target table (5 partitions)
CREATE TABLE orders_bypass (
    order_id BIGINT,
    order_date TIMESTAMP NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    product_id INT,
    PRIMARY KEY (order_id, order_date)
)
PARTITION BY HASH(order_id) PARTITIONS 5;
Note
The partition type of the primary hash-partitioned table orders_bypass must be consistent with the subpartition type of the target subpartitioned table orders. In other words, if the subpartitions of the target table are SUBPARTITION BY HASH(order_id), the partitions of the source table must be PARTITION BY HASH(order_id).
Step 3: Import data to the primary hash-partitioned table orders_bypass
Use full direct load to quickly import data to the primary hash-partitioned table orders_bypass:
-- Create a temporary table for data loading
CREATE TABLE temp_staging_table LIKE orders_bypass;
-- Use LOAD DATA to load the file into the staging table
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Use INSERT for batch import (the APPEND hint triggers direct load)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table;
-- Clean up the temporary table
DROP TABLE temp_staging_table;
Note
In practice, you can also use tools such as ObLoader of OceanBase Database to perform direct load for more efficient data loading.
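As a hedged sketch, an ObLoader invocation for this scenario might look like the following. All flag names, the port, and the connection details here are assumptions; they vary by ObLoader version, so check obloader --help for your release before use:

```
# Hypothetical ObLoader direct-load invocation (verify flags for your version)
obloader -h 127.0.0.1 -P 2883 \
  -u loader_user@test_tenant -p '******' \
  -D test_db --table orders_bypass \
  --csv -f /tmp/orders_increment.csv \
  --direct
```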
Step 4: Exchange partitions
Exchange the data of the primary hash-partitioned table orders_bypass with the empty primary partition p_new of the target subpartitioned table orders:
-- Exchange partitions
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;
Note
After the partition exchange is successful, the data in the orders_bypass table will be cleared, and the original data will be exchanged to the target partition p_new. If the source table contains indexes, you can use the INCLUDING INDEXES option to synchronize the exchange of indexes.
Step 5: Verify the data exchange result
Verify whether the partition exchange is successful:
-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);
-- Verify whether the table orders_bypass is empty (the data in the table orders_bypass will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;
Example verification result:
MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set
MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set
Step 6: Clean up data and perform subsequent operations
After the data exchange is completed, you can perform the following cleanup and subsequent operations:
-- Optional: Drop the empty table orders_bypass if it is no longer needed
DROP TABLE orders_bypass;
-- Optional: Rename the partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;
-- Recollect statistics (statistics will be invalidated after the exchange)
ANALYZE TABLE orders;
Unsupported scenarios
The following scenarios are not supported for partition exchange:
- Exchange between primary partitioned tables: Not supported.
- Exchange between subpartitioned tables: Not supported.
- Exchange where the primary partition of the subpartitioned table is a hash partition: Not supported. The primary partition of the target table must be of the range or list type.