Efficiently load data by using full direct load and partition exchange|V4.3.5|OceanBase Database| docs|Distributed Database

The partition exchange feature of OceanBase Database can be combined with direct load technology to provide an efficient solution for large-scale incremental data loading. You can use full direct load to quickly load data to a newly created non-partitioned table and then use partition exchange to atomically migrate the data to the target partitioned table, thereby improving the performance of incremental data loading for existing partitioned tables.

Why combine partition exchange with full bypass load

In big data scenarios, enterprises often face the need to quickly import large volumes of incremental data. Traditional data import methods have performance bottlenecks, while combining partition exchange with bypass load addresses both import efficiency and query performance.

Comparison of full bypass load and incremental bypass load

Incremental bypass load

Data storage format: Incremental bypass load generates dump files, which are stored in rowstore format by default.
Performance impact: In columnstore scenarios, rowstore dump files significantly degrade query performance.
Applicable scenarios: Suitable for small batches of frequent data updates.

Full bypass load

Data storage format: Full bypass load rewrites all data, generating columnstore files (if the table is configured as columnstore).
Performance advantage: All data is baseline data, ensuring optimal query performance.
Applicable scenarios: Suitable for one-time import of large volumes of data.

Benefits of combining both approaches

By combining full bypass load with partition exchange, the performance bottleneck of storing incremental data in dump format can be effectively addressed. The core advantages of this solution are:

Data format optimization: Converts incremental data into baseline data format, ensuring all data is stored in columnstore format and eliminating the negative impact of rowstore dump files on query performance.
Performance improvement:
- Query performance: All data is unified to baseline format, achieving optimal query performance.
- Import performance: Utilizes the efficient mechanism of bypass load to significantly enhance data loading speed.
Architecture consistency: Through partition exchange, incremental data and existing data are unified in format, avoiding performance losses from mixed storage formats.

This technical solution maintains the flexibility of incremental import while leveraging the performance advantages of full import, providing optimal performance for large-scale data processing scenarios.

Scenario analysis

Empty table initialization

Scenario: The target table is empty and requires initial data import.

Recommended solution: Direct full bypass load
Advantages: All data is columnstore baseline, ensuring optimal query performance

Incremental import to an existing table

Scenario: The target table already contains historical data and requires new incremental data.

Traditional incremental bypass load

Advantages: Fast import speed with no impact on existing data.
Disadvantages: New data is stored as rowstore dump files, resulting in poor query performance.

Traditional full bypass load

Advantages: All data is rewritten to columnstore, ensuring optimal query performance.
Disadvantages: Requires rewriting all historical data, resulting in long import times and high resource consumption.

Full bypass load + Partition exchange (Recommended solution)

Advantages:
- New data is generated as columnstore baseline through full bypass load, ensuring optimal query performance
- Only incremental data needs to be imported, without affecting historical data
- Partition exchange operations are atomic, ensuring zero business interruption
- Both import and query performance are optimized

Simple performance comparison example

Assume a customer has a 1TB orders table with 100GB of new data added daily:

Traditional incremental bypass load

Historical data: 1TB (columnstore, fast query)
New data: 100GB (rowstore dump, slow query)
Query performance: Mixed storage format results in overall performance degradation

Traditional full bypass load

Rewritten data: 1.1TB (all columnstore, fast query)
Import time: Needs to rewrite 1TB of historical data + 100GB of new data
Resource consumption: High

Full bypass load + Partition exchange

Historical data: 1TB (columnstore, unchanged)
New data: 100GB (columnstore baseline, fast query)
Import time: Only needs to import 100GB of new data
Query performance: All columnstore, optimal performance

Basic syntax of partition exchange

ALTER TABLE target_partition_table_name
    EXCHANGE PARTITION first_level_partition_name
    WITH TABLE origin_partition_table_name
    WITHOUT VALIDATION;

Syntax of partition exchange with indexes

ALTER TABLE target_partition_table_name
    EXCHANGE PARTITION first_level_partition_name
    WITH TABLE origin_partition_table_name
    INCLUDING INDEXES
    WITHOUT VALIDATION;

Limitations

The target_partition_table_name must be a subpartitioned table, and the partition to be exchanged must be a partition of this table.
The origin_partition_table_name must be a partitioned table, and the partition type of its partitions must be consistent with the partition type of the subpartitions under the partition to be exchanged.
The partition type of the partitions in the target table must be RANGE (COLUMNS) or LIST (COLUMNS).
After the primary and subpartitions are exchanged, the statistics of the related tables become invalid and must be collected again.

Example of using direct load and partition exchange (based on a partitioned table)

Fast data loading: Direct load bypasses SQL and Memtable, enabling massive data loading in seconds.
Zero business interruption: Partition exchange operations are completed atomically in the background, with no lock table impact on business queries.
Partition-level data management: Partition exchange allows precise control over data migration scope, supporting flexible strategies based on time or range.

The specific steps are as follows:

Create a new empty partition P in the partitioned table A.
Create a non-partitioned table B with the same structure as table A, and use full direct load to import data into table B.
Use the partition exchange feature to swap the new partition P in table A with table B.

Step 1: Create the target partitioned table (orders table)

Assume the orders table is partitioned by the order_date field and has a reserved empty partition p_new for data exchange:

-- Create partitioned table A (example: orders table)
CREATE TABLE orders (
  order_id BIGINT,
  order_date TIMESTAMP NOT NULL,
  customer_id INT,
  amount DECIMAL(10,2),
  product_id INT
)
PARTITION BY RANGE COLUMNS(order_date) (
  PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
  PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
  -- Add an empty partition p_new for subsequent data exchange
  PARTITION p_new VALUES LESS THAN ('2023-07-01')
);

Step 2: Prepare incremental data and create a non-partitioned table

Create or download the incremental data file.

Assume the incremental data file is orders_increment.csv and contains the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a non-partitioned table orders_bypass.

Create a non-partitioned table orders_bypass with the same structure as the target partitioned table (orders table):

CREATE TABLE orders_bypass (
  order_id BIGINT,
  order_date TIMESTAMP NOT NULL,
  customer_id INT,
  amount DECIMAL(10,2),
  product_id INT
);

-- Create a temporary table
CREATE TABLE temp_staging_table LIKE orders_bypass;

Step 3: Direct load data into the non-partitioned table orders_bypass

Use full direct load to quickly import data into the non-partitioned table orders_bypass:

-- Example only. In practice, use tools like OceanBase's ObLoader.
-- Use LOAD DATA or INSERT /*+ APPEND */ to import data
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Or use INSERT for batch import (with APPEND hint)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table;  -- The temporary table must have the same structure as orders_bypass

Step 4: Perform partition exchange (Exchange Partition)

Exchange the data in the non-partitioned table orders_bypass with the empty partition p_new in the partitioned table orders:

-- Perform partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;  -- Optional: If table B contains indexes, synchronize the exchange

Step 5: Verify the data exchange results

-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);

-- Verify if table orders_bypass is empty (data will be cleared after exchange)
SELECT COUNT(*) FROM orders_bypass;

Example return results:

MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set (0.04 sec)

MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

Step 6: Clean up data and perform subsequent operations

-- Optional: Drop the empty table orders_bypass
DROP TABLE orders_bypass;

-- Optional: Rename partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;

Example of bypass load and partition swap (with subpartitions)

Fast data loading: Bypass load technology directly bypasses SQL and Memtable, enabling the loading of massive amounts of data within seconds.
Zero business interruption: Partition swap operations are completed in the background in an atomic manner, without locking tables and thus without affecting business queries.
Partition-level data management: Partition swap allows precise control over the scope of data migration, supporting flexible strategies based on time or range partitioning.
Incremental data import for subpartitioned tables: This is suitable for composite partitioned tables where the primary partition is range/list and the subpartition is hash.

The specific steps are as follows:

Create a new empty primary partition P in the subpartitioned table A (primary range/list + subpartition hash).
Create a new primary hash-partitioned table B, whose subpartition types must be completely consistent with those of table A. Then, perform a full bypass load to import the data into table B.
Use the partition swap feature to swap the new primary partition P in table A with table B.

Step 1: Create the target subpartitioned table (orders)

Assume that the orders table orders is partitioned by the order_date field at the primary partition level and subpartitioned by the order_id field using hash partitioning, with an empty partition p_new reserved for data exchange:

-- Create subpartitioned table A (example: orders table)
CREATE TABLE orders (
  order_id BIGINT,
  order_date TIMESTAMP NOT NULL,
  customer_id INT,
  amount DECIMAL(10,2),
  product_id INT,
  PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE COLUMNS(order_date)
SUBPARTITION BY HASH(order_id) SUBPARTITIONS 5
(
  PARTITION p_old1 VALUES LESS THAN ('2023-01-01'),
  PARTITION p_old2 VALUES LESS THAN ('2023-04-01'),
  -- Add an empty partition p_new for subsequent data exchange
  PARTITION p_new VALUES LESS THAN ('2023-07-01')
);

Step 2: Prepare incremental data and create a primary hash partitioned table

Create or download the incremental data file.

Assume that the incremental data file is named orders_increment.csv and contains the following fields:

order_id,order_date,customer_id,amount,product_id
1001,"2023-05-01",101,199.99,1001
1002,"2023-06-15",102,299.99,1002
1003,"2023-05-10",103,350.00,1003
1004,"2023-05-20",104,49.99,1004
1005,"2023-06-01",105,450.50,1005
1006,"2023-05-25",106,200.00,1006
1007,"2023-06-10",107,150.75,1007
1008,"2023-05-15",108,300.00,1008
1009,"2023-06-20",109,499.99,1009
1010,"2023-05-30",101,99.99,1010

Create a primary hash partitioned table orders_bypass.

Create a primary hash partitioned table orders_bypass with the same structure as the target subpartitioned table (orders table orders). The partition type must be hash partitioning, which is the same as the subpartition type of the target table.

-- Create primary hash partitioned table B, with the same number of partitions as the subpartitions of the target table (5)
CREATE TABLE orders_bypass (
  order_id BIGINT,
  order_date TIMESTAMP NOT NULL,
  customer_id INT,
  amount DECIMAL(10,2),
  product_id INT,
  PRIMARY KEY (order_id, order_date)
)
PARTITION BY HASH(order_id) PARTITIONS 5;

Note

The partition type of the primary hash partitioned table orders_bypass must be the same as the subpartition type of the target subpartitioned table orders. In other words, if the subpartition type of the target table is SUBPARTITION BY HASH(order_id), the partition type of the source table must be PARTITION BY HASH(order_id).

Step 3: Import data to the primary hash partitioned table orders_bypass

Use full bypass import to quickly import data to the primary hash partitioned table orders_bypass:

-- Create a temporary table for data loading
CREATE TABLE temp_staging_table LIKE orders_bypass;

-- Use LOAD DATA for bypass import
LOAD DATA /*+ APPEND */ INFILE '/tmp/orders_increment.csv'
INTO TABLE temp_staging_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Use INSERT for batch import (with APPEND hint for bypass import)
INSERT /*+ APPEND */ INTO orders_bypass
SELECT * FROM temp_staging_table;

-- Clean up the temporary table
DROP TABLE temp_staging_table;

Note

In practice, you can also use tools such as OceanBase's ObLoader for bypass import to achieve more efficient data loading.

Step 4: Exchange partitions

Exchange the data in the primary hash partitioned table orders_bypass with the empty primary partition p_new of the target subpartitioned table (orders table orders):

-- Execute the partition exchange
ALTER TABLE orders
EXCHANGE PARTITION p_new
WITH TABLE orders_bypass
WITHOUT VALIDATION;

Note

After the partition exchange is successful, the data in the orders_bypass table will be cleared, and the original data will be exchanged to the target partition p_new. If the source table contains indexes, you can use the INCLUDING INDEXES option to synchronize the exchange of indexes.

Step 5: Verify the data exchange result

Verify whether the partition exchange is successful:

-- Verify the data in partition p_new
SELECT COUNT(*) FROM orders PARTITION (p_new);

-- Verify whether the orders_bypass table is empty (the data in the orders_bypass table will be cleared after the exchange)
SELECT COUNT(*) FROM orders_bypass;

Example of the verification result:

MySQL [test_db]> SELECT COUNT(*) FROM orders PARTITION (p_new);
+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set

MySQL [test_db]> SELECT COUNT(*) FROM orders_bypass;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set

Step 6: Clean up data and perform subsequent operations

After the data exchange is completed, you can perform the following cleanup and subsequent operations:

-- Optional: Drop the empty table orders_bypass (if no longer needed)
DROP TABLE orders_bypass;

-- Optional: Rename the partition p_new to a more descriptive name
ALTER TABLE orders RENAME PARTITION p_new TO p_2023q2;

-- Recollect statistics (statistics will become invalid after the exchange)
ANALYZE TABLE orders;

Unsupported scenarios

The following scenarios are not supported for partition exchange:

Exchange between primary partitioned tables: Not supported.
Exchange between subpartitioned tables: Not supported.
Exchange where the primary partition of the subpartitioned table is a hash partition: Not supported. The primary partition of the target table must be of the range or list type.

OceanBase

Customer Stories

Documentation

Efficiently load data by using full direct load and partition exchange

Why combine partition exchange with full bypass load

Comparison of full bypass load and incremental bypass load

Incremental bypass load

Full bypass load

Benefits of combining both approaches

Scenario analysis

Empty table initialization

Incremental import to an existing table

Traditional incremental bypass load

Traditional full bypass load

Full bypass load + Partition exchange (Recommended solution)

Simple performance comparison example

Traditional incremental bypass load

Traditional full bypass load

Full bypass load + Partition exchange

Basic syntax of partition exchange

Basic syntax of partition exchange

Syntax of partition exchange with indexes

Limitations

Example of using direct load and partition exchange (based on a partitioned table)

Step 1: Create the target partitioned table (orders table)

Step 2: Prepare incremental data and create a non-partitioned table

Step 3: Direct load data into the non-partitioned table orders_bypass

Step 4: Perform partition exchange (Exchange Partition)

Step 5: Verify the data exchange results

Step 6: Clean up data and perform subsequent operations

Example of bypass load and partition swap (with subpartitions)

Step 1: Create the target subpartitioned table (orders)

Step 2: Prepare incremental data and create a primary hash partitioned table

Note

Step 3: Import data to the primary hash partitioned table orders_bypass

Note

Step 4: Exchange partitions

Note

Step 5: Verify the data exchange result

Step 6: Clean up data and perform subsequent operations

Unsupported scenarios

References