OceanBase Database supports inserting data using the direct load method, which allows you to write data directly to data files. Direct load bypasses the SQL layer, directly allocates space, and inserts data into the data files, thereby improving data import efficiency.
Traditional data import methods in OceanBase Database, such as LOAD DATA, OBLOADER, and OMS, rely on INSERT statements and follow this process:
SQL parsing -> transaction processing -> writing to MemTable -> minor compaction and major compaction -> SSTable.
Bottlenecks:
- High consumption of CPU and memory resources.
- Performance degradation when importing large volumes of data.
With direct load technology, OceanBase Database skips these intermediate steps and writes data directly to SSTables. This reduces resource consumption (CPU and memory) and enables efficient full or incremental data import.
Notice
Upgrading OceanBase Database during a direct load task is not recommended, as it may cause the task to fail.
Technical architecture
Traditional import path:
Client -> SQL parsing -> transaction processing -> MemTable -> minor/major compaction -> SSTable
Disadvantages: The path is long, resource consumption is high, and write speed is limited.
Direct load path:
Client -> type conversion -> primary key sorting (optional) -> direct write to major SSTable (full import) or mini SSTable (incremental import)
Advantages:
- Skips intermediate modules such as SQL parsing, transaction processing, and MemTable, reducing resource consumption.
- Directly writes data to underlying storage, avoiding the overhead of multiple compactions.
Core mechanisms of direct load
Skipping intermediate layers: Bypasses SQL parsing, transaction processing, and MemTable, writing directly to SSTables.
Type conversion and sorting:
- Data type conversion (for example, converting CSV data to OceanBase table structure).
- Primary key sorting (required for full import to ensure the order of the major SSTable).
Storage layer architecture and direct load process
Storage layer hierarchy
The LSM-Tree-based storage layer of OceanBase Database consists of three tiers:
- MemTable: An in-memory row-based structure that supports high-concurrency writes.
- Mini SSTable: An on-disk row-based structure generated by flushing data from the MemTable.
- Major SSTable: A column-based structure produced by daily major compactions, delivering optimal query performance.
Direct load modes
OceanBase Database supports full direct load and incremental direct load.
When you use the LOAD DATA, INSERT INTO SELECT, or CREATE TABLE AS SELECT statement to import data, you can specify direct load for the import task by adding a hint to the statement, or by setting the global parameter default_load_mode.
Full direct load
- Full direct load enables you to write an entire dataset directly into a database data file in a single operation. This method bypasses the SQL layer, directly allocates space, and inserts data into the data file, thereby improving data import efficiency.
- Full direct load is typically used for database initialization, data migration, or quick loading of large amounts of data.
Incremental direct load
- Incremental direct load allows you to write new data directly into a database data file, without using SQL interfaces, even when the database already contains a large amount of data. This method bypasses SQL layer processing and writes new data directly to the data file, thus improving data write efficiency.
- Incremental direct load is typically used in high-throughput data write scenarios, such as large-scale real-time data collection and log write.
Comparison between full load and incremental load
| Dimension | Full load | Incremental load |
|---|---|---|
| Triggering scenarios | Data initialization or complete data replacement | Data appending or partial updates |
| Import process |
|
|
| Write location | Major SSTable (columnar) | Mini SSTable (row-based) |
| Performance characteristics | Data is processed row by row and rewritten to the major SSTable, with overhead increasing as data volume grows. Speed decreases due to data rewriting.
|
Query performance is poor but parallel import is supported. Queries require waiting for daily major compaction to convert the mini SSTable to columnar format.
|
| Lock mechanism | Table lock (entire table unavailable) | Partition lock (only the target partition is locked) |
| Query performance | Columnar tables use a columnar structure, and row-based tables use a row-based structure, offering better query performance. | Both columnar and row-based tables use a row-based structure, resulting in poor query performance. |
Scenarios
The direct load feature can be used in the following scenarios:
Data migration and synchronization. When migrating and synchronizing data, you often need to move large volumes of data in various formats from different data sources to OceanBase Database. The performance of traditional SQL interfaces may not meet timeliness requirements.
Traditional ETL. After data is extracted and transformed at the source, large amounts of data often need to be loaded into the destination within a short period. Direct load technology can improve data import performance. During the data loading phase of ETL processes, you can also use direct load to enhance efficiency.
Load data from text files or other data sources to OceanBase Database. The direct load technology can also improve the data loading efficiency.