OceanBase Database supports direct data insertion to data files, bypassing the SQL layer. This direct insertion improves data import efficiency by allocating space and inserting data directly into data files without going through the SQL interface.
Traditional data import methods in OceanBase Database, such as LOAD DATA, OBLOADER, and OMS, rely on INSERT statements and follow these steps:
SQL parsing -> transaction processing -> MemTable writing -> minor/major compaction -> SSTable.
Bottlenecks:
- High CPU and memory resource consumption.
- Performance degradation with large-scale data imports.
OceanBase Database's bypass import technology skips these intermediate steps, directly writing data to SSTables while reducing resource consumption (CPU, memory) and supporting efficient full or incremental data imports.
Notice
Do not perform upgrades during a bypass import task, as this may cause the task to fail.
Technical architecture
Traditional import path:
Client -> SQL parsing -> transaction processing -> MemTable -> minor/major compaction -> SSTable
Disadvantages: Long path, high resource consumption, limited write speed.
Bypass import path:
Client -> type conversion -> primary key sorting (optional) -> direct write to Major SSTable (full import) or Mini SSTable (incremental import)
Advantages:
- Skips SQL, transaction, and MemTable modules, reducing resource consumption.
- Data is directly written to the underlying storage, avoiding multiple compaction overheads.
Core mechanisms of bypass import
Skipping intermediate layers: Bypasses SQL parsing, transaction processing, and MemTable, directly writing to SSTables.
Type conversion and sorting:
- Converts data types (e.g., CSV to OceanBase table structure).
- Sorts data by primary key (required for full imports to ensure Major SSTable order).
Features
OceanBase Database supports two types of bypass import: full bypass import and incremental bypass import.
When using the LOAD DATA, INSERT INTO SELECT, or CREATE TABLE AS SELECT statements to import data, you can specify bypass import for a single task using a hint, or set the global configuration parameter default_load_mode to enable bypass import.
Full bypass import
- Full bypass import writes an entire dataset directly to the database's data files. This method bypasses the SQL layer, allocating space and inserting data directly into data files, improving import efficiency.
- Typically used for database initialization, data migration, or rapid loading of large volumes of data.
Incremental bypass import
- Incremental bypass import writes new data directly to the database's data files without using the SQL interface. This method bypasses the SQL layer's data processing, directly writing new data to data files, improving write efficiency.
- Typically used in high-throughput data writing scenarios, such as large-scale real-time data collection or log writing.
Comparison of full and incremental imports
OceanBase Database's LSM-Tree storage layer consists of three levels:
- MemTable: An in-memory row-based structure supporting high-concurrency writes.
- Mini SSTable: A row-based structure on disk, generated by minor compactions from MemTables.
- Major SSTable: A columnar structure, generated by daily major compactions, offering optimal query performance.
| Dimension | Full Import | Incremental Import |
|---|---|---|
| Trigger scenarios | Data initialization or complete data overwrite. | Data appending or partial updates. |
| Import process |
|
|
| Write location | Major SSTable | Conflicting data is written to Mini SSTable, while non-conflicting data is written to a new Major SSTable. |
| Performance characteristics | Locks the table during import to prevent interference from other operations. | Best suited for scenarios with few primary key conflicts; performance decreases with many conflicts. |
| Lock mechanism | Table lock for specified tables, partition lock for specified partitions. | Table lock for specified tables, partition lock for specified partitions. |
| Query performance | Data is written directly to the baseline, ensuring optimal query performance. | For non-conflicting data, data is written directly to the baseline, ensuring optimal query performance. For conflicting data, incremental writes maintain good query performance. |
Use cases
Bypass import is suitable for the following scenarios:
Data migration and synchronization: When migrating and synchronizing large volumes of data from various sources to OceanBase Database, the performance of traditional SQL interfaces may not meet the required timeliness.
Traditional ETL: After data is extracted and transformed at the source, it needs to be loaded to the target system quickly. Bypass import improves the efficiency of this process. Additionally, during ETL, bypass import can enhance the efficiency of data loading.
Loading data from text files or other data sources to OceanBase Database: Bypass import can significantly improve the efficiency of data loading from these sources.
