Background information
OceanBase Database supports direct load, which writes data directly into data files. Direct load bypasses the SQL-layer APIs, allocates space, and inserts data directly into the data files, which improves data import efficiency.
Scenarios
The direct load feature applies to the following scenarios:
Data migration and synchronization: Migrating or synchronizing data often requires moving large amounts of data in various formats from different data sources to OceanBase Database, and conventional SQL APIs cannot meet the timeliness requirements.
Conventional extract, transform, and load (ETL): After data is extracted and transformed at the source, large amounts of data often need to be loaded to the target within a short time. Direct load can improve import performance.
Loading data from text files or other data sources into OceanBase Database: Direct load can accelerate the loading process.
Considerations
Direct load transmits data through the remote procedure call (RPC) port rather than the SQL port.
Imported data is committed at the table level rather than at the session or transaction level.
Retries and resumable transmission are not supported.
BIT data types are not supported.
Virtual generated columns are not supported.
Direct load is not recommended for importing small amounts of data.
The --replace-data option cannot be used to resolve unique index conflicts.
The --thread and --parallel options differ as follows:
- --thread specifies the thread pool for connections from the client to the server and is maintained on the client.
- --parallel specifies the number of worker threads that the OBServer node can use for data writing and sorting.
We recommend that you set --thread and --parallel to the same value.
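If a load command is generated by a script, the recommendation to keep --thread and --parallel consistent can be applied mechanically. The following Python sketch uses a hypothetical helper, `concurrency_args`, which is not part of OBLOADER. It derives both values from a single number (for example, the tenant's CPU core count) and only builds that fragment of the argument list.

```python
from typing import List


def concurrency_args(workers: int) -> List[str]:
    """Return OBLOADER arguments that set --thread and --parallel to the same value.

    Hypothetical helper: `workers` could be the tenant's CPU core count.
    --thread sizes the client-side connection thread pool; --parallel sets the
    number of server-side worker threads used for data writing and sorting.
    """
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    value = str(workers)
    # Emitting both options from one value keeps them consistent by construction.
    return ["--thread", value, "--parallel", value]


print(concurrency_args(16))
```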
Command-line options for direct load
Notice
When you use OBLOADER for direct load, you can connect to an OBServer node directly or through OceanBase Database Proxy (ODP). The version requirements are as follows:
- Direct connection to an OBServer node: OceanBase Database V4.2.0 or later.
- Connection through ODP: ODP V4.3.0 or later, and OceanBase Database V4.2.1 or later on the OBServer node.
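The version requirements above can be checked before a load is started. The following Python sketch is illustrative only: `parse_version` and `direct_load_supported` are hypothetical helpers, and the sketch assumes purely numeric dotted version strings such as "4.2.1" (an optional leading "V" is stripped). It returns whether direct load is possible for a direct OBServer connection (no ODP version given) or for a connection through ODP.

```python
from typing import Optional, Tuple

# Minimum versions taken from the requirements above.
MIN_OBSERVER_DIRECT = (4, 2, 0)
MIN_OBSERVER_VIA_ODP = (4, 2, 1)
MIN_ODP = (4, 3, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "V4.2.1" or "4.2" into a comparable tuple, padded to three parts."""
    parts = [int(p) for p in text.strip().lstrip("Vv").split(".")]
    parts += [0] * (3 - len(parts))  # "4.2" -> (4, 2, 0)
    return tuple(parts)


def direct_load_supported(observer_version: str, odp_version: Optional[str] = None) -> bool:
    observer = parse_version(observer_version)
    if odp_version is None:
        # Direct connection to an OBServer node.
        return observer >= MIN_OBSERVER_DIRECT
    # Connection through ODP: both components must meet their minimum versions.
    return parse_version(odp_version) >= MIN_ODP and observer >= MIN_OBSERVER_VIA_ODP


print(direct_load_supported("4.2.0"))
print(direct_load_supported("4.2.0", "4.3.0"))
```

Note that OceanBase Database V4.2.0 is sufficient for a direct connection but not for a connection through ODP, which is why the second call returns False.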
| Command-line option | Description | ApsaraDB for OceanBase & ODP | OceanBase Database & ODP | OceanBase Database |
|---|---|---|---|---|
| --direct | Enables direct load. | Required | Required | Required |
| --parallel | The degree of parallelism (DOP) on the server. Default value: 1. We recommend that you set this option to the number of CPU cores of the tenant, and that you specify it explicitly to keep performance stable. | Optional | Optional | Optional |
| --rpc-port | The inner RPC port of the server. You can obtain the RPC port as follows: | Required | Required | Required |
| -u (--user) | The database username. | Required | Required | Required |
| -P (--port) | The SQL port number. | Required | Required | Required |
| -c (--cluster) | The cluster name. | Optional | Required | - |
| -t (--tenant) | The name of the tenant to connect to in OceanBase Database. Notice: You must use this option with the … | Required | Required | - |
| --public-cloud | Imports database object definitions or table data from an ApsaraDB for OceanBase cluster. Notice: You must use this option with the … | Required | - | - |
| --no-sys | Specifies that the import does not depend on the sys tenant. Applies only to OceanBase Database versions earlier than V4.0.0. | Optional | Optional | Optional |
| --sys-user | The sys tenant user that the import depends on. Default value: root. Applies only to OceanBase Database versions earlier than V4.0.0. | Optional. Mutually exclusive with --no-sys. | Optional. Mutually exclusive with --no-sys. | Optional. Mutually exclusive with --no-sys. |
| --sys-password | The password of the sys tenant user that the import depends on. Applies only to OceanBase Database versions earlier than V4.0.0. | Optional. Mutually exclusive with --no-sys. | Optional. Mutually exclusive with --no-sys. | Optional. Mutually exclusive with --no-sys. |

A hyphen (-) indicates that the option does not apply to that connection type.
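The per-column requirements in the table can be encoded when a load command is generated by a script. The following Python sketch uses a hypothetical helper, `direct_load_args`, and hypothetical deployment keys that stand for the three table columns. It emits only options listed in the table, so options such as the host, password, database, and data files still have to be added according to the OBLOADER documentation. It also omits the sys tenant options: they apply only to versions earlier than V4.0.0, which is below the minimum version for direct load. Rejecting -c and -t for a direct OBServer connection is a design choice of this sketch, based on the "-" entries in the table.

```python
from typing import List, Optional

# Hypothetical keys for the three columns of the option table.
APSARADB_VIA_ODP = "apsaradb_odp"    # ApsaraDB for OceanBase & ODP
OCEANBASE_VIA_ODP = "oceanbase_odp"  # OceanBase Database & ODP
OCEANBASE_DIRECT = "oceanbase"       # OceanBase Database (direct OBServer connection)


def direct_load_args(deployment: str, user: str, sql_port: int, rpc_port: int,
                     tenant: Optional[str] = None, cluster: Optional[str] = None,
                     parallel: Optional[int] = None) -> List[str]:
    """Build the direct-load options of an OBLOADER command line (sketch)."""
    if deployment not in (APSARADB_VIA_ODP, OCEANBASE_VIA_ODP, OCEANBASE_DIRECT):
        raise ValueError(f"unknown deployment: {deployment}")

    # Required in every column of the table.
    args = ["--direct", "--rpc-port", str(rpc_port), "-u", user, "-P", str(sql_port)]

    if deployment == OCEANBASE_DIRECT:
        # -c and -t are marked "-" for a direct OBServer connection.
        if tenant is not None or cluster is not None:
            raise ValueError("-t and -c do not apply to a direct OBServer connection")
    else:
        if tenant is None:
            raise ValueError("-t is required when connecting through ODP")
        if deployment == OCEANBASE_VIA_ODP and cluster is None:
            raise ValueError("-c is required for OceanBase Database through ODP")
        if cluster is not None:  # optional for ApsaraDB for OceanBase
            args += ["-c", cluster]
        args += ["-t", tenant]
        if deployment == APSARADB_VIA_ODP:
            args.append("--public-cloud")

    if parallel is not None:
        # Optional; the table suggests matching the tenant's CPU core count.
        args += ["--parallel", str(parallel)]
    return args


# User, tenant, cluster, and port values are placeholders.
print(" ".join(direct_load_args(OCEANBASE_VIA_ODP, "loader", 2883, 2885,
                                tenant="tenant1", cluster="cluster1", parallel=16)))
```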
Parameters for direct load
You can configure parameters for direct load in the session.config.json file in the {ob-loader-dumper}/conf directory.
Here is an example:
```json
"direct_path_load": {
    "task_timeout_ms": "2592000000000",
    "heartbeat_timeout_ms": "60000000",
    "heartbeat_interval_ms": "30000000"
}
```
- task_timeout_ms: the timeout period of an import operation, in μs. If an import operation does not complete within this period, it is considered timed out.
- heartbeat_timeout_ms: the heartbeat timeout period, in μs, which is used to detect whether an import operation is still active.
- heartbeat_interval_ms: the interval between two consecutive heartbeats, in μs.

Although the parameter names end in _ms, all three values are specified in μs.
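Because the values are given in μs, the example corresponds to a 30-day task timeout (2,592,000 s), a 60-second heartbeat timeout, and a 30-second heartbeat interval. The following Python sketch shows that conversion by generating the direct_path_load section from durations. The helpers `to_microseconds` and `direct_path_load_section` are hypothetical, and the sketch only builds this one section; merging it into the existing session.config.json file is left out.

```python
import json
from datetime import timedelta


def to_microseconds(duration: timedelta) -> str:
    """Convert a duration to whole microseconds, as a string like the example values."""
    # timedelta stores days, seconds, and microseconds exactly, so integer math avoids float rounding.
    total = (duration.days * 86_400 + duration.seconds) * 1_000_000 + duration.microseconds
    return str(total)


def direct_path_load_section(task_timeout: timedelta, heartbeat_timeout: timedelta,
                             heartbeat_interval: timedelta) -> dict:
    # Despite the _ms suffix, every value is expressed in microseconds.
    return {
        "direct_path_load": {
            "task_timeout_ms": to_microseconds(task_timeout),
            "heartbeat_timeout_ms": to_microseconds(heartbeat_timeout),
            "heartbeat_interval_ms": to_microseconds(heartbeat_interval),
        }
    }


section = direct_path_load_section(timedelta(days=30), timedelta(seconds=60), timedelta(seconds=30))
print(json.dumps(section, indent=4))
```

With these inputs, the generated values match the example configuration exactly.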