Direct load

2025-06-13 03:35:22  Updated

Background information

OceanBase Database supports direct load, which writes data directly into data files. Direct load bypasses the APIs at the SQL layer, allocates space directly, and inserts data into the data files, which improves data import efficiency.

Scenarios

The direct load feature applies to the following scenarios:

  • Data migration and synchronization: Data migration or synchronization often moves a large amount of data in different formats from different data sources to OceanBase Database. Conventional SQL APIs cannot meet the timeliness requirements.

  • Conventional extract, transform, and load (ETL): After data is extracted and transformed in the source, a large amount of data often needs to be loaded to the target within a short time. The direct load technology can improve the import performance.

  • Data load from text files or other data sources to OceanBase Database: Direct load can accelerate the data load process.

Considerations

  • In direct load, the remote procedure call (RPC) port instead of the SQL port is used to transmit data.

  • Data to be imported is submitted at the table level instead of the session or transaction level.

  • Retry or resumable transmission is not supported.

  • Bit data types are not supported.

  • Virtual generated columns are not supported.

  • We recommend that you do not import a small amount of data in direct load mode.

  • The --replace-data option cannot resolve unique index conflicts.

  • The differences between the --thread and --parallel options are as follows:

    • --thread specifies the thread pool for connections from the client to the server, and is maintained on the client.

    • --parallel specifies the number of worker threads that can be called by the OBServer node for data writing and sorting.

    • We recommend that you specify consistent values for --thread and --parallel.

Command-line options for direct load

Notice

When you use OBLOADER for direct load, you can connect to an OBServer node directly or through OceanBase Database Proxy (ODP). The version requirements are as follows:

  • OBServer node: OceanBase Database V4.2.0 or later.
  • In the case of connection through ODP, the version of ODP must be V4.3.0 or later, and that of the OBServer node must be OceanBase Database V4.2.1 or later.
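
The version requirements above can be checked before an import is started. The following is a minimal sketch, not part of OBLOADER: the function names are hypothetical, and it assumes that versions are plain dotted numbers such as V4.2.1.

```python
def parse_version(text):
    """Turn 'V4.2.1' or '4.2.1' into a comparable tuple such as (4, 2, 1)."""
    parts = [int(part) for part in text.lstrip("Vv").split(".")]
    # Pad short versions such as 'V4.3' so that they compare as 4.3.0.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def direct_load_supported(observer_version, odp_version=None):
    """Return True if the versions meet the minimums in the notice above.

    odp_version=None means a direct connection to an OBServer node.
    """
    observer = parse_version(observer_version)
    if odp_version is None:
        # Direct connection: OceanBase Database V4.2.0 or later.
        return observer >= (4, 2, 0)
    # Through ODP: ODP V4.3.0 or later and OceanBase Database V4.2.1 or later.
    return parse_version(odp_version) >= (4, 3, 0) and observer >= (4, 2, 1)


print(direct_load_supported("V4.2.0"))            # True
print(direct_load_supported("V4.2.0", "V4.3.0"))  # False: OBServer must be V4.2.1 or later
```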

The following list describes each option. The line under each description states whether the option is required for three connection types, in this order: ApsaraDB for OceanBase & ODP / OceanBase Database & ODP / OceanBase Database. A hyphen (-) indicates that the option does not apply.

  • --direct: Specifies to enable direct load.

    Required / Required / Required

  • --parallel: The degree of parallelism (DOP) on the server. The default value is 1. We recommend that you set the value to the number of CPU cores of the tenant, and that you specify this option explicitly to ensure performance stability.

    Optional / Optional / Optional

  • --rpc-port: The internal RPC port of the server. You can obtain the RPC port as follows:

    • In the case of connection through ODP:
      • In ApsaraDB for OceanBase, the RPC port of ODP is 3307 by default.
      • In OceanBase Database, the RPC port of ODP is 2885 by default. You can change the port by using the -s option when you start ODP.
    • In the case of direct connection to an OBServer node, you can query the DBA_OB_SERVERS system view in the sys tenant for the RPC port of the OBServer node, which is 2882 by default.

    Required / Required / Required

  • -u (--user): The database username.

    Required / Required / Required

  • -P (--port): The SQL port number.

    Required / Required / Required

  • -c (--cluster): The name of the cluster.

    Optional / Required / -

  • -t (--tenant): The tenant name for connecting to OceanBase Database.

    Notice: In direct load from cloud storage, you must use this option together with --public-cloud, in the form --public-cloud -t <tenant>.

    Required / Required / -

  • --public-cloud: Imports database object definitions or table data from an ApsaraDB for OceanBase cluster.

    Notice: In direct load from cloud storage, you must use this option together with -t, in the form --public-cloud -t <tenant>.

    Required / - / -

  • --no-sys: Specifies that the import does not rely on the sys tenant. This option applies only to OceanBase Database of a version earlier than V4.0.0.

    Optional / Optional / Optional

  • --sys-user: The user on which the import relies in the sys tenant. If this option is not specified, the default value root takes effect. This option applies only to OceanBase Database of a version earlier than V4.0.0. This option is mutually exclusive with the --no-sys option.

    Optional / Optional / Optional

  • --sys-password: The password of the user on which the import relies in the sys tenant. This option applies only to OceanBase Database of a version earlier than V4.0.0. This option is mutually exclusive with the --no-sys option.

    Optional / Optional / Optional
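
To illustrate how the options listed above combine for each connection type, the following sketch assembles only the direct-load part of an OBLOADER command line. It is an illustration under stated assumptions, not an official tool: the function name and the mode names (apsaradb_odp, oceanbase_odp, observer) are hypothetical, and the remaining OBLOADER options (host, password, input files, and so on) are outside the scope of this section and are omitted. In line with the recommendation under Considerations, --thread is set to the same value as --parallel.

```python
import shlex

# Default RPC ports named in the option descriptions above.
DEFAULT_RPC_PORTS = {
    "apsaradb_odp": 3307,   # ODP in ApsaraDB for OceanBase
    "oceanbase_odp": 2885,  # ODP in OceanBase Database
    "observer": 2882,       # direct connection to an OBServer node
}


def build_direct_load_args(mode, user, sql_port, tenant=None, cluster=None,
                           parallel=1, rpc_port=None):
    """Return the direct-load options as an argument list (hypothetical helper)."""
    if mode not in DEFAULT_RPC_PORTS:
        raise ValueError(f"unknown mode: {mode}")
    if rpc_port is None:
        rpc_port = DEFAULT_RPC_PORTS[mode]

    args = [
        "--direct",
        "--rpc-port", str(rpc_port),
        "--parallel", str(parallel),
        "--thread", str(parallel),  # kept consistent with --parallel
        "-u", user,
        "-P", str(sql_port),
    ]
    if mode == "apsaradb_odp":
        # --public-cloud and -t are required; -c is optional.
        if tenant is None:
            raise ValueError("-t is required for ApsaraDB for OceanBase")
        args += ["--public-cloud", "-t", tenant]
        if cluster is not None:
            args += ["-c", cluster]
    elif mode == "oceanbase_odp":
        # Both -c and -t are required when connecting through ODP.
        if tenant is None or cluster is None:
            raise ValueError("-c and -t are required when connecting through ODP")
        args += ["-c", cluster, "-t", tenant]
    # For "observer", the list above marks -c, -t, and --public-cloud as not applicable.
    return args


# Placeholder values for illustration only.
args = build_direct_load_args("oceanbase_odp", user="test", sql_port=2883,
                              tenant="mysql", cluster="cluster_a", parallel=16)
print(shlex.join(args))
```

Building the arguments as a list and joining them with shlex.join keeps values that contain spaces or shell metacharacters safely quoted when the line is pasted into a shell.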

Parameters for direct load

You can configure parameters for direct load in the session.config.json file in the {ob-loader-dumper}/conf directory.

Here is an example:

"direct_path_load": {
    "task_timeout_ms": "25920000", 
    "heartbeat_timeout_ms": "60000000",
    "heartbeat_interval_ms": "30000000" 
}

  • task_timeout_ms: the timeout period of an import operation, in ms. If an import operation is not completed within the configured period, it is considered timed out.

  • heartbeat_timeout_ms: the heartbeat timeout period in ms, which is used to detect the active status of an import operation.

  • heartbeat_interval_ms: the interval between two heartbeats, in ms.
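
Millisecond values of this size are hard to read at a glance. The following sketch converts the example values to hours so that they can be sanity-checked before an import. It parses the fragment on its own by wrapping it in outer braces, which is an assumption made for illustration: a real session.config.json contains other settings that are not shown here. The function name is hypothetical.

```python
import json

# The example fragment, wrapped in braces so that it parses as standalone JSON.
fragment = """
{
  "direct_path_load": {
    "task_timeout_ms": "25920000",
    "heartbeat_timeout_ms": "60000000",
    "heartbeat_interval_ms": "30000000"
  }
}
"""

MS_PER_HOUR = 3_600_000


def direct_load_hours(config_text):
    """Parse the JSON text and return each direct_path_load value in hours."""
    settings = json.loads(config_text)["direct_path_load"]
    # The values are stored as strings, so convert them to integers first.
    return {name: int(value) / MS_PER_HOUR for name, value in settings.items()}


for name, hours in direct_load_hours(fragment).items():
    print(f"{name}: {hours:.2f} h")
# task_timeout_ms: 7.20 h
# heartbeat_timeout_ms: 16.67 h
# heartbeat_interval_ms: 8.33 h
```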
