LOAD DATA|V4.1.0|OceanBase Database| docs|Distributed Database

LOAD DATA

Last Updated：2025-12-05 03:43:26 Updated

Description

The statement is used to import data from an external source.

In OceanBase Database, the LOAD DATA statement can load input files from OceanBase Database and Alibaba Cloud OSS, where the files are stored on OBServer nodes.

Note

The current version of OceanBase Database does not support using the LOCAL INFILE syntax to load data files.
Tables with triggers are not supported.

The LOAD DATA statement can currently import text files in CSV format. The entire import process consists of the following steps:

Parse the file: OceanBase Database reads data from the file based on the file name specified by the user and parses the data in the file in parallel or serially based on the specified degree of parallelism.
Distribute data: As OceanBase Database is a distributed database, data in partitions can be distributed across OBServer nodes. The LOAD DATA statement calculates the destination OBServer node for each data record parsed from the input file.
Insert data: Once a data record reaches its destination OBServer node, the node executes an INSERT statement to insert the record into the corresponding partition.

To import data from external files, you must have the FILE privilege. You can use the GRANT FILE ON *.* TO $user_name; statement to grant the privilege to the user who needs to execute the LOAD DATA statement, where $user_name is the name of the user.

Notice

If the statement is executed with the REPLACE or IGNORE clause and the degree of parallelism is greater than 1, the last inserted record for a conflicting row may be different from that obtained by serial execution. To strictly ensure the insertion results for conflicting rows, do not specify the degree of parallelism in the statement or set the degree of parallelism to 1.

Syntax

LOAD DATA
    [ [/*+ PARALLEL(N) load_batch_size(M) APPEND */] 
      | [/*+ PARALLEL(N) direct(bool, int) */] ]
    [REMOTE_OSS] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE table_name
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(column_name_var
        [, column_name_var] ...)]

Parameter description

Parameter	Description
parallel(N)	The degree of parallelism for loading data. The default value of `N` is `4`.
load_batch_size(M)	The number of rows to be inserted at a time. The default value of `M` is `100`. We recommend that you set this parameter to a value between 100 and 1000.
APPEND	Use this hint to enable direct load, which allows you to allocate space and write data directly in the data file. By default, the APPEND hint is equivalent to direct(true, 0). It also enables online collection of optimizer statistics (GATHER_OPTIMIZER_STATISTICS).
direct	Use this hint to enable direct load. The `bool` parameter in `direct(bool, int)` specifies whether to sort the given CSV file, where `true` indicates that sorting is required, and the `int` parameter specifies the maximum number of allowed error rows.
REMOTE_OSS	Specifies whether to read data from the OSS file system. Notice If you specify this parameter, the `file_name` parameter must be an address in the OSS file system.
file_name	The path and name of the input file. The format of the `file_name` parameter is as follows: If the import file is stored on an OBServer node, the format is `/$PATH/$FILENAME`. If the import file is stored on OSS, the format is `oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`. The parameters are described as follows: `$PATH` specifies the path of the file in the bucket, indicating the directory to which the file belongs. `$FILENAME` specifies the name of the file, indicating the specific file to be accessed. `$HOST` specifies the host name or domain name of the OSS service or CDN-accelerated domain name for accessing the target OSS service. `$ACCESS_ID` specifies the Access Key ID required for accessing the OSS service, used for authentication. `$ACCESSKEY` specifies the Access Key Secret required for accessing the OSS service, used for authentication. Note Before you import a file from OSS, make sure that: You have the permissions to access the specified OSS bucket and file. You need sufficient permissions to read the specified bucket and file. This typically requires setting the permissions on the OSS console or by using OSS APIs and configuring the access keys (Access Key ID and Access Key Secret) as credentials with appropriate permissions. The database server can connect to the specified `$HOST` over the network to access the OSS service. If a CDN-accelerated domain name is used, make sure that the CDN is correctly configured and the network connection is normal.
REPLACE \| IGNORE	Specifies the action to take when a unique key conflict occurs. `REPLACE` specifies to overwrite the conflicting row, and `IGNORE` specifies to ignore the conflicting row. `LOAD DATA` determines whether data is duplicated based on the primary key of the table. If the table does not have a primary key, the `REPLACE` and `IGNORE` options are the same. By default, when `LOAD DATA` encounters duplicate data, it records the error in a log file.
table_name	The name of the table to which the imported data is to be loaded. The table can be a partitioned or non-partitioned table.
FIELDS \| COLUMNS	Specifies the format of fields. `ENCLOSED BY`: specifies the delimiter for exported values. `TERMINATED BY`: specifies the delimiter for exported columns. `ESCAPED BY`: specifies the character to be ignored in exported values.
LINES STARTING BY	Specifies the start delimiter of a line.
LINES TERMINATED BY	Specifies the end delimiter of a line.
IGNORE number { LINES \| ROWS }	Specifies to ignore the first several lines. `LINES` specifies to ignore several lines in the file, and `ROWS` specifies to ignore several rows separated by field delimiters. By default, the system matches the fields in the input file one by one with the columns in the table. If the input file does not contain all the columns, the missing columns are filled with default values that are specified in the following rules: Character type: an empty string. Numeric type: 0. Date type: `0000-00-00`.
column_name_var	The name of the column to be imported.

Examples

Example 1: Import data from a file on the server (OBServer node)

Set the global secure path.

Notice

For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.
```
obclient> SET GLOBAL secure_file_priv = "";
```
Log out.

Note

After you set the secure_file_priv variable, you must execute the \q command to log out for the change to take effect.
```
obclinet> \q
```
The return result is as follows:
```
Bye
```

After you reconnect to the database, execute the LOAD DATA statement to import data.

Perform a regular import.

obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;

Use the APPEND hint to enable direct load.

LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;

Example 2: Import data from an OSS file

Notice

When you use an object storage path, make sure that the values of the parameters in the object storage path are composed of uppercase and lowercase letters, digits, and the following special characters: \/-_$+=. If you enter characters other than the preceding ones, the setting may fail.

Use the direct(bool, int) hint to enable direct load for a file that is stored in OSS.

LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';