Purpose
This statement imports data from an external source.

OceanBase Database supports the following input file types for the `LOAD DATA` statement:

- Server-side files (on OBServer nodes): files stored on the OBServer nodes of OceanBase Database. You can use the `LOAD DATA INFILE` or `LOAD DATA FROM URL` statement to load data from these server-side files into a database table.
- Client-side files (local files): files stored on the local file system of the client. You can use the `LOAD DATA LOCAL INFILE` or `LOAD DATA FROM URL` statement to load data from these client-side files into a database table.

  Note: When OceanBase Database executes the `LOAD DATA LOCAL INFILE` command, the system automatically adds the `IGNORE` option.

- OSS files: files stored in the Object Storage Service (OSS) file system. You can use the `LOAD DATA REMOTE_OSS INFILE` statement to load data from these OSS files into a database table.

The `LOAD DATA` statement can currently import text files in CSV format. The import process consists of the following steps:

1. File parsing: OceanBase Database reads the data from the specified file and parses it in parallel or sequentially based on the specified degree of parallelism.
2. Data distribution: because OceanBase Database is a distributed database, data from different partitions may be stored on different OBServer nodes. The `LOAD DATA` statement calculates the destination of each row based on the parsed data.
3. Data insertion: after receiving the data, the target OBServer node executes the `INSERT` operation to insert the data into the corresponding partition.
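The three steps above can be sketched with a minimal server-side import (the table name, file path, and delimiters here are hypothetical):

```sql
-- Hypothetical example: parse /tmp/orders.csv with 4-way parallelism,
-- route each parsed row to the OBServer node that owns its partition,
-- and insert the rows into the orders table.
LOAD DATA /*+ PARALLEL(4) */ INFILE '/tmp/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```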
Considerations
- You cannot use the `LOAD DATA` statement on a table that contains a trigger.
- To import data from an external file, you must have the `FILE` privilege and complete the following settings:
  - When loading a server-side file, set the system variable `secure_file_priv` to specify the paths that can be accessed during import or export.
  - When loading a local client-side file, add the `--local-infile[=1]` option when starting the MySQL or OBClient client to enable loading data from the local file system.
- When using partitioned table bypass (direct) load, the target table cannot be a replicated table, and it cannot contain auto-increment columns, identity columns, or global indexes.
Syntax
```sql
-- Load data from a file
LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REMOTE_OSS | LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
    [COMPRESSION [=] {AUTO | NONE | GZIP | DEFLATE | ZSTD}]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(column_name_var
        [, column_name_var] ...)]

load_mode:
    'full'
    | 'inc_replace'

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list
```
```sql
-- Load data from a URL
LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REPLACE | IGNORE]
    FROM { url_table_function_expr |
           ( SELECT expression_list FROM url_table_function_expr ) }
    INTO TABLE table_name
    [PARTITION(PARTITION_OPTION)]
    [(column_name_var [, column_name_var] ...)]
    [LOG ERRORS
        [INTO 'logfile_string']
        [REJECT LIMIT {integer | UNLIMITED}]
        [BADFILE 'badfile_string']]

load_mode:
    'full'
    | 'inc_replace'

url_table_function_expr:
    FILES (
        LOCATION = '<string>',
        {
            FORMAT = (
                TYPE = 'CSV',
                LINE_DELIMITER = '<string>' | <expr>,
                FIELD_DELIMITER = '<string>' | <expr>,
                PARSE_HEADER = { TRUE | FALSE },
                ESCAPE = '<character>' | <expr>,
                FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
                ENCODING = 'charset',
                NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
                SKIP_HEADER = <int>,
                SKIP_BLANK_LINES = { TRUE | FALSE },
                TRIM_SPACE = { TRUE | FALSE },
                EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
            )
            | FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
        },
        [PATTERN = '<regex_pattern>']
    )
    | SOURCE (
        TYPE = 'ODPS',
        ACCESSID = '<string>',
        ACCESSKEY = '<string>',
        ENDPOINT = '<string>',
        TUNNEL_ENDPOINT = '<string>',
        PROJECT_NAME = '<string>',
        SCHEMA_NAME = '<string>',
        TABLE_NAME = '<string>',
        QUOTA_NAME = '<string>',
        COMPRESSION_CODE = '<string>'
    )

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list
```
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. The default value is 4. |
| load_batch_size(M) | The size of each batch to insert. The default value of M is 100. We recommend that you set the value to a number in the range [100, 1000]. |
| APPEND \| direct() \| NO_DIRECT | The hint for enabling direct load. <br>**Notice**: We recommend that you do not upgrade OceanBase Database during a direct load task, because the upgrade may cause the task to fail. |
| REMOTE_OSS \| LOCAL | Optional. |
| file_name | The path and name of the input file. <br>**Note**: When you import a file from OSS, make sure that the required conditions are met. |
| REPLACE \| IGNORE | If a unique key conflict occurs, REPLACE overwrites the conflicting row, and IGNORE skips it. LOAD DATA determines whether data is duplicated based on the table's primary key. If the table does not have a primary key, the REPLACE and IGNORE options have no effect. By default, when duplicate data is encountered, LOAD DATA records the error data in the log file. |
| url_table_function_expr | Optional. Specifies whether to read data from the file system or a data source by using the FILES or SOURCE keyword. |
| table_name | The name of the table to which data is imported. |
| PARTITION_OPTION | Specifies the partition name for direct load. <br>**Note**: Partition specification is supported only for direct load, not for regular LOAD DATA. In other words, if you do not add a direct load hint or set a direct load configuration item, specifying a partition in a LOAD DATA statement does not take effect. |
| COMPRESSION | Specifies the compression format of the file. |
| FIELDS \| COLUMNS | Specifies the format of the fields. |
| LINES STARTING BY | Specifies the starting delimiter of a line. |
| LINES TERMINATED BY | Specifies the ending delimiter of a line. |
| IGNORE number { LINES \| ROWS } | Specifies the number of rows to ignore. LINES ignores the specified number of lines at the beginning of the file, and ROWS ignores the specified number of rows based on the field delimiter. By default, the system maps each field in the input file to a column in the table; if the input file does not contain all the columns, the missing columns are filled in based on default rules. <br>**Note**: When you import data from multiple files, the behavior is the same as when you import data from a single file. |
| column_name_var | Optional. Specifies the name of the imported column. |
| LOG ERRORS | Optional. Specifies whether to enable error diagnostics during the import of a URL table. For more information, see log_errors. |
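As an illustration of the `REPLACE` and `IGNORE` options and the `IGNORE number LINES` clause (the table and file names below are hypothetical), consider a table `t1` with a primary key:

```sql
-- Overwrite rows whose primary key conflicts with existing rows.
LOAD DATA INFILE '/tmp/t1.csv' REPLACE INTO TABLE t1
    FIELDS TERMINATED BY ',';

-- Skip conflicting rows, and also skip the first line of the file
-- (for example, a header line).
LOAD DATA INFILE '/tmp/t1.csv' IGNORE INTO TABLE t1
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES;
```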
FILES
The FILES keyword is composed of the LOCATION, FORMAT, and PATTERN clauses.
The `LOCATION` clause specifies the path where the external table's data files are stored. Typically, the data files of an external table are stored in a separate directory, which can contain subdirectories. When the external table is created, it automatically collects all files in this directory.

- The format for a local `LOCATION` is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative or absolute path. If a relative path is provided, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file paths that OBServer nodes have permission to access. `local_file_path` must be a subpath of the `secure_file_priv` path.
- The formats for a remote `LOCATION` are as follows:
  - `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing OSS and S3, and `s3_region` is the region selected when using S3. This sensitive access information is stored in encrypted form in the system tables of the database.
  - `LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH'`, where `port` is the port number of HDFS, and `PATH` is the directory path in HDFS.
  - For Kerberos authentication: `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`, where:
    - `principal`: the user for login authentication.
    - `keytab`: the path to the user authentication key file.
    - `krb5conf`: the path to the Kerberos environment description file.
    - `configs`: additional HDFS configuration items. Empty by default, but in a Kerberos environment this item usually needs a value, for example, `dfs.data.transfer.protection=authentication,privacy`, which sets the data transmission protection levels to `authentication` and `privacy`.
Notice
When using an object storage path, the parameters in the path are separated by the `&` symbol. Ensure that your parameter values contain only uppercase and lowercase letters, digits, the characters `\/-_$+=`, and wildcards. If you input other characters, the settings may fail.
The `FORMAT` clause specifies the properties related to the file reading format. Three file formats are supported: CSV, PARQUET, and ORC.

When `TYPE = 'CSV'`, the following fields are supported:

- `LINE_DELIMITER`: specifies the line delimiter for CSV files. The default value is `LINE_DELIMITER='\n'`.
- `FIELD_DELIMITER`: optional. Specifies the column delimiter for CSV files. The default value is `FIELD_DELIMITER='\t'`.
- `PARSE_HEADER`: optional. Specifies whether the first row of the CSV file contains the column names. The default value is `FALSE`, indicating that the first row is not treated as column names.
- `ESCAPE`: specifies the escape character for CSV files. It can only be one byte. The default value is `ESCAPE='\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. Specifies the symbol used to enclose field values in the CSV file. The default value is empty. Using this option indicates that only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON) are enclosed.
- `ENCODING`: specifies the character set encoding format of the file. If not specified, the default value is `UTF8MB4`.
- `NULL_IF`: specifies the strings that are treated as `NULL`. The default value is empty.
- `SKIP_HEADER`: skips the file header; specifies the number of rows to skip.
- `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
- `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
- `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.

When `TYPE = 'PARQUET'` or `TYPE = 'ORC'`, there are no additional fields.

The `PATTERN` clause specifies a regular expression pattern for filtering files in the `LOCATION` directory. For each file path in the `LOCATION` directory, if it matches the pattern, the external table accesses the file; otherwise, it skips the file. If this parameter is not specified, the external table can access all files in the `LOCATION` directory by default.
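Putting the `LOCATION`, `FORMAT`, and `PATTERN` clauses together, a sketch of a FILES expression that reads only the CSV files in a server-side directory might look as follows (the directory and table names are hypothetical, and the directory must be under the `secure_file_priv` path):

```sql
LOAD DATA FROM FILES (
    LOCATION = '/data/exports/',      -- hypothetical server-side directory
    FORMAT = (
        TYPE = 'CSV',
        FIELD_DELIMITER = ',',
        PARSE_HEADER = TRUE,          -- first row holds the column names
        SKIP_BLANK_LINES = TRUE,
        NULL_IF = ('NULL')            -- treat the string NULL as NULL
    ),
    PATTERN = '.*[.]csv'              -- regular expression, not a glob
) INTO TABLE t1;
```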
SOURCE
Unlike FILES, the SOURCE keyword is not composed of other clauses. In this case, `TYPE = 'ODPS'` and the following fields are included:

- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: optional. Specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: optional. Specifies the quota to use.
- `COMPRESSION_CODE`: optional. Specifies the compression format of the data source. Four compression formats are supported: `ZLIB`, `ZSTD`, `LZ4`, and `ODPS_LZ4`. If not specified, compression is not enabled.
log_errors
- `LOG ERRORS`: enables error diagnostics during the import process, allowing failed rows to be recorded instead of terminating the entire operation at the first error. When used with the `REJECT LIMIT` clause, it controls the number of error rows that can be accepted.
- `INTO 'logfile_string'`: optional. Specifies the file in the target directory where error information is written. If `INTO 'logfile_string'` is not specified, error information is only recorded in the warning buffer, which can be viewed by using `SHOW WARNINGS`. `logfile_string` indicates the directory for storing error information, in the following formats:

  Note: The `INTO 'logfile_string'` parameter is supported starting from V4.4.0.

  - When error information is stored locally, `logfile_string` has the format `[file://] local_file_path`, where `local_file_path` can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. `secure_file_priv` configures the file paths that OBServer nodes have permission to access. `local_file_path` must be a subpath of the `secure_file_priv` path.
  - When error information is stored remotely (refer to the Location section in the syntax for creating external tables), `logfile_string` has the following formats:
    - `{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for accessing Alibaba Cloud OSS, AWS S3, or S3-compatible object storage, and `s3_region` is the region selected when using S3. This sensitive access information is stored in encrypted form in the system tables of the database.
    - `hdfs://localhost:port/PATH`, where `localhost` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the directory path in HDFS. For Kerberos authentication, the address is `hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx`.

  OceanBase Database allows you to set tenant-level parameters that configure the compression algorithm for diagnostic logs and the maximum size of a single diagnostic log file. For more information, see load_data_diagnosis_log_compression and load_data_diagnosis_log_max_size.

- `REJECT LIMIT`: optional. Specifies the maximum number of error rows allowed:
  - The default value is 0, which means no error rows are allowed and the operation fails at the first error.
  - `integer`: the maximum number of error rows allowed on a single server. For example, 10 means that up to 10 error rows can be tolerated on a single server.
  - `UNLIMITED`: allows an unlimited number of error rows.
- `BADFILE 'badfile_string'`: specifies the path for storing error data files. The format of `badfile_string` is the same as that of `logfile_string`.

  Note: The `BADFILE 'badfile_string'` parameter is supported starting from V4.4.0.
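Combining these clauses, a hedged sketch of an import that tolerates up to 10 error rows per server might be written as follows (the directories and table name are hypothetical, and the `INTO` and `BADFILE` clauses require V4.4.0 or later):

```sql
LOAD DATA FROM FILES (
    LOCATION = '/data/exports/',      -- hypothetical source directory
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ','),
    PATTERN = 'orders.csv')
INTO TABLE orders
LOG ERRORS
    INTO '/data/errlog/'              -- directory for error logs
    REJECT LIMIT 10                   -- allow up to 10 error rows per server
    BADFILE '/data/bad/';             -- directory for rejected rows
```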
Notice

- If the `LOG ERRORS` clause is not specified, the default behavior is a normal import, and the operation fails at the first error.
- If the `LOG ERRORS` clause is specified but the `REJECT LIMIT` clause is not, it is equivalent to setting the limit to 0. The operation fails at the first error, but that error is recorded, and the error code is a diagnostics-related error, that is, "reject limit reached".

View error logs
OceanBase Database supports the following SQL statement for viewing the error logs of an import:

```sql
SELECT * FROM READ_ERROR_LOG('diagnosis_log_path');
```

In this statement, `diagnosis_log_path` specifies the path of the error log. When executed, this SQL statement is equivalent to the following URL external table statement:

```sql
SELECT * FROM FILES (
    LOCATION = 'diagnosis_log_path/'
    FORMAT (
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '\',
        PARSE_HEADER = true
    )
    [, PATTERN = 'filename']
);
```

Here are some examples:

- The specified log path is a file name (not ending with `/`).

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/filename');
  ```

  The corresponding URL external table statement is as follows (the file name is used as a pattern to filter files):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '\',
          PARSE_HEADER = true
      ),
      PATTERN = 'filename'
  );
  ```

- The specified log path is a folder (ending with `/`).

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/');
  ```

  The corresponding URL external table statement is as follows (no pattern is used):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '\',
          PARSE_HEADER = true
      )
  );
  ```
Rules for using wildcards in multi-file direct load
To facilitate multi-file import, the wildcard feature is introduced for both server-side and OSS file imports, but it is not applicable for client-side file imports.
Wildcard usage on the server side
Matching rules:
- Match a file:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
  ```

- Match a directory:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';
  ```

- Match both a directory and a file name:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
  ```

Considerations:

- At least one matching file must exist. Otherwise, error code 4027 is returned.
- For the input `load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';`, the string `/xxx/test.1*.csv,/xxx/test.6*.csv` is treated as a single match target. If no file matches it, error code 4027 is returned.
- Only POSIX-compliant wildcards supported by the GLOB function can be used. Extended patterns such as `test.6*(6|0).csv` and `test.6*({0.csv,6.csv}|.csv)` can be expanded by the `ls` command, but the GLOB function cannot match them, so error code 4027 is returned.
Wildcard usage in Cloud Object Storage Service (OSS)
Matching rules:
- Match a file:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  ```

Considerations:

- Directory matching is not supported. For example, `load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';` returns `OB_NOT_SUPPORTED`.
- Only the `*` and `?` wildcards are supported in file names. Other wildcards are accepted as input but cannot match any results.
Examples
Note
When you use `LOAD DATA` to load data, you can use `\N` to represent `NULL`.

Import data from a server-side file
Example 1: Import data from a server-side file.

1. Set the global secure file path.

   ```sql
   obclient> SET GLOBAL secure_file_priv = "/";
   Query OK, 0 rows affected

   obclient> \q
   Bye
   ```

   Note: Because `secure_file_priv` is a `GLOBAL` variable, you must execute `\q` to exit the client for the setting to take effect.

2. After reconnecting to the database, import data from the external file.

   ```sql
   obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1;
   Query OK, 0 rows affected
   ```
Example 2: Enable direct load by using the `APPEND` hint.

```sql
LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/a.csv' INTO TABLE t;
```

Example 3: Import a CSV file.

- Import all columns from the `test1.csv` file.

  ```sql
  load data /*+ direct(true,0) parallel(2) */
  from files(
      location = "data/csv",
      format = (
          type = 'csv',
          field_delimiter = ',',
          parse_header = true,
          skip_blank_lines = true
      ),
      pattern = 'test1.csv')
  into table t1;
  ```

- Read the `c1` and `c2` columns from the `test1.csv` file in the `data/csv` directory and import them into the `col1` and `col2` columns of the `t1` table.

  ```sql
  load data /*+ direct(true,0) parallel(2) */
  from (
      select c1, c2 from files(
          location = 'data/csv',
          format = (
              type = 'csv',
              field_delimiter = ',',
              parse_header = true,
              skip_blank_lines = true
          ),
          pattern = 'test1.csv'))
  into table t1 (col1, col2);
  ```

Example 4: Import a PARQUET file.

```sql
load data /*+ direct(true,0) parallel(2) */
from files(
    location = "data/parquet",
    format = ( type = 'PARQUET'),
    pattern = 'test1.parquet')
into table t1;
```

Example 5: Import an ORC file.

```sql
load data /*+ direct(true,0) parallel(2) */
from files(
    location = "data/orc",
    format = ( type = 'ORC'),
    pattern = 'test1.orc')
into table t1;
```

Example 6: Import data from an ODPS data source.

```sql
load data /*+ direct(true,0) parallel(2) */
from source (
    type = 'ODPS',
    accessid = '$ODPS_ACCESSID',
    accesskey = '******',
    endpoint = '$ODPS_ENDPOINT',
    project_name = 'example_project',
    schema_name = '',
    table_name = 'example_table',
    quota_name = '',
    compression_code = '')
into table t1;
```

Import data from a client-side file
Example 1: Import data from a local file to a table in OceanBase Database.

1. Open a terminal or command prompt window and enter the following command to start the client.

   ```shell
   obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -Dtest
   ```

   The return result is as follows:

   ```shell
   Welcome to the OceanBase. Commands end with ; or \g.
   Your OceanBase connection id is 3221719526
   Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18)

   Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

   Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

   obclient [test]>
   ```

   Notice: To use the `LOAD DATA LOCAL INFILE` feature, you must use OBClient V2.2.4 or later. If you do not have an OBClient client of the required version, you can also use the MySQL client to connect to the database.

2. In the client, execute the `LOAD DATA LOCAL INFILE` statement to load the local data file.

   ```sql
   obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
   ```

   The return result is as follows:

   ```shell
   Query OK, 3 rows affected
   Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
   ```

Example 2: Directly import a compressed file by setting the COMPRESSION parameter.

```sql
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz' INTO TABLE lineitem COMPRESSION GZIP FIELDS TERMINATED BY '|';
```

Example 3: Specify a partition for direct load by using the PARTITION clause.

- Specify partitions for direct load:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0, p1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

- Specify subpartitions for direct load:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0sp0, p1sp1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

Import data from an OSS file
Example 1: Enable direct load by using the `direct(bool, int)` hint. The file to be loaded is stored in OSS.

```sql
load data /*+ parallel(1) direct(false,0) */ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
```

Import data from a server-side file as a URL external table
Notice
The IP addresses in the examples have been desensitized. When verifying, replace them with the actual IP addresses of your machines.

The following example describes how to import data from a file located on a server (OBServer node) in the MySQL-compatible mode of OceanBase Database. The steps are as follows:

1. Create directories on the OBServer node. The `/home/admin/test_csv` directory stores external data, the `/home/admin/test_into` directory stores error messages, and the `/home/admin/test_badfile` directory stores error data files.

   ```shell
   [admin@xxx /home/admin]# mkdir -p /home/admin/{test_csv,test_into,test_badfile}
   ```

2. Prepare an external file. In the `/home/admin/test_csv` directory, create a file named `type_cast.csv`.

   ```shell
   [admin@xxx /home/admin/test_csv]# vi type_cast.csv
   ```

   The content of the file is as follows:

   ```shell
   1,2,3
   2,4,af
   3,4,5
   ds,6,32
   4,5,6
   5,2,3
   6,v4,af
   7,4,5
   kj,a6,32
   8,5,6
   ```

3. Set the import file path.
Notice
   For security reasons, when you set the `secure_file_priv` system variable, you can connect to the database only through a local socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Run the following command to log in to the machine where the OBServer node is located.

      ```shell
      ssh admin@10.10.10.1
      ```

   2. Run the following command to connect to the `mysql001` tenant through a local Unix socket.

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Run the following SQL statement to remove the restriction on import and export paths.

      ```sql
      SET GLOBAL secure_file_priv = "/";
      ```
4. Reconnect to the `mysql001` tenant.

   Here is an example:

   ```shell
   obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -Ddb_test
   ```

5. Create a table named `test_tbl1`.

   ```sql
   CREATE TABLE test_tbl1(col1 INT, col2 INT, col3 INT);
   ```

6. Set the compression algorithm used for diagnostic logs to `AUTO`.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_compression = 'AUTO';
   ```

   For more information about setting the compression algorithm used for diagnostic logs, see load_data_diagnosis_log_compression.

7. Set the maximum size of a single diagnostic log file to 1 KB. If the size of the exported log exceeds 1 KB, a second file is automatically generated and the export continues.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_max_size = '1K';
   ```

   For more information about setting the size of a single diagnostic log file, see load_data_diagnosis_log_max_size.
8. Use the `LOAD DATA` statement with the URL external table syntax to import data into the `test_tbl1` table, and specify error diagnostics. Set the directory for storing error logs to `/home/admin/test_into/` and the directory for storing error data files to `/home/admin/test_badfile/`.

   ```sql
   LOAD DATA FROM FILES(
       LOCATION = '/home/admin/test_csv/',
       FORMAT = (
           TYPE = 'csv',
           FIELD_DELIMITER = ','),
       PATTERN = 'type_cast.csv')
   INTO TABLE test_tbl1
   LOG ERRORS INTO '/home/admin/test_into/'
   REJECT LIMIT UNLIMITED
   BADFILE '/home/admin/test_badfile/';
   ```

   The returned result is as follows:

   ```shell
   Query OK, 6 rows affected, 4 warnings
   Records: 6  Deleted: 0  Skipped: 0  Warnings: 4
   ```

9. Use the `READ_ERROR_LOG` function to view the content of the error log.

   ```sql
   SELECT * FROM READ_ERROR_LOG('/home/admin/test_into/');
   ```

   The returned result is as follows:

   ```shell
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   | ERROR CODE | FILE NAME     | LINE NUMBER | ERROR MESSAGE                                                                                                     |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   |      -4226 | type_cast.csv |           4 | fail to scan file type_cast.csv at line 4 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           9 | fail to scan file type_cast.csv at line 9 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           7 | fail to scan file type_cast.csv at line 7 for column "db_test"."test_tbl1"."col2", error: Incorrect integer value |
   |      -4226 | type_cast.csv |           2 | fail to scan file type_cast.csv at line 2 for column "db_test"."test_tbl1"."col3", error: Incorrect integer value |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
   4 rows in set
   ```

10. View the data in the `test_tbl1` table.

    ```sql
    SELECT * FROM test_tbl1;
    ```

    The returned result is as follows:

    ```shell
    +------+------+------+
    | col1 | col2 | col3 |
    +------+------+------+
    |    1 |    2 |    3 |
    |    3 |    4 |    5 |
    |    4 |    5 |    6 |
    |    5 |    2 |    3 |
    |    7 |    4 |    5 |
    |    8 |    5 |    6 |
    +------+------+------+
    6 rows in set
    ```

11. View the content of the bad file through the URL external table.

    ```sql
    SELECT * FROM FILES (
        LOCATION = '/home/admin/test_badfile/',
        FORMAT (
            TYPE = 'csv',
            FIELD_DELIMITER = ','),
        PATTERN = 'data.bad');
    ```

    The returned result is as follows:

    ```shell
    +------+------+------+
    | c1   | c2   | c3   |
    +------+------+------+
    | ds   | 6    | 32   |
    | kj   | a6   | 32   |
    | 6    | v4   | af   |
    | 2    | 4    | af   |
    +------+------+------+
    4 rows in set
    ```
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about how to use the `LOAD DATA` statement, see Import data by using the LOAD DATA statement.
- For more information about how to use the `LOAD DATA` statement for direct load, see Import data by using the LOAD DATA statement for direct load.