LOAD DATA|V4.6.0|OceanBase Database| docs|Distributed Database

LOAD DATA

Last Updated：2026-05-07 11:26:25 Updated

Overview

This statement is used to import data from an external source. OceanBase Database provides two forms of the LOAD DATA statement, each with a different execution path and use case:

LOAD DATA INFILE: This form is compatible with MySQL syntax and supports only CSV format. The file source can be specified using INFILE, LOCAL, or REMOTE_OSS to indicate whether the file is located on the server, client, or in Object Storage Service (OSS).
LOAD DATA FROM: This form is recommended. It reads data from a URL using a URL table and supports multiple formats, including CSV, PARQUET, ORC, and ODPS. The execution path differs from the INFILE form and is better suited for batch import scenarios such as AP.

The syntax, parameters, and diagnostic methods differ significantly between the two forms. Please refer to the corresponding sections below based on the actual form you are using.

General considerations

You cannot use the LOAD DATA statement for tables that have triggers.
To import data from an external file, you must have the FILE privilege and the following settings:
- When loading server-side files, you must set the system variable secure_file_priv to specify the path for importing or exporting files.
- When loading client-side local files, you must add the --local-infile[=1] option when starting the MySQL/OBClient client to enable the feature of loading data from the local file system.
When using partitioned table direct load, the target table cannot be a replica table, and it cannot contain auto-increment columns, identity columns, or global indexes.
To improve import efficiency, we recommend that you create the base table first and then create indexes after the import is complete. If global indexes are involved, make sure to create them after the import is complete, otherwise, an error not support may occur.

1. LOAD DATA INFILE

This form is compatible with the LOAD DATA syntax of MySQL. It only supports CSV format. It can load files from the server (INFILE), client local files (LOCAL), or OSS files (REMOTE_OSS). The import process involves parsing the file, distributing the data, and inserting it.

Note

When you execute LOAD DATA LOCAL INFILE, the system automatically adds the IGNORE option.
If the file to be imported is located on the local disk of an OBServer node, execute this statement directly on the OBServer node. Do not execute it through a Proxy.

Syntax of LOAD DATA INFILE

LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REMOTE_OSS | LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
    [CHARACTER SET 'charset_name']
    [COMPRESSION [=] {AUTO|NONE|GZIP|DEFLATE|ZSTD}]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(column_name_var
        [, column_name_var] ...)]
    [SET col_name = {expr | DEFAULT}
        [, col_name = {expr | DEFAULT}] ...]

load_mode:
    'full'
    | 'inc_replace'

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list

Support for Location objects in LOAD DATA INFILE and SELECT INTO OUTFILE

Overview

LOAD DATA INFILE and SELECT INTO OUTFILE support specifying the file path by using Location objects. If the AccessKey (AK) and SecretKey (SK) are not explicitly specified, the system automatically matches the Location object based on the longest path matching principle.

LOAD DATA INFILE

Automatic matching mechanism

If the import path is OSS and the AK and SK are not explicitly specified, the system automatically matches the path with the Location object that the current user has permissions for, and selects the corresponding AK and SK based on the longest path matching principle.

Note

The type of the Location object must be OSS path.

Syntax example

Specify the path by using the @location_name 'relative_path' format, where the part after @ is the name of the Location object, and the string is the subpath relative to the object:

LOAD DATA REMOTE_OSS INFILE @oss_t 'load_data_test.csv'
INTO TABLE load_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

In this example, @oss_t is the name of the Location object, and 'load_data_test.csv' is the subpath relative to the object.

SELECT INTO OUTFILE

Automatic matching mechanism

If the export path is OSS or HDFS and the AK and SK are not explicitly specified, the system automatically matches the path with the Location object that the current user has permissions for, and selects the corresponding AK and SK based on the longest path matching principle.

Note

The type of the Location object can be any of the following: OSS, HDFS, and so on.

Syntax example

Specify the path by using the @location_name 'relative_path' format, where the part after @ is the name of the Location object, and the string is the subpath relative to the object:

SELECT *
INTO OUTFILE @oss_t 'select_into_test.csv'
FORMAT = (
    TYPE = 'CSV',
    FIELD_DELIMITER = ',',
    LINE_DELIMITER = '\n'
)
FROM export_table;

In this example, @oss_t is the name of the Location object, and 'select_into_test.csv' is the subpath relative to the object.

Parameters (applicable only to the INFILE form)

Parameter	Description
parallel(N)	The degree of parallelism for data loading. The default value is 4. We recommend that you set the value to a number within the range [0, maximum number of CPU cores of the tenant].
load_batch_size(M)	The batch size for each insert. The default value of `M` is `100`. We recommend that you set the value to a number within the range [100, 1000].
APPEND \| direct() \|NO_DIRECT	Use the hint to enable direct load. Notice Do not perform an upgrade of OceanBase Database during a direct load task, because this may cause the direct load task to fail. `APPEND` is equivalent to using `direct(true, 0)` by default. It also supports the `GATHER_OPTIMIZER_STATISTICS` hint. `direct()` is described as follows: `bool` indicates whether the data to be written needs to be sorted. `true` indicates that the data needs to be sorted. `false` indicates that the data does not need to be sorted. Note When you import an ordered file or backup data (which is sorted by the primary key), you can set `need_sort` to `false` to skip the sorting process. If the data is not sorted by the primary key and sorting is not explicitly enabled (i.e., `need_sort` is set to `true`), an error will be triggered. `int` indicates the maximum number of error rows that can be tolerated. `load_mode` is an optional parameter that specifies the mode of direct load. The value must be enclosed in single quotation marks. Valid values: `full`: The default value. This indicates a full load. Notice When you perform a full direct load on a table with a unique index, the `REPLACE` or `IGNORE` keyword cannot be specified, and the number of error rows that can be tolerated cannot be set if duplicate unique index keys are found. `inc`: This indicates an incremental load. It supports the `INSERT` and `IGNORE` semantics. `inc_replace`: This indicates an incremental load, but it does not check for duplicate primary keys. This is equivalent to an incremental load with the `REPLACE` semantics. Notice If `load_mode` is set to `inc_replace`, the `REPLACE` or `IGNORE` keyword cannot be specified in the `LOAD DATA` statement. `NO_DIRECT`: This forces a single SQL statement to not use direct load. If the SQL statement contains this hint, it ignores other direct load hints and performs a normal load.
REMOTE_OSS \| LOCAL	Optional. `REMOTE_OSS` specifies whether to read data from an object storage service. If you use this parameter, you must specify the endpoint of an object storage service. Supported object storage services: Alibaba Cloud Object Storage Service (OSS) (with the endpoint in the `oss://` format), Amazon Simple Storage Service (S3) (with the endpoint in the `s3://` format), and object storage services compatible with S3, such as OBS and GCS (with the endpoint in the `s3://` format). `LOCAL` specifies whether to read data from the local file system of the client. If you do not use the `LOCAL` parameter, data is read from the file system of the server (OBServer node).
file_name	The path and name of the input file. `file_name` can be in the following formats: When the import file is stored on an OBServer node or client: `/$PATH/$FILENAME`. When the import file is stored on an object storage service: `oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`. The parameters are described as follows: `$PATH`: The path of the file in the bucket. `$FILENAME`: The name of the file. `$HOST`: The host name or the CDN-accelerated domain name of the object storage service. This is the endpoint of the object storage service. `$ACCESS_ID`: The Access Key ID required to access the object storage service. This is used for authentication. `$ACCESSKEY`: The Access Key Secret required to access the object storage service. This is used for authentication. Note When you import a file from an object storage service, make sure that the following conditions are met: You have the necessary permissions to access the bucket and file. You must have sufficient permissions to read the specified bucket and file. This typically requires setting up access permissions in the OSS console or through OSS API and configuring the Access Key ID and Access Key Secret as appropriate credentials. The database server can connect to the specified `$HOST` endpoint to access the object storage service. If you use the CDN-accelerated domain name of the object storage service, make sure that the CDN is configured correctly and the network connection is normal.
REPLACE \| IGNORE	If a unique key conflict occurs, `REPLACE` indicates that the conflicting row will be overwritten, and `IGNORE` indicates that the conflicting row will be ignored. `LOAD DATA` determines whether data is duplicated based on the table's primary key. If the table does not have a primary key, the `REPLACE` and `IGNORE` options have no effect. By default, when duplicate data is encountered, `LOAD DATA` logs the error data to the log file. Notice When you execute the `LOAD DATA LOCAL INFILE` command in MySQL mode, the system automatically adds the `IGNORE` option. This behavior improves compatibility with MySQL databases. If you specify the `REPLACE` or `IGNORE` clause and the degree of parallelism is greater than `1`, the last inserted record for conflicting rows may differ from the result of a serial execution. If you require strict consistency for conflicting records, do not specify the degree of parallelism or set it to `1`.
table_name	The name of the table to import data into. Supports partitioned and non-partitioned tables. Supports specifying any number of columns in the table.
CHARACTER SET 'charset_name'	Optional. Specifies the character set used for the source file. If not specified, the system defaults to UTF-8.
PARTITION_OPTION	Specifies the partition name for direct load. partition_option_list: a list of partitions to insert. Separate multiple partitions with commas (,). subpartition_option_list: a list of subpartitions to insert. Separate multiple subpartitions with commas (,). Note Specifying partitions is only supported for direct load. It is not supported for regular LOAD DATA. That is, if you do not add a direct load hint or set a direct load configuration, specifying partitions during LOAD DATA has no effect.
COMPRESSION	Specifies the compression format of the file. `AUTO`: automatically detects the compression algorithm based on the file extension. When you use the `AUTO` parameter, different extensions correspond to different compression formats. `.gz`: GZIP compression. `.deflate`: DEFLATE compression. `.zst/.zstd`: ZSTD compression. `NONE`: indicates that the file is not compressed. `GZIP`: GZIP compression. `DEFLATE`: GZIP compression without metadata. `ZSTD`: ZSTD compression. You can explicitly specify the compression format or let the program detect the compression format based on the file extension.
FIELDS \| COLUMNS	Specifies the format of the fields. `ENCLOSED BY` : specifies the modifier for the exported value. `TERMINATED BY` : specifies the delimiter for the exported column. `ESCAPED BY` : specifies the character to ignore for the exported value.
LINES STARTING BY	Specifies the starting character for a line.
LINES TERMINATED BY	Specifies the ending character for a line.
IGNORE number { LINES \| ROWS }	Specifies the number of rows to ignore. `LINES` indicates the number of rows at the beginning of the file, and `ROWS` indicates the number of rows at the beginning of the file specified by the field delimiter. By default, the system maps each field in the input file to the corresponding column in the table. If the input file does not contain all the columns, the missing columns are filled with default values based on the following rules: For character types: an empty string. For numeric types: 0. For date types: `0000-00-00`. Note For multiple files, the behavior is the same as for a single file.
column_name_var	Optional. Specifies the name of the imported column. To represent a null value in the file, use `\N`.
SET col_name = {expr \| DEFAULT}	Optional. Specifies an expression or default value for a column not read from the file. For example, `SET dates=CURRENT_TIMESTAMP` sets the `dates` column to the current timestamp.

Supported format: This format only supports the CSV format.

Bypass mode and non-bypass mode: You can use the Hint to enable bypass import to improve performance.

Wildcard rules for multi-file direct load (INFILE only)

To facilitate multi-file imports, wildcard support has been introduced for both server-side and OSS file imports, but not for client-side imports.

Server-side wildcard usage
- Matching rules: File name matching: load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';; directory matching is also supported, and it works similarly to file name matching.
- Considerations: At least one matching file must exist; otherwise, an error code 4027 is returned. Only POSIX-compliant GLOB functions are supported.
Wildcard usage in Alibaba Cloud Object Storage Service (OSS): File name matching: load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';. Directory matching is not supported. Only * and ? are supported as file name wildcards.

2. LOAD DATA FROM (Recommended)

This method reads data from a URL table and supports various formats such as CSV, PARQUET, ORC, and ODPS. The execution path is different from the INFILE method, making it suitable for batch import scenarios like AP. This is the currently recommended method. When executed, it is equivalent to INSERT INTO ... SELECT ...: LOAD DATA FROM FILES(...) INTO TABLE table_name is equivalent to INSERT INTO table_name SELECT * FROM FILES(...). The hint is the same as that of the INSERT statement.

LOAD DATA FROM syntax

Note

When you execute this form, it is equivalent to INSERT INTO table_name SELECT ... FROM url_table_function_expr. Therefore, the hints in this statement are consistent with those in the INSERT statement (such as direct and parallel). The load_batch_size parameter, which is specific to the INFILE form, is no longer supported.

LOAD DATA
    [/*+ insert_hint */]
    [REPLACE | IGNORE]
    FROM { url_table_function_expr |
         ( SELECT expression_list FROM url_table_function_expr ) }
    INTO TABLE table_name
    [PARTITION(PARTITION_OPTION)]
    [(column_name_var [, column_name_var] ...)]
    [LOG ERRORS
        [INTO 'logfile_string']
        [REJECT LIMIT {integer | UNLIMITED}]
        [BADFILE 'badfile_string']]

url_table_function_expr:

  FILES (
    LOCATION = '<string>',
    {
      FORMAT = (
        TYPE = 'CSV',
        LINE_DELIMITER = '<string>' | <expr>,
        FIELD_DELIMITER = '<string>' | <expr>,
        PARSE_HEADER = { TRUE | FALSE },
        ESCAPE = '<character>' | <expr>,
        FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
        ENCODING = 'charset',
        NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
        SKIP_HEADER = <int>,
        SKIP_BLANK_LINES = { TRUE | FALSE },
        TRIM_SPACE = { TRUE | FALSE },
        EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
      )
      | FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
    },
    [PATTERN = '<regex_pattern>']
  )
  | SOURCE (
      TYPE = 'ODPS',
      ACCESSID = '<string>',
      ACCESSKEY = '<string>',
      ENDPOINT = '<string>',
      TUNNEL_ENDPOINT = '<string>',
      PROJECT_NAME = '<string>',
      SCHEMA_NAME = '<string>',
      TABLE_NAME = '<string>',
      QUOTA_NAME = '<string>',
      COMPRESSION_CODE = '<string>'
    )

PARTITION_OPTION:
    partition_option_list
    | subpartition_option_list

Parameters (only for LOAD DATA FROM form)

Parameter	Description
Hint	Consistent with the hints in the INSERT statement (this form is equivalent to the INSERT INTO ... SELECT ... statement).
url_table_function_expr	Specifies the data source and format by using FILES or SOURCE.
table_name, PARTITION_OPTION	Same as in the INFILE form.
column_name_var	Optional. Specifies the name of the column to be imported.
LOG ERRORS	Optional. Enables error diagnostics during the import. For more information, see log_errors.

Supported formats: CSV, PARQUET, and ORC (by using FILES); and ODPS (by using SOURCE).

Bypass and non-bypass: You can enable bypass import by using a hint.

Diagnostics: Use LOG ERRORS, REJECT LIMIT, and BADFILE to record failed rows. Use READ_ERROR_LOG to view the error log. For more information, see log_errors.

FILES

The FILES keyword consists of the LOCATION, FORMAT, and PATTERN clauses.

The LOCATION clause specifies the path where the external table files are stored. Typically, the data files of the external table are stored in a separate directory, which can contain subdirectories. When a table is created, the external table automatically collects all files in this directory.
- For a local location, the format is LOCATION = '[file://] local_file_path', where local_file_path can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. The secure_file_priv parameter specifies the file path that the OBServer node has permission to access. local_file_path must be a subpath of the secure_file_priv path.
- For a remote location, the format is as follows:
  - LOCATION = '{oss|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path' where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required to access OSS and S3, and s3_region is the region information selected when using S3. These sensitive access information are stored in the system tables of the database in an encrypted manner.
  - LOCATION = 'hdfs://$ {hdfs_namenode_address}:${port}/PATH.localhost' where port is the port number of HDFS, and PATH is the directory path in HDFS.
    - For Kerberos authentication: LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'. Where:
      - principal: the user for login authentication.
      - keytab: the path to the user's authentication key file.
      - krb5conf: the path to the Kerberos environment description file.
      - configs: additional HDFS configuration items. By default, it is empty. However, in a Kerberos environment, this configuration item usually has a value and needs to be configured, for example: dfs.data.transfer.protection=authentication,privacy, which specifies the data transmission protection level as authentication and privacy.
  Notice
  
  When using an object storage path, the parameters of the object storage path are separated by the & symbol. Ensure that the parameter values you enter contain only uppercase and lowercase letters, numbers, \/-_$+=, and wildcards. If you enter other characters, the settings may fail.
The FORMAT clause specifies the properties related to the file reading format and supports three file formats: CSV, PARQUET, and ORC.
- When TYPE = 'CSV', the following fields are included:
  - LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is LINE_DELIMITER='\n'.
  - FIELD_DELIMITER: optional. Specifies the column delimiter of the CSV file. The default value is FIELD_DELIMITER='\t'.
  - PARSE_HEADER: optional. Specifies whether the first line of the CSV file is the column name for each column. The default value is FALSE, indicating that the first line of the CSV file is not specified as the column name for each column.
  - ESCAPE: specifies the escape character of the CSV file. It must be one byte. The default value is ESCAPE ='\'.
  - FIELD_OPTIONALLY_ENCLOSED_BY: optional. Specifies the symbol used to wrap field values in the CSV file. The default value is an empty string. Using this option indicates that only certain types of fields (such as CHAR, VARCHAR, TEXT, and JSON) are wrapped.
  - ENCODING: specifies the character set encoding format of the file. If not specified, the default value is UTF8MB4.
  - NULL_IF: specifies the string that is treated as NULL. The default value is an empty string.
  - SKIP_HEADER: skips the file header and specifies the number of lines to skip.
  - SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, indicating that blank lines are not skipped.
  - TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, indicating that leading and trailing spaces in fields are not removed.
  - TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, indicating that leading and trailing spaces in fields are not removed.
  - EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. The default value is FALSE, indicating that empty strings are not treated as NULL.
- When TYPE = 'PARQUET/ORC', there are no additional fields.
The PATTERN clause specifies a regular expression pattern to filter files in the LOCATION directory. For each file path in the LOCATION directory, if it matches the pattern, the external table accesses the file; otherwise, it skips the file. If this parameter is not specified, the external table can access all files in the LOCATION directory by default.

SOURCE

The SOURCE keyword does not include any other clauses. In this case, TYPE = 'ODPS' and the following fields are included:

ACCESSID: specifies the ID of the ODPS user.
ACCESSKEY: specifies the password of the ODPS user.
ENDPOINT: specifies the connection address of the ODPS service.
TUNNEL_ENDPOINT: specifies the connection address of the Tunnel data transmission service.
PROJECT_NAME: specifies the project where the table to be queried is located.
SCHEMA_NAME: optional. Specifies the schema of the table to be queried.
TABLE_NAME: specifies the name of the table to be queried.
QUOTA_NAME: optional. Specifies whether to use the specified quota.
COMPRESSION_CODE: optional. Specifies the compression format of the data source. Supported compression formats include ZLIB, ZSTD, LZ4, and ODPS_LZ4. If not specified, compression is not enabled.

log_errors

LOG ERRORS: Enables error diagnostics during the import process, allowing failed rows to be recorded instead of terminating the entire operation at the first error. When used with the REJECT LIMIT clause, it controls the maximum number of rows that can be rejected.
INTO 'logfile_string': Optional. Specifies the file in the target directory where error information will be written. If INTO 'logfile_string' is not specified, error information is only recorded in the warning buffer, which can be viewed using show warnings. logfile_string specifies the directory for storing error information, with the following format:

Note

The INTO 'logfile_string' parameter is supported starting from V4.4.0.
- When error information is stored locally, logfile_string is in the format [file://] local_file_path, where local_file_path can be either a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. secure_file_priv specifies the file paths that OBServer nodes have permission to access. local_file_path must be a subpath of secure_file_priv.
- When error information is stored remotely (refer to the Location section in the syntax for creating external tables), logfile_string is in the following format:
  - {oss\|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path, where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access credentials required for accessing Alibaba Cloud OSS, AWS S3, and object storage compatible with the S3 protocol, respectively. s3_region specifies the region selected when using S3. These sensitive access credentials are stored in the system tables of the database in an encrypted manner.
  - hdfs://localhost:port/PATH, where localhost is the address of HDFS, port is the port number of HDFS, and PATH is the directory path in HDFS. For Kerberos authentication, the address is: hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx.
OceanBase Database allows you to set tenant-level parameters to configure the compression algorithm and maximum size of a single diagnostic log file. For more information, see load_data_diagnosis_log_compression and load_data_diagnosis_log_max_size.
REJECT LIMIT: Optional. Specifies the maximum number of rows that can be rejected:
- Default value: 0. No rows can be rejected. The operation fails at the first error.
- integer: The maximum number of rows that can be rejected on a single server. For example, 10 means that up to 10 rows can be rejected on a single server.
- UNLIMITED: No limit on the number of rejected rows.
BADFILE 'badfile_string': Specifies the path where error data files are stored. The format of badfile_string is the same as that of logfile_string.

Note

The BADFILE 'badfile_string' parameter is supported starting from V4.4.0.

Notice

If the LOG ERRORS clause is not specified, the default behavior is to import data normally, which means that the operation fails at the first error.

If the LOG ERRORS clause is specified but the REJECT LIMIT clause is not, it is equivalent to setting LIMIT to 0. The operation fails at the first error, but the first error is recorded, and the error code is a diagnostic-related error, such as "reject limit reached".

View error logs

OceanBase Database allows you to execute the following SQL statement to view the error logs of an import operation:

SELECT * FROM READ_ERROR_LOG('diagnosis_log_path');

In this statement, diagnosis_log_path specifies the path of the error log. When executed, this statement is equivalent to the following URL external table query:

SELECT *
FROM FILES (
    LOCATION = 'diagnosis_log_path/'
    FORMAT(
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '\',
        PARSE_HEADER  = true
        )
    [, PATTERN = 'filename']
);

Here is an example:

The log path is specified as a file name (without a trailing /).

SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/filename');

The corresponding URL external table query is as follows (the file name is used as a pattern to filter files):

SELECT *
FROM FILES (
    LOCATION = 'diagnosis/log/path/',
    FORMAT (
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '\',
        PARSE_HEADER = true
        ),
    PATTERN = 'filename'
);

The log path is specified as a folder (with a trailing /).

SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/');

The corresponding URL external table query is as follows (no pattern is used):

SELECT *
FROM FILES (
    LOCATION = 'diagnosis/log/path/',
    FORMAT (
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '\',
        PARSE_HEADER = true
        )
);

Examples

Note

When you use LOAD DATA to load data, you can use \N to represent NULL.

1. Example of the LOAD DATA INFILE statement

Import data from a server-side file

Example 1: Import data from a server-side file.

Set the global secure file path.
```
obclient> SET GLOBAL secure_file_priv = "/"
Query OK, 0 rows affected
obclinet> \q
Bye
```
Note

Since secure_file_priv is a GLOBAL variable, you need to execute \q to make it effective.

After reconnecting to the database, import data from an external file.

obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1;
Query OK, 0 rows affected

Example 2: Use the APPEND hint to enable direct load.

LOAD DATA /*+ PARALLEL(4) APPEND */
   INFILE '/home/admin/a.csv'
   INTO TABLE t;

Example 3: Specify the source file character set.

If the source file is encoded in GBK or another non-UTF-8 encoding, you must use CHARACTER SET to specify the source file character set. Otherwise, an error may occur during the import of Chinese characters (such as ERROR 1366 (HY000): Incorrect string value).

obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'gbk' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected

If the source file is encoded in UTF-8, you can specify CHARACTER SET 'utf8mb4'.

obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected

II. Example of LOAD DATA FROM

Example 4: Import a CSV file (FROM FILES).

Import all columns from the test1.csv file.

load data  /*+ direct(true,0) parallel(2)*/
from files(
  location = "data/csv",
  format = (
    type = 'csv',
    field_delimiter = ',',
    parse_header = true,
    skip_blank_lines = true
  ),
  pattern = 'test1.csv')
into table t1;

Read the c1 and c2 columns from the test1.csv file in the data/csv directory and import them into the col1 and col2 columns of the t1 table.

load data  /*+ direct(true,0) parallel(2)*/
from (
  select c1, c2 from files(
      location = 'data/csv'
        format = (
        type = 'csv',
        field_delimiter = ',',
        parse_header = true,
        skip_blank_lines = true
      ),
      pattern = 'test1.csv'))
into table t1 (col1, col2);

Example 5: Import a PARQUET file.

load data  /*+ direct(true,0) parallel(2)*/
from files(
  location = "data/parquet",
  format = ( type = 'PARQUET'),
  pattern = 'test1.parquet')
into table t1;

Example 6: Import an ORC file.

load data  /*+ direct(true,0) parallel(2)*/
from files(
  location = "data/orc",
  format = ( type = 'ORC'),
  pattern = 'test1.orc')
into table t1;

Example 7: Import an ODPS file.

load data  /*+ direct(true,0) parallel(2)*/
from source (
  type = 'ODPS',
  accessid = '$ODPS_ACCESSID',
  accesskey = '******',
  endpoint= '$ODPS_ENDPOINT',
  project_name = 'example_project',
  schema_name = '',
  table_name = 'example_table',
  quota_name = '',
  compression_code = '')
into table t1;

Import data from a client (local) file

Example 1: Import data from a local file to a table in OceanBase Database.

Open a terminal or command prompt window and enter the following command to start the client.

obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -Dtest

The return result is as follows:

Welcome to the OceanBase.  Commands end with ; or \g.
Your OceanBase connection id is 3221719526
Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [test]>

Notice

To use the LOAD DATA LOCAL INFILE feature, you must use OBClient V2.2.4 or later. If you do not have a specific version of OBClient, you can also use a MySQL client to connect to the database.

In the client, execute the LOAD DATA LOCAL INFILE statement to load the local data file.

obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';

The return result is as follows:

Query OK, 3 rows affected
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

Example 2: Directly import a compressed file by setting COMPRESSION.

LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz'
    INTO TABLE lineitem
    COMPRESSION GZIP
    FIELDS TERMINATED BY '|';

Example 3: Specify a partition for direct import by using PARTITION.

Specify a partition for direct import

load data /*+ direct(true,0) parallel(2) load_batch_size(100) */
infile "$FILE_PATH"
into table t1 partition(p0, p1)
fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';

Specify a subpartition for direct import

load data /*+ direct(true,0) parallel(2) load_batch_size(100) */
infile "$FILE_PATH"
into table t1 partition(p0sp0, p1sp1)
fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';

Import data from an OSS file

Example 1: Enable direct import by using the direct(bool, int) hint, and the direct import file can be on OSS.

load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';

Import data from a server-side file as a URL external table (with LOG ERRORS diagnostics)

Notice

The commands involving IP addresses in the examples have been desensitized. When verifying, please fill in the actual IP address of your machine.

The following example demonstrates how to import data from an external file located on the server (OBServer node) and in OceanBase Database in MySQL mode. The steps are as follows:

Create directories on the OBServer node. The /home/admin/test_csv directory stores the external data, the /home/admin/test_into directory stores the error logs, and the /home/admin/test_badfile directory stores the error data files.
```
[admin@xxx /home/admin]# mkdir -p /home/admin/{test_csv,test_into,test_badfile}
```
Prepare the external file. In the /home/admin/test_csv directory, create a file named type_cast.csv.
```
[admin@xxx /home/admin/test_csv]# vi type_cast.csv
```
The content of the file is as follows:
```
1,2,3
2,4,af
3,4,5
ds,6,32
4,5,6
5,2,3
6,v4,af
7,4,5
kj,a6,32
8,5,6
```
Set the import file path.

Notice

For security reasons, when setting the secure_file_priv system variable, you can only modify the global variable by executing an SQL statement through a local Socket connection to the database. For more information, see secure_file_priv.
1. Run the following command to log in to the server where the OBServer node is located.
```
ssh admin@10.10.10.1
```
2. Run the following command to connect to the mysql001 tenant by using a local Unix socket.
```
obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
```
3. Run the following SQL statement to set the import and export mode to unlimited.
```
SET GLOBAL secure_file_priv = "/";
```

Reconnect to the mysql001 tenant.

Here is an example:

obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -Ddb_test

Create a table named test_tbl1.

CREATE TABLE test_tbl1(col1 INT, col2 INT, col3 INT);

Set the compression algorithm used for diagnostic logs to AUTO.
```
ALTER SYSTEM SET load_data_diagnosis_log_compression = 'AUTO';
```
For more information about how to set the compression algorithm used for diagnostic logs, see load_data_diagnosis_log_compression.
Set the maximum size of a single diagnostic log file to 1 KB. When the size of the exported log exceeds 1 KB, a second file will be automatically generated and the export will continue.
```
ALTER SYSTEM SET load_data_diagnosis_log_max_size = '1K';
```
For more information about how to set the size of a single diagnostic log file, see load_data_diagnosis_log_max_size.

Use the LOAD DATA statement to import data into the test_tbl1 table from a URL external table, specifying error diagnostics, with the error log stored in the /home/admin/test_into/ directory and the error data file stored in the /home/admin/test_badfile/ directory.

LOAD DATA FROM FILES(
    LOCATION = '/home/admin/test_csv/',
    FORMAT = (
        TYPE = 'csv',
        FIELD_DELIMITER = ','),
    PATTERN = 'type_cast.csv')
    INTO TABLE test_tbl1
    LOG ERRORS
        INTO '/home/admin/test_into/'
        REJECT LIMIT UNLIMITED
        BADFILE '/home/admin/test_badfile/';

The return result is as follows:

Query OK, 6 rows affected, 4 warnings
Records: 6  Deleted: 0  Skipped: 0  Warnings: 4

Run the read_error_log statement to view the error log content.

SELECT * FROM READ_ERROR_LOG('/home/admin/test_into/');

The return result is as follows:

+------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
| ERROR CODE | FILE NAME     | LINE NUMBER | ERROR MESSAGE                                                                                                     |
+------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
| -4226      | type_cast.csv | 4           | fail to scan file type_cast.csv at line 4 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
| -4226      | type_cast.csv | 9           | fail to scan file type_cast.csv at line 9 for column "db_test"."test_tbl1"."col1", error: Incorrect integer value |
| -4226      | type_cast.csv | 7           | fail to scan file type_cast.csv at line 7 for column "db_test"."test_tbl1"."col2", error: Incorrect integer value |
| -4226      | type_cast.csv | 2           | fail to scan file type_cast.csv at line 2 for column "db_test"."test_tbl1"."col3", error: Incorrect integer value |
+------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------+
4 rows in set

View the data in the test_tbl1 table.

SELECT * FROM test_tbl1;

The return result is as follows:

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|    1 |    2 |    3 |
|    3 |    4 |    5 |
|    4 |    5 |    6 |
|    5 |    2 |    3 |
|    7 |    4 |    5 |
|    8 |    5 |    6 |
+------+------+------+
6 rows in set

View the content of the bad file by using the URL external table.

SELECT *
FROM FILES (
    LOCATION = '/home/admin/test_badfile/',
    FORMAT (
        TYPE = 'csv',
        FIELD_DELIMITER = ','),
    PATTERN = 'data.bad');

The return result is as follows:

+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| ds   | 6    | 32   |
| kj   | a6   | 32   |
| 6    | v4   | af   |
| 2    | 4    | af   |
+------+------+------+
4 rows in set

References

For more information about how to connect to OceanBase Database, see Overview of connection methods.
For more information about how to use the LOAD DATA statement, see Import data by using the LOAD DATA statement

OceanBase

Customer Stories

Documentation

LOAD DATA

Overview

General considerations

1. LOAD DATA INFILE

Note

Syntax of LOAD DATA INFILE

Support for Location objects in LOAD DATA INFILE and SELECT INTO OUTFILE

Overview

LOAD DATA INFILE

Automatic matching mechanism

Note

Syntax example

SELECT INTO OUTFILE

Automatic matching mechanism

Note

Syntax example

Parameters (applicable only to the INFILE form)

Notice

Note

Notice

Notice

Note

Notice

Note

Note

Wildcard rules for multi-file direct load (INFILE only)

2. LOAD DATA FROM (Recommended)

LOAD DATA FROM syntax

Note

Parameters (only for LOAD DATA FROM form)

FILES

Notice

SOURCE

log_errors

Note

Note

Notice

View error logs

Examples

Note

1. Example of the LOAD DATA INFILE statement

Import data from a server-side file

Note

II. Example of LOAD DATA FROM

Import data from a client (local) file

Notice

Import data from an OSS file

Import data from a server-side file as a URL external table (with LOG ERRORS diagnostics)

Notice

Notice

References