Purpose
The `LOAD DATA` statement of OceanBase Database supports loading the following types of input files:

- Server-side files (on OBServer nodes): These files are stored on the OBServer nodes of OceanBase Database. You can use the `LOAD DATA INFILE` or `LOAD DATA FROM URL` statement to load data from these files into database tables.
- Client-side files (local files): These files are stored on the local file system of the client. You can use the `LOAD DATA LOCAL INFILE` or `LOAD DATA FROM URL` statement to load data from these files into database tables.

  Note: When you execute the `LOAD DATA LOCAL INFILE` statement, the system automatically adds the `IGNORE` option.

- OSS files: These files are stored on the OSS file system. You can use the `LOAD DATA REMOTE_OSS INFILE` statement to load data from these files into database tables.
Currently, the `LOAD DATA` statement supports importing text files in the CSV format. The import process consists of the following steps:

1. File parsing: OceanBase Database reads the data from the specified file and parses it in parallel or serially based on the specified parallelism.
2. Data distribution: Because OceanBase Database is a distributed database, data in different partitions may be distributed across different OBServer nodes. The `LOAD DATA` statement processes the parsed data and determines the OBServer node to which each row is sent.
3. Data insertion: After the target OBServer node receives the data, it executes `INSERT` statements to insert the data into the corresponding partition.
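For orientation, here is a minimal sketch of the statement that drives this pipeline. The file path `/home/admin/test.csv` and the table `t1` are placeholders (the same ones used in the examples later in this topic):

```sql
-- Minimal sketch: parse the CSV file, distribute rows across OBServer nodes,
-- and insert them into t1. Assumes secure_file_priv permits /home/admin.
LOAD DATA INFILE '/home/admin/test.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```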
General considerations
- You cannot use the `LOAD DATA` statement on a table that has triggers.
- To import data from an external file, you must have the `FILE` privilege. You can grant it by executing `GRANT FILE ON *.* TO $user_name;`, where `$user_name` is the user who executes the `LOAD DATA` statement. In addition:
  - To load a server-side file, you must set the system variable `secure_file_priv` to specify the path for importing or exporting files.
  - To load a client-side file, you must add the `--local-infile[=1]` option when you start the MySQL client or OBClient to enable loading data from the local file system.
- When you use partitioned table bypass import, the target table cannot be a replicated table and cannot contain auto-increment columns, identity columns, or global indexes.
- To improve import efficiency, we recommend that you create the base table first and create indexes after the data is imported. If the table requires global indexes, you must create them after the data is imported; otherwise, an error is returned.
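The privilege and path prerequisites above can be set up as follows. This is a hedged sketch: the user name `loader` and the path `/data` are hypothetical.

```sql
-- Grant the FILE privilege to the importing user (hypothetical user name).
GRANT FILE ON *.* TO loader;

-- Allow OBServer nodes to read files under /data (hypothetical path).
-- This global variable can only be modified over a local socket connection.
SET GLOBAL secure_file_priv = '/data';
```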
I. LOAD DATA INFILE
This form is compatible with the MySQL `LOAD DATA` syntax and supports only the CSV format. It can load files from the server side (`INFILE`), the client side (`LOCAL`), or OSS (`REMOTE_OSS`). The import process involves parsing the file, distributing the data, and inserting the data.
Note
When you execute `LOAD DATA LOCAL INFILE`, the system automatically adds the `IGNORE` option.
Notice
If the file to be imported is stored on the local disk of an OBServer node, execute this statement directly on the OBServer node, and do not use a Proxy to execute it.
LOAD DATA INFILE syntax
```sql
LOAD DATA
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
    [REMOTE_OSS | LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
    [CHARACTER SET 'charset_name']
    [COMPRESSION [=] {AUTO | NONE | GZIP | DEFLATE | ZSTD}]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(column_name_var [, column_name_var] ...)]
    [SET col_name = {expr | DEFAULT} [, col_name = {expr | DEFAULT}] ...]

load_mode:
    'full'
  | 'inc_replace'

PARTITION_OPTION:
    partition_option_list
  | subpartition_option_list
```
Parameters (applicable only to INFILE)
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. The default value is 4. We recommend that you set the value to a number in the range of [0, maximum number of CPU cores in the tenant]. |
| load_batch_size(M) | The size of each inserted batch. The default value of M is 100. We recommend that you set the value to a number in the range of [100, 1000]. |
| APPEND \| direct() \| NO_DIRECT | The hint for enabling or disabling direct load. **Notice**: During a direct load task, we recommend that you do not upgrade OceanBase Database, because the upgrade may cause the direct load task to fail. |
| REMOTE_OSS \| LOCAL | Optional. `REMOTE_OSS` indicates that the input file is stored on OSS, and `LOCAL` indicates that the input file is stored on the client. If neither is specified, the input file is read from the OBServer node. |
| file_name | The path and name of the input file. For a server-side or client-side file, specify a file system path. For a file in an object storage service, specify a URL such as `oss://bucket/path?host=xxx&access_id=xxx&access_key=xxx`, and make sure that the required access conditions are met. |
| table_name | The name of the table to which the data is imported. Both partitioned and non-partitioned tables are supported. |
| CHARACTER SET 'charset_name' | Optional. Specifies the character set of the source file. If this parameter is not specified, the system parses the source file as UTF-8. |
| PARTITION_OPTION | The name of the partition to be imported. **Note**: Partition specification is supported only for direct load, not for ordinary LOAD DATA statements. That is, if you do not add the direct load hint or configure direct load, specifying partitions in a LOAD DATA statement does not take effect. |
| COMPRESSION | Specifies the compression format of the input file. Valid values: `AUTO`, `NONE`, `GZIP`, `DEFLATE`, and `ZSTD`. |
| FIELDS \| COLUMNS | Specifies the format of the fields, including the field delimiter (`TERMINATED BY`), the enclosing character (`[OPTIONALLY] ENCLOSED BY`), and the escape character (`ESCAPED BY`). |
| LINES STARTING BY | Specifies the starting delimiter of a line. |
| LINES TERMINATED BY | Specifies the ending delimiter of a line. |
| IGNORE number { LINES \| ROWS } | Specifies the number of rows to be ignored. LINES specifies the number of lines at the beginning of the file, and ROWS specifies the number of rows at the beginning of the file as separated by the field delimiter. By default, the system maps the fields in the input file to the columns in the table in order. If the input file does not contain all the columns, the missing columns are filled with default values. **Note**: When you import data from multiple files, the system behaves the same as when you import data from a single file. |
| column_name_var | Optional. Specifies the names of the imported columns. To represent a null value in the file, use `\N`. |
| SET col_name = {expr \| DEFAULT} | Optional. Specifies an expression or a default value for a column that is not read from the file. For example, `SET dates=CURRENT_TIMESTAMP` sets the dates column to the current timestamp. |
Supported format: This form supports only the CSV format.

Bypass mode and non-bypass mode: You can use a hint to enable bypass import (direct load).
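The following hedged sketch combines several of the clauses above. The file path, the `orders` table, and its columns are hypothetical:

```sql
-- Skip one header line, map three CSV fields to named columns,
-- and fill a column that is absent from the file by using SET.
LOAD DATA INFILE '/home/admin/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(order_id, customer_id, amount)
SET created_at = CURRENT_TIMESTAMP;
```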
Wildcard rules for multi-file direct load (INFILE only)
To facilitate multi-file imports, the wildcard feature is provided for server-side and OSS file imports, but not for client-side file imports. On the server side, wildcards can match both file names and directories. For OSS, only the file name wildcards `*` and `?` are supported. For more details, see the wildcard rules for multi-file direct load later in this topic.
II. LOAD DATA FROM (Recommended)
This form reads data through a URL table function and supports multiple formats, such as CSV, PARQUET, ORC, and ODPS. Its execution path differs from that of the INFILE form, which makes it suitable for batch import scenarios such as AP workloads. It is the currently recommended approach. When executed, it is equivalent to `INSERT INTO ... SELECT ...`: `LOAD DATA FROM FILES(...) INTO TABLE table_name` is equivalent to `INSERT INTO table_name SELECT * FROM FILES(...)`. The hints are consistent with those of the `INSERT` statement.
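As a hedged illustration of this equivalence (the `data/csv` location and the table `t1` are placeholders):

```sql
-- These two statements load the same CSV data into t1.
LOAD DATA FROM FILES(
    LOCATION = 'data/csv',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',')
) INTO TABLE t1;

INSERT INTO t1
SELECT * FROM FILES(
    LOCATION = 'data/csv',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',')
);
```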
Syntax of LOAD DATA FROM
Note
When this form is used, the statement is equivalent to `INSERT INTO table_name SELECT ... FROM url_table_function_expr`. Therefore, the hints in this statement are consistent with those of `INSERT` statements (such as `direct` and `parallel`). In addition, parameters specific to the INFILE form, such as `load_batch_size`, are no longer used.
```sql
LOAD DATA
    [/*+ insert_hint */]
    [REPLACE | IGNORE]
    FROM { url_table_function_expr |
           ( SELECT expression_list FROM url_table_function_expr ) }
    INTO TABLE table_name
    [PARTITION(PARTITION_OPTION)]
    [(column_name_var [, column_name_var] ...)]
    [LOG ERRORS
        [INTO 'logfile_string']
        [REJECT LIMIT {integer | UNLIMITED}]
        [BADFILE 'badfile_string']]

url_table_function_expr:
    FILES (
        LOCATION = '<string>',
        {
          FORMAT = (
              TYPE = 'CSV',
              LINE_DELIMITER = '<string>' | <expr>,
              FIELD_DELIMITER = '<string>' | <expr>,
              PARSE_HEADER = { TRUE | FALSE },
              ESCAPE = '<character>' | <expr>,
              FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
              ENCODING = 'charset',
              NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
              SKIP_HEADER = <int>,
              SKIP_BLANK_LINES = { TRUE | FALSE },
              TRIM_SPACE = { TRUE | FALSE },
              EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
          )
        | FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
        },
        [PATTERN = '<regex_pattern>']
    )
  | SOURCE (
        TYPE = 'ODPS',
        ACCESSID = '<string>',
        ACCESSKEY = '<string>',
        ENDPOINT = '<string>',
        TUNNEL_ENDPOINT = '<string>',
        PROJECT_NAME = '<string>',
        SCHEMA_NAME = '<string>',
        TABLE_NAME = '<string>',
        QUOTA_NAME = '<string>',
        COMPRESSION_CODE = '<string>'
    )

PARTITION_OPTION:
    partition_option_list
  | subpartition_option_list
```
Parameters for the LOAD DATA FROM clause
| Parameter | Description |
|---|---|
| Hint | The same as the hint of the INSERT statement (this option is equivalent to the INSERT INTO ... SELECT ... statement). |
| url_table_function_expr | Specifies the data source and format by using FILES or SOURCE. |
| table_name, PARTITION_OPTION | Same as in the INFILE form. |
| column_name_var | Optional. Specifies the column names to be imported. |
| LOG ERRORS | Optional. Enables error diagnosis during the import process. For more information, see log_errors. |
Supported formats: CSV, PARQUET, and ORC (by using FILES); and ODPS (by using SOURCE). For more information on diagnosis, see log_errors.
FILES
The FILES keyword is specified by using LOCATION, FORMAT, and PATTERN.
- `LOCATION` specifies the path where external table files are stored. Generally, the data files of an external table are stored in a dedicated directory. The directory may contain subdirectories. When a table is created, the external table automatically collects all files in the directory.
  - The local LOCATION clause is specified in the format `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be either a relative path or an absolute path. The path must be a subpath of the directory specified by the `secure_file_priv` variable. If you specify a relative path, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` variable specifies the paths from which OBServer nodes can read files.
  - The remote LOCATION clause is specified in one of the following formats:
    - `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are required to access an OSS or S3 service, and `s3_region` specifies the region of the S3 service. These parameters are stored in the system tables of OceanBase Database in encrypted form.
    - `LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH'`, where `port` specifies the port number of HDFS, and `PATH` specifies the directory path in HDFS.
    - If Kerberos authentication is used, the LOCATION clause is specified as follows: `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`. The parameters in this clause are described as follows:
      - `principal`: the username used for Kerberos authentication.
      - `keytab`: the path to the keytab file used for user authentication.
      - `krb5conf`: the path to the Kerberos configuration file.
      - `configs`: specifies additional HDFS configurations. The default value is empty, but in Kerberos mode you must specify this parameter. For example, you can set the `dfs.data.transfer.protection` parameter to `authentication,privacy` to enable data transfer protection.

Notice

When specifying the object storage path, parameters are separated by the `&` character. Make sure that the parameter values you specify contain only letters, digits, the characters `\/-_$+=`, and wildcards. If you specify other characters, the settings may fail.
- `FORMAT` specifies the attributes of the file format, including the CSV, PARQUET, and ORC formats.
  - When `TYPE = 'CSV'`, the following fields are available:
    - `LINE_DELIMITER`: specifies the line delimiter of CSV files. The default value is `LINE_DELIMITER = '\n'`.
    - `FIELD_DELIMITER`: optional. Specifies the field delimiter of CSV files. The default value is `FIELD_DELIMITER = '\t'`.
    - `PARSE_HEADER`: optional. Specifies whether to use the first row of the CSV file as the column names. The default value is `FALSE`, indicating that the first row is not used as the column names.
    - `ESCAPE`: specifies the escape character of CSV files, which can be only one character. The default value is `ESCAPE = '\'`.
    - `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. Specifies the character that encloses field values in the CSV file. The default value is empty. If you specify this parameter, the character is applied only to fields of specific data types, such as CHAR, VARCHAR, TEXT, and JSON.
    - `ENCODING`: specifies the character set of the files. If not specified, the default value is `UTF8MB4`.
    - `NULL_IF`: specifies the strings to be treated as `NULL`. The default value is empty.
    - `SKIP_HEADER`: specifies that the file header is skipped, and the number of lines to skip.
    - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
    - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that such spaces are not removed.
    - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.
  - When `TYPE = 'PARQUET'` or `TYPE = 'ORC'`, there are no additional fields.
- `PATTERN` specifies a regular expression for filtering the files in the `LOCATION` directory. For each file path in the `LOCATION` directory, if the path matches the regular expression, the external table accesses the file; otherwise, the external table skips the file. If this option is not specified, all files in the `LOCATION` directory are accessible.
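The following hedged sketch combines a remote LOCATION, several CSV FORMAT options, and a PATTERN regular expression; the OSS credentials, bucket path, and table `t1` are placeholders:

```sql
-- Read only files whose paths match the regex .*\.csv from the OSS directory,
-- treating the string 'NULL' and empty fields as NULL values.
LOAD DATA FROM FILES(
    LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/data/csv',
    FORMAT = (
        TYPE = 'CSV',
        FIELD_DELIMITER = ',',
        NULL_IF = ('NULL'),
        EMPTY_FIELD_AS_NULL = TRUE
    ),
    PATTERN = '.*\.csv'
) INTO TABLE t1;
```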
SOURCE
When you use the SOURCE keyword, `TYPE = 'ODPS'` and the following fields are available:

- `ACCESSID`: specifies the ID of the ODPS user.
- `ACCESSKEY`: specifies the password of the ODPS user.
- `ENDPOINT`: specifies the connection address of the ODPS service.
- `TUNNEL_ENDPOINT`: specifies the connection address of the Tunnel data transmission service.
- `PROJECT_NAME`: specifies the project where the table to be queried is located.
- `SCHEMA_NAME`: optional. Specifies the schema of the table to be queried.
- `TABLE_NAME`: specifies the name of the table to be queried.
- `QUOTA_NAME`: optional. Specifies the quota to use.
- `COMPRESSION_CODE`: optional. Specifies the compression format of the data source. Valid values: `ZLIB`, `ZSTD`, `LZ4`, and `ODPS_LZ4`. If this parameter is not specified, compression is not enabled.
log_errors
- `LOG ERRORS`: enables error diagnostics during the import, so that failed rows are recorded instead of the entire operation terminating at the first error. When used with the `REJECT LIMIT` clause, it controls the maximum number of allowed error rows.
- `INTO 'logfile_string'`: optional. Specifies the file in the target directory to which error information is written. If `INTO 'logfile_string'` is not specified, error information is recorded only in the warning buffer, which you can view by executing `SHOW WARNINGS`. `logfile_string` specifies the directory for storing error information, in the following formats:

  Note: The `INTO 'logfile_string'` parameter is supported starting from V4.4.0.

  - If error information is stored locally, `logfile_string` is in the format `[file://] local_file_path`, where `local_file_path` can be a relative or an absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. `secure_file_priv` specifies the file paths that OBServer nodes are allowed to access, and `local_file_path` must be a subpath of the `secure_file_priv` path.
  - If error information is stored remotely (see the LOCATION section in the syntax for creating external tables), `logfile_string` is in one of the following formats:
    - `{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required to access Alibaba Cloud OSS, AWS S3, or object storage compatible with the S3 protocol, and `s3_region` specifies the region selected when using S3. This sensitive access information is stored in the system tables of the database in encrypted form.
    - `hdfs://localhost:port/PATH`, where `localhost` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the directory path in HDFS. For Kerberos authentication, the address is in the format `hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx`.

  OceanBase Database allows you to set tenant-level parameters to configure the compression algorithm and the maximum size of a single diagnostic log file. For more information, see load_data_diagnosis_log_compression and load_data_diagnosis_log_max_size.

- `REJECT LIMIT`: optional. Specifies the maximum number of allowed error rows:
  - Default value: 0, which means that no error rows are allowed and the operation fails at the first error.
  - `integer`: the maximum number of error rows allowed on a single machine. For example, 10 means that up to 10 error rows can be tolerated on a single machine.
  - `UNLIMITED`: allows an unlimited number of error rows.
- `BADFILE 'badfile_string'`: specifies the path where error data files are stored. The format of `badfile_string` is the same as that of `logfile_string`.

  Note: The `BADFILE 'badfile_string'` parameter is supported starting from V4.4.0.

Notice

- If the `LOG ERRORS` clause is not specified, the default behavior is a normal import: the operation fails at the first error.
- If the `LOG ERRORS` clause is specified but the `REJECT LIMIT` clause is not, this is equivalent to setting the limit to 0. In this case, the operation fails at the first error, but that error is recorded, and the error code is a diagnostics-related error, such as "reject limit reached".

View error logs
OceanBase Database provides the following SQL statement for viewing the error logs of an import operation:

```sql
SELECT * FROM READ_ERROR_LOG('diagnosis_log_path');
```

In this statement, `diagnosis_log_path` specifies the path of the error log. When executed, this SQL statement is equivalent to the following URL external table statement:

```sql
SELECT * FROM FILES (
    LOCATION = 'diagnosis_log_path/'
    FORMAT (
        TYPE = 'csv'
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY = '"',
        PARSE_HEADER = true
    )
    [, PATTERN = 'filename']
);
```

Examples:

- If the specified log path is a file name (not ending with `/`):

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/filename');
  ```

  The corresponding URL external table statement is as follows (the file name is used as a pattern to filter files):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '"',
          PARSE_HEADER = true
      ),
      PATTERN = 'filename'
  );
  ```

- If the specified log path is a folder (ending with `/`):

  ```sql
  SELECT * FROM READ_ERROR_LOG('diagnosis/log/path/');
  ```

  The corresponding URL external table statement is as follows (no pattern is used):

  ```sql
  SELECT * FROM FILES (
      LOCATION = 'diagnosis/log/path/',
      FORMAT (
          TYPE = 'csv'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY = '"',
          PARSE_HEADER = true
      )
  );
  ```
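If you only need quick, non-persistent diagnostics, a minimal hedged sketch (hypothetical directory and table) omits the `INTO` and `BADFILE` clauses, so errors are recorded only in the warning buffer:

```sql
-- Tolerate up to 10 bad rows per machine; errors go to the warning buffer.
LOAD DATA FROM FILES(
    LOCATION = '/home/admin/test_csv/',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',')
) INTO TABLE t1
LOG ERRORS REJECT LIMIT 10;

-- Inspect the recorded errors.
SHOW WARNINGS;
```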
Wildcard rules for multi-file direct load (INFILE only)
To facilitate multi-file import, the wildcard feature is introduced for both server-side and OSS file imports, but not for client-side file imports.
Wildcard usage for server-side files

Matching rules:

- Match a file name:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
  ```

- Match a directory:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';
  ```

- Match both a directory and a file name:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
  ```

Considerations:

- At least one matching file must exist; otherwise, error code 4027 is returned.
- For the input `load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';`, the string `/xxx/test.1*.csv,/xxx/test.6*.csv` is treated as a single pattern to match. If no match is found, error code 4027 is returned.
- Only POSIX-compliant wildcard patterns are supported. Patterns such as `test.6*(6|0).csv` and `test.6*({0.csv,6.csv}|.csv)` can be expanded by the `ls` command, but the GLOB function cannot match them, resulting in error code 4027.

Wildcard usage for files in the cloud object storage service (OSS)

Matching rules:

- Match a file name:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  ```

Considerations:

- Directory matching is not supported. For example, `load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';` returns `OB_NOT_SUPPORTED`.
- Only the `*` and `?` wildcards are supported for file names. Other wildcards are allowed in the input but cannot match any results.
Examples
I. Examples of LOAD DATA INFILE
Import data from a file on the server (OBServer node)

Example 1: Import data from a file on the server.

1. Set the global secure path.

   Notice: For security reasons, you can modify the global variable `secure_file_priv` only by executing an SQL statement through a local socket connection. For more information, see secure_file_priv.

   ```sql
   obclient> SET GLOBAL secure_file_priv = "/";
   ```

2. Log out.

   Note: Because `secure_file_priv` is a `GLOBAL` variable, you must execute `\q` to log out for the change to take effect.

   ```sql
   obclient> \q
   ```

   The return result is as follows:

   ```
   Bye
   ```

3. After reconnecting to the database, use the `LOAD DATA` statement to import data.

   - Normal import:

     ```sql
     obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;
     ```

   - Enable direct load by using the `APPEND` hint:

     ```sql
     LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
     ```

Example 2: Enable direct load by using the `APPEND` hint.

```sql
LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/a.csv' INTO TABLE t;
```

Example 3: Specify the character set of the source file.

If the source file is encoded in a non-UTF-8 format, such as GBK, you must specify the character set of the source file by using `CHARACTER SET`. Otherwise, you may encounter an error when importing characters such as Chinese because of an encoding mismatch (for example, `ERROR 1366 (HY000): Incorrect string value`).

```sql
obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'gbk' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected
```

If the source file is encoded in UTF-8, you can specify `CHARACTER SET 'utf8mb4'`:

```sql
obclient> LOAD DATA INFILE '/tmp/t1.csv' INTO TABLE t1 CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
Query OK, 2 rows affected
```

II. Examples of LOAD DATA FROM
Example 4: Import a CSV file (FROM FILES).

- Import all columns from the `test1.csv` file:

  ```sql
  load data /*+ direct(true,0) parallel(2) */ from files(
      location = "data/csv",
      format = (
          type = 'csv',
          field_delimiter = ',',
          parse_header = true,
          skip_blank_lines = true
      ),
      pattern = 'test1.csv'
  ) into table t1;
  ```

- Read the `c1` and `c2` columns from the `test1.csv` file in the `data/csv` directory and import them into the `col1` and `col2` columns of the `t1` table:

  ```sql
  load data /*+ direct(true,0) parallel(2) */ from (
      select c1, c2 from files(
          location = 'data/csv',
          format = (
              type = 'csv',
              field_delimiter = ',',
              parse_header = true,
              skip_blank_lines = true
          ),
          pattern = 'test1.csv'
      )
  ) into table t1 (col1, col2);
  ```

Example 5: Import a PARQUET file.

```sql
load data /*+ direct(true,0) parallel(2) */ from files(
    location = "data/parquet",
    format = (type = 'PARQUET'),
    pattern = 'test1.parquet'
) into table t1;
```

Example 6: Import an ORC file.

```sql
load data /*+ direct(true,0) parallel(2) */ from files(
    location = "data/orc",
    format = (type = 'ORC'),
    pattern = 'test1.orc'
) into table t1;
```

Example 7: Import an ODPS table.

```sql
load data /*+ direct(true,0) parallel(2) */ from source (
    type = 'ODPS',
    accessid = '$ODPS_ACCESSID',
    accesskey = '******',
    endpoint = '$ODPS_ENDPOINT',
    project_name = 'example_project',
    schema_name = '',
    table_name = 'example_table',
    quota_name = '',
    compression_code = ''
) into table t1;
```

Import data from a local file
Example 1: Import data from a local file to a table in OceanBase Database.

1. Start the client by running the following command in the terminal or command prompt window:

   ```shell
   obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -usys@oracle001 -p******
   ```

   The return result is as follows:

   ```
   Welcome to the OceanBase.  Commands end with ; or \g.
   Your OceanBase connection id is 3221548006
   Server version: OceanBase 4.2.2.0 (r100000032024010510-75c47d4be18a399e13c5309de1a81da5caf4e7c0) (Built Jan  5 2024 10:17:55)

   Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

   Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

   obclient [SYS]>
   ```

   Notice: To use the `LOAD DATA LOCAL INFILE` feature, use OBClient V2.2.4 or later.

2. In the client, execute the `LOAD DATA LOCAL INFILE` statement to load the local data file:

   ```sql
   obclient [SYS]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
   ```

   The return result is as follows:

   ```
   Query OK, 3 rows affected
   Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
   ```
Example 2: Import a compressed file by setting COMPRESSION.

```sql
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz' INTO TABLE lineitem COMPRESSION GZIP FIELDS TERMINATED BY '|';
```

Example 3: Specify a partition to directly import data.

- Specify a partition to directly import data:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0, p1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

- Specify a subpartition to directly import data:

  ```sql
  load data /*+ direct(true,0) parallel(2) load_batch_size(100) */ infile "$FILE_PATH" into table t1 partition(p0sp0, p1sp1) fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
  ```

Import data from an OSS file
Example 1: Enable direct load by using the `direct(bool, int)` hint, where the file to be imported is stored in OSS.

```sql
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
```

Import data from a server-side file as a URL external table (with LOG ERRORS diagnostics)
Notice
The commands involving IP addresses in this example have been desensitized. When you verify the example, use the actual IP address of your machine.

The following example demonstrates how to import data from an external file located on the server (OBServer node), in the Oracle mode of OceanBase Database. The steps are as follows:
1. Create directories on the OBServer node. The `/home/admin/test_csv` directory stores the external data, `/home/admin/test_into` stores the error logs, and `/home/admin/test_badfile` stores the error data files.

   ```shell
   [admin@xxx /home/admin]# mkdir -p /home/admin/{test_csv,test_into,test_badfile}
   ```

2. Prepare the external file. In the `/home/admin/test_csv` directory, create a file named `type_cast.csv`.

   ```shell
   [admin@xxx /home/admin/test_csv]# vi type_cast.csv
   ```

   The content of the file is as follows:

   ```
   1,2,3
   2,4,af
   3,4,5
   ds,6,32
   4,5,6
   5,2,3
   6,v4,af
   7,4,5
   kj,a6,32
   8,5,6
   ```

3. Set the import file path.

   Notice: For security reasons, when setting the system variable `secure_file_priv`, you can modify the global variable only by connecting to the database through a local socket. For more information, see secure_file_priv.

   1. Execute the following command to log in to the machine where the OBServer node is located:

      ```shell
      ssh admin@10.10.10.1
      ```

   2. Execute the following command to connect to the tenant `oracle001` through a local Unix socket:

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******
      ```

   3. Execute the following SQL command to remove the import and export path restriction:

      ```sql
      SET GLOBAL secure_file_priv = "/";
      ```

4. Reconnect to the tenant `oracle001`. Here is an example:

   ```shell
   obclient -h10.10.10.1 -P2881 -usys@oracle001 -p****** -A
   ```

5. Create a table named `test_tbl1`.

   ```sql
   CREATE TABLE test_tbl1(col1 INT, col2 INT, col3 INT);
   ```

6. Set the compression algorithm used for diagnostic logs to `AUTO`.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_compression = 'AUTO';
   ```

   For more information about setting the compression algorithm used for diagnostic logs, see load_data_diagnosis_log_compression.

7. Set the maximum size of a single diagnostic log file to 1 KB. If the exported log exceeds 1 KB, a second file is automatically generated and the export continues.

   ```sql
   ALTER SYSTEM SET load_data_diagnosis_log_max_size = '1K';
   ```

   For more information about setting the size of a single diagnostic log file, see load_data_diagnosis_log_max_size.

8. Use the `LOAD DATA` statement to import data from a URL external table into the `test_tbl1` table, enabling error diagnostics, with error logs stored in the `/home/admin/test_into/` directory and error data files stored in the `/home/admin/test_badfile/` directory.

   ```sql
   LOAD DATA FROM FILES(
       LOCATION = '/home/admin/test_csv/',
       FORMAT = (
           TYPE = 'csv',
           FIELD_DELIMITER = ','),
       PATTERN = 'type_cast.csv')
   INTO TABLE test_tbl1
   LOG ERRORS INTO '/home/admin/test_into/'
   REJECT LIMIT UNLIMITED
   BADFILE '/home/admin/test_badfile/';
   ```

   The return result is as follows:

   ```
   Query OK, 6 rows affected, 4 warnings
   Records: 6  Deleted: 0  Skipped: 0  Warnings: 4
   ```

9. Use the `READ_ERROR_LOG` function to view the content of the error log.

   ```sql
   SELECT * FROM READ_ERROR_LOG('/home/admin/test_into');
   ```

   The return result is as follows:

   ```
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------+
   | ERROR CODE | FILE NAME     | LINE NUMBER | ERROR MESSAGE                                                                                           |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------+
   |      -5114 | type_cast.csv |           4 | fail to scan file type_cast.csv at line 4 for column "SYS"."TEST_TBL1"."COL1", error: Invalid numeric  |
   |      -5114 | type_cast.csv |           9 | fail to scan file type_cast.csv at line 9 for column "SYS"."TEST_TBL1"."COL1", error: Invalid numeric  |
   |      -5114 | type_cast.csv |           7 | fail to scan file type_cast.csv at line 7 for column "SYS"."TEST_TBL1"."COL2", error: Invalid numeric  |
   |      -5114 | type_cast.csv |           2 | fail to scan file type_cast.csv at line 2 for column "SYS"."TEST_TBL1"."COL3", error: Invalid numeric  |
   +------------+---------------+-------------+-------------------------------------------------------------------------------------------------------+
   4 rows in set
   ```

10. View the data in the `test_tbl1` table.

    ```sql
    SELECT * FROM test_tbl1;
    ```

    The return result is as follows:

    ```
    +------+------+------+
    | COL1 | COL2 | COL3 |
    +------+------+------+
    |    1 |    2 |    3 |
    |    3 |    4 |    5 |
    |    4 |    5 |    6 |
    |    5 |    2 |    3 |
    |    7 |    4 |    5 |
    |    8 |    5 |    6 |
    +------+------+------+
    6 rows in set
    ```

11. View the content of the bad file through the URL external table.

    ```sql
    SELECT * FROM FILES (
        LOCATION = '/home/admin/test_badfile',
        FORMAT (
            TYPE = 'csv',
            FIELD_DELIMITER = ','),
        PATTERN = 'data.bad');
    ```

    The return result is as follows:

    ```
    +------+------+------+
    | C1   | C2   | C3   |
    +------+------+------+
    | ds   | 6    | 32   |
    | kj   | a6   | 32   |
    | 6    | v4   | af   |
    | 2    | 4    | af   |
    +------+------+------+
    4 rows in set
    ```
References
- For more information about how to use the `LOAD DATA` statement, see Import data by using the LOAD DATA statement.
- For more information about how to use the `LOAD DATA` statement to perform a direct load, see Import data by using the LOAD DATA statement.
