LOAD DATA

2025-12-02 11:32:56  Updated

Description

This statement is used to load data from external files to a database table.

Currently, the LOAD DATA statement can import text files in the CSV format. The import process comprises the following steps:

  1. Parse the file: OceanBase Database reads data from the file based on the filename you specified and parses the data in the file in parallel or serially based on the specified degree of parallelism.

  2. Distribute the data: As a distributed database, OceanBase Database distributes the parsed data to OBServer nodes. LOAD DATA calculates the OBServer node to which each data record should be sent.

  3. Insert the data: After the data is received by the destination OBServer node, the node executes an INSERT statement to insert the data into the corresponding partition.
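
The three steps above are driven by a single statement. The following is a minimal sketch; the file path and table name t1 are placeholders for illustration:

LOAD DATA INFILE '/home/admin/test.csv'
    INTO TABLE t1
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';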

Limitations and considerations

Limitations

  • Tables with triggers (triggered on DML operations) are not supported.
  • Local (LOCAL) data files are not supported.

Considerations

To increase the data import rate, OceanBase Database uses a parallel design for the LOAD DATA operation. During this process, the data to be imported is divided into multiple subtasks that are executed in parallel, each as an independent transaction. The subtasks are executed in random order. Therefore, note the following considerations:

  • The atomicity of the entire data import cannot be guaranteed.
  • For tables without primary keys, the order of data written to the table can be different from the order of data in the file.

Privilege requirements

To execute the LOAD DATA statement, you must have the FILE privilege and the INSERT privilege on the destination table. For more information about OceanBase Database privileges, see Privilege types in MySQL mode.
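
For example, an administrator can grant these privileges as follows. The user u1 and the table test.t1 are placeholders for illustration:

GRANT FILE ON *.* TO u1;
GRANT INSERT ON test.t1 TO u1;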

Syntax

LOAD DATA [hint_options]
    [REMOTE_OSS]
    INFILE 'file_name'
    [IGNORE | REPLACE]
    INTO TABLE table_name
    [PARTITION (partition_name [, partition_name] ...)]
    [CHARACTER SET charset_name_or_default]
    [field_opt]
    [line_opt]
    [{IGNORE | GENERATED} number {LINES | ROWS}]
    [(column_name_var [, column_name_var] ...)]
    [SET load_set_list]
    [load_data_extended_option_list];

hint_options:
    [/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int)] */]

charset_name_or_default:
    `string_charset_name`
    | BINARY
    | DEFAULT

field_opt:
    {COLUMNS | FIELDS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']

line_opt:
    LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']

load_set_list:
    load_set_element [, load_set_element ...]

load_set_element:
    column_definition_ref = {expr | DEFAULT}

load_data_extended_option_list:
    load_data_extended_option [load_data_extended_option ...]

load_data_extended_option:
    LOGFILE [=] string_value
    | REJECT LIMIT [=] int_num
    | BADFILE [=] string_value

Parameters

The parameters are described as follows:
hint_options Optional. Specifies hint options. For more information, see hint_options.
REMOTE_OSS Optional. Specifies to read data files from the Alibaba Cloud OSS file system. If you specify this option, the data files are read from the Alibaba Cloud OSS file system. If you do not specify this option, the data files are read from the file system of the server where the OBServer node resides.
file_name Specifies the path and name of the input file. The file_name parameter is in the following format:
  • If the import file is located on the OBServer node: /$PATH/$FILENAME.
  • If the import file is located on OSS: oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY.
The parameters are described as follows:
  • $PATH: the path where the file is stored in the bucket, indicating the directory of the file.
  • $FILENAME: the name of the file, indicating the specific file to access.
  • $HOST: the host name of the Alibaba Cloud OSS service or the domain name of the CDN-accelerated URL, namely, the address of the OSS service to be accessed.
  • $ACCESS_ID: the Access Key ID required for accessing the Alibaba Cloud OSS service, which is used for authentication.
  • $ACCESSKEY: the Access Key Secret required for accessing the Alibaba Cloud OSS service, which is used for authentication.

Note

When you import a file from OSS, make sure that the following conditions are met:

  • You have the permissions to access the specified OSS bucket and file, including read permission on the specified file. Typically, you set the permissions through the OSS console or the OSS API, and configure the access key (Access Key ID and Access Key Secret) as credentials with appropriate permissions.
  • The database server can connect to the specified $HOST address over the network to access the OSS service. If a CDN-accelerated domain name is used, make sure that the CDN is correctly configured and the network connection is normal.

IGNORE | REPLACE Optional. Specifies the behavior when the source data conflicts with a unique key of the destination table. LOAD DATA determines whether a record is a duplicate based on the primary key of the table. If the table does not have a primary key, the REPLACE and IGNORE options behave the same. If this option is omitted, LOAD DATA logs the conflicting records to the log file when duplicate data is encountered. The options are described as follows:
  • IGNORE: ignores the conflicting rows.
  • REPLACE: replaces the existing data, that is, overwrites the conflicting rows.

Note

When using the REPLACE or IGNORE clause, if the parallelism is greater than 1, the final inserted record of the conflicting rows may differ from the result of serial execution. If it is necessary to strictly ensure the insertion result of conflicting records, do not specify the parallelism for this statement (or set the parallelism to 1).
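
For example, to strictly preserve the serial conflict-resolution result, you can force serial execution. The file path and table name t1 are placeholders for illustration:

LOAD DATA /*+ PARALLEL(1) */
    INFILE '/home/admin/test.csv'
    REPLACE
    INTO TABLE t1
    FIELDS TERMINATED BY ',';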

table_name Specifies the name of the destination table, namely, the table to which the data is to be imported. Partitioned and non-partitioned tables are supported.
PARTITION (partition_name [, partition_name ...]) Optional. If the destination table is a partitioned table, you can use the PARTITION clause to explicitly specify the partitions to which the data is to be imported. Multiple partitions are separated with commas (,).

Note

The PARTITION clause is supported only in syntax and has no effect during execution.

CHARACTER SET charset_name_or_default Optional. Specifies the character set of the import file. For more information, see charset_name_or_default.
field_opt Optional. Specifies the field options to define how the field delimiter, text qualifier, and field escape character are interpreted. For more information, see field_opt.
line_opt Optional. Specifies how to process the content of rows, including the start and end characters of rows. For more information, see line_opt.
{IGNORE | GENERATED} number {LINES | ROWS} Optional. Specifies the options to process the content of data files. The LINES and ROWS options are synonyms and both indicate rows in the file. The options are described as follows:
  • IGNORE number LINES: specifies to skip the first number lines of the file.
  • The effect of GENERATED number LINES is the same as that of IGNORE number LINES.
column_name_var Optional. Specifies the column names in the destination table, namely, the columns to which the data from the data file is to be imported. If you do not specify this option, the data in the fields in the input file is imported to the destination table by sequentially matching columns. If the input file does not contain all the columns of the destination table, the missing columns are filled with default values based on the following rules:
  • Character type: an empty string.
  • Value type: 0.
  • Date type: 0000-00-00.
SET load_set_list Optional. Allows you to set or modify the values of fields before they are imported to the destination table, namely, to convert or calculate data when it is being imported. load_set_list is a comma-separated list. Each item load_set_element in the list specifies how to set or convert a specific field (column). For more information, see load_set_element.
load_data_extended_option_list Optional. Provides further control over the data loading process. Each load_data_extended_option defines a specific aspect of the loading behavior. For more information, see load_data_extended_option.
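
As an example of combining several of the parameters above, the following statement skips a header line, maps the file fields to columns c1 and c2, and derives c3 from c1 during the import. The table and column names are placeholders for illustration:

LOAD DATA INFILE '/home/admin/test.csv'
    INTO TABLE t1
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
    (c1, c2)
    SET c3 = c1 * 2;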

hint_options

  • PARALLEL(N): specifies the degree of parallelism for loading data. The default value of N is 4.

  • load_batch_size(M): specifies the batch size for inserting data. The default value of M is 100. We recommend that you set the value of M in the range of [100, 1000].

  • APPEND | direct(bool, int): enables the direct load feature by using a hint.

    • APPEND: enables the direct load feature, which allows space to be allocated and data to be written directly in data files. By default, the APPEND hint is equivalent to direct(true, 0). In this case, online statistics collection is supported (GATHER_OPTIMIZER_STATISTICS hint). For more information, see Online statistics collection.

    • direct(bool, int): enables the direct load feature. The parameters are described as follows:

      • bool: specifies whether the data to be written needs to be sorted. The value true indicates that the data needs to be sorted, and the value false indicates that the data does not need to be sorted.
      • int: the maximum number of error rows tolerated.

    For more information about how to use the LOAD DATA statement to enable the direct load feature, see Bypass data import.
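
For example, the following statement enables direct load with sorting, tolerates up to 100 error rows, and uses a degree of parallelism of 8. The file path and table name t1 are placeholders for illustration:

LOAD DATA /*+ direct(true, 100) PARALLEL(8) */
    INFILE '/home/admin/test.csv'
    INTO TABLE t1
    FIELDS TERMINATED BY ',';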

charset_name_or_default

  • string_charset_name: the name of the character set, which is usually a string value, such as gbk or latin1.
  • BINARY: specifies the binary character set. No character set conversion is performed; the file content is interpreted as raw bytes.
  • DEFAULT: specifies to use the default character set of the database to interpret character data in files. The default character set of OceanBase Database is utf8mb4.
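
The following SQL fragment shows the CHARACTER SET clause in context, with gbk as an example character set. It is not a complete LOAD DATA statement, so you cannot execute it.

...
CHARACTER SET gbk
...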

field_opt

  • {COLUMNS | FIELDS}: specifies the keyword COLUMNS or FIELDS, which are synonyms, to indicate the columns in the data file.

  • TERMINATED BY 'string': specifies the delimiter for fields in the data file, namely, the character that marks the end of each field. When a row of the file is read, the specified delimiter indicates the end of one field and the beginning of the next field.

  • [OPTIONALLY] ENCLOSED BY 'char': specifies the character that can optionally enclose (quote) fields in the data file, namely, a modifier for imported values. If a field value contains the field delimiter (for example, a comma), the field is often enclosed in double quotation marks.

    • OPTIONALLY: an optional modifier that indicates that the enclosing characters are not mandatory. When you use the LOAD DATA statement, this means that some fields may be enclosed while others are not. This does not affect the normal operation of the statement.
  • ESCAPED BY 'char': specifies the escape character for escaping special characters in the data file. For example, if a field value contains a double quotation mark, the character must be escaped by preceding it with a backslash ("Hello \"world\"").

For example, you can use the FIELDS keyword to configure the field handling mode of the LOAD DATA statement, the TERMINATED BY keyword to set the delimiter for fields (comma , in this example), the OPTIONALLY ENCLOSED BY keyword to specify that field values can be enclosed with double quotation marks ("), and the ESCAPED BY keyword to specify the backslash (\) as the escape character to handle special characters in field values.

Notice

The following SQL statement is an example of the field_opt option of the LOAD DATA statement. It is not a complete LOAD DATA statement, so you cannot execute it.

...
FIELDS 
    TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '\\'
...

line_opt

  • LINES: keyword that indicates the following parameters in the LOAD DATA statement apply to the rows in the file.
  • STARTING BY 'string': specifies the start delimiter of a row, which is the identifier of the start of each row.
  • TERMINATED BY 'string': specifies the end delimiter of a row, which is the identifier of the end of each row.

For example, you can set the start delimiter of each row to the specific string ***, namely, the data in each row starts with ***. You can set the end delimiter of a row to the line break character (\n).

Notice

The following SQL statement is an example of the line_opt option in the LOAD DATA statement. It is not a complete LOAD DATA statement, so you cannot execute it.

...
LINES
    STARTING BY '***'
    TERMINATED BY '\n'
...

load_set_element

column_definition_ref = {expr | DEFAULT}: specifies a single setting or conversion operation.

  • column_definition_ref: the name of the destination table column. The data in this column will be set or converted during the import process.

  • {expr | DEFAULT}:

    • expr: an expression that defines how the values in the column are calculated or converted. The expression can operate on the original values in the data file. For example, you can use an expression to concatenate two fields or perform a mathematical operation on a numeric field.
    • DEFAULT: specifies that the values in the column are set to the default values. The default values are usually specified when the column is defined.

For example, set the value of the col_name1 column to xxxx for each imported data record, and set the value of the col_name2 column to its default value.

Notice

The following SQL statement is an example of the load_set_element option in the LOAD DATA statement. It is not a complete LOAD DATA statement, so you cannot execute it.

...
SET col_name1 = 'xxxx', col_name2 = DEFAULT;
...

load_data_extended_option

Notice

The load_data_extended_option option is supported only in syntax. It does not take effect during execution.

  • LOGFILE [=] string_value: specifies a log file where specific events or errors that occur during data loading are recorded. string_value is the path and file name of the destination log file. For example, LOGFILE = 'load_data.log' records the log information in the file named load_data.log.

  • REJECT LIMIT [=] int_num: specifies the maximum number of errors allowed before the LOAD DATA operation is terminated. int_num is an integer that represents the number of errors. If more than this number of errors occur during data loading, the entire loading operation will fail. For example, REJECT LIMIT = 10 will stop loading data after 10 errors occur.

  • BADFILE [=] string_value: Similar to the LOGFILE parameter, it is used to specify a file that stores data rows identified as invalid or unable to be imported during the data loading process. The string_value is the path and name of the file where these data rows are stored. For example, BADFILE = 'bad_data.txt' places the bad data rows into the bad_data.txt file.
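
The following statement shows all three extended options together. Because these options are supported only in syntax, they do not affect execution; the file names and the table name t1 are placeholders for illustration:

LOAD DATA INFILE '/home/admin/test.csv'
    INTO TABLE t1
    FIELDS TERMINATED BY ','
    LOGFILE = 'load_data.log'
    REJECT LIMIT = 10
    BADFILE = 'bad_data.txt';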

Examples

Example 1: Import data from a file on the server (OBServer node)

  1. Set the global secure path.

    Notice

    For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.

    SET GLOBAL secure_file_priv = "/";
    
  2. Log out.

    Note

    Since secure_file_priv is a GLOBAL variable, you must execute \q to exit and make the change take effect.

    obclient> \q
    
  3. Reconnect to the database and use the LOAD DATA statement to import data.

    • Perform a regular import.

      LOAD DATA 
        INFILE '/home/admin/test.csv'
        INTO TABLE t1
        FIELDS
          TERMINATED BY ','
          ENCLOSED BY '"'
        LINES TERMINATED BY '\n';
      
    • Use the APPEND hint to enable direct load.

      LOAD DATA /*+ PARALLEL(4) APPEND */
        INFILE '/home/admin/test.csv'
        INTO TABLE t1
        FIELDS
          TERMINATED BY ','
          ENCLOSED BY '"'
        LINES TERMINATED BY '\n';
      

Example 2: Import data from a file in Alibaba Cloud OSS

Use the direct(bool, int) hint to enable direct load. The file to be imported can be located in Alibaba Cloud OSS.

LOAD DATA /*+ direct(true,1024) parallel(16) */
  REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***'
  INTO TABLE tbl1
  FIELDS
    TERMINATED BY ',';
