LOAD DATA |V3.2.4|OceanBase Database| docs|Distributed Database

LOAD DATA

Last Updated：2024-10-23 06:02:36 Updated

Purpose

You can use this statement to import data from an external file.

Currently, OceanBase Database does not support the LOCAL INFILE statement that is used to load data files. This is because OceanBase Database is a distributed database in which data is read and written by the leader and CSV files must be uploaded to the server where the leader is located before they can be loaded. Therefore, a NAS file system is required for storing the files. In OceanBase Database, you can use the LOAD DATA statement only to load the local input file on an OBServer. Therefore, you must copy a file to an OBServer before importing it.

You can use the LOAD DATA statement to import a CSV text file in the following process:

Parse the file: OceanBase Database reads data from a file based on the file name that you enter and determines whether to perform parallel or serial parsing of data from the input file based on the specified degree of parallelism.
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServers. The LOAD DATA statement is used to process the parsed data and determine to which OBServer the data is sent.
Insert the data: After the destination OBServer receives the data, it executes the INSERT statement to insert the data into the corresponding partition.

To import data from an external file, you must have the FILE privilege.

Notice

If the REPLACE or IGNORE clause is used and the degree of parallelism is greater than 1, the finally inserted records of conflicting rows may differ from the results of serial execution. If you want to strictly guarantee the insertion of conflicting records, do not specify the degree of parallelism for the statement (or set the degree of parallelism to 1).

Syntax

LOAD DATA
     [/*+ parallel(N) load_batch_size(M)*/]
    INFILE 'file_name'
    [REPLACE  IGNORE]
    INTO TABLE table_name
    [{FIELDS  COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES  ROWS}]
    [(column_name_var
        [, column_name_var] ...)]

Parameters

Parameter	Description
parallel(N)	Specifies the degree of parallelism for loading data. The default value of `N` is `4`.
load_batch_size(M)	Specify the batch size of each insertion. The default value of `M` is `1000`. Recommended value range: [100, 1000].
file_name	Specifies the path and name of the input file. Currently, you can load only local files from OBServers.
REPLACE IGNORE	If a unique key conflict occurs, `REPLACE` indicates that conflicting rows are overwritten, and `IGNORE` indicates that conflicting rows are ignored. The `LOAD DATA` statement checks whether a table contains duplicate data based on its primary key. If the table does not have a primary key, the `REPLACE` and `IGNORE` options are equivalent. If duplicate data exists, the `LOAD DATA` statement records the incorrect data to a log file by default.
table_name	Specifies the name of the table from which data is imported. Partitioned and non-partitioned tables are supported.
FIELDS COLUMNS	Specifies the format of the field. `ENCLOSED BY`: specifies the modifier of the exported value. `TERMINATED BY`: specifies the end character of the exported column. `ESCAPED BY` specifies the characters ignored for the exported value.
LINES STARTING BY	Specifies the start character of the line.
LINES TERMINATED BY	Specifies the end character of the line.
IGNORE number { LINES ROWS }	Ignore the first few lines. `LINES` indicates the first few lines of the file. `ROWS` indicates the first few rows of data specified by the field delimiter. By default, fields in the input file are mapped to columns in the destination table one by one. If the input file does not contain all the columns, the missing columns are filled based on the following rules: Character data type: null string Numeric data type: 0 Date data type: `0000-00-00`
column_name_var	Specifies the name of the imported column.

Examples

Import data from an external file.

Set a global security path.

obclient> SET GLOBAL secure_file_priv = ""
Query OK, 0 rows affected (0.03 sec)
obclinet> \q
Bye

Import data from an external file.

obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1;
Query OK, 0 rows affected (0.01 sec)