Purpose
You can use this statement to import data from an external file.
Currently, OceanBase Database does not support the LOCAL INFILE statement that is used to load data files. This is because OceanBase Database is a distributed database in which data is read and written by the leader and CSV files must be uploaded to the server where the leader is located before they can be loaded. Therefore, a NAS file system is required for storing the files. In OceanBase Database, you can use the LOAD DATA statement only to load the local input file on an OBServer. Therefore, you must copy a file to an OBServer before importing it.
You can use the LOAD DATA statement to import a CSV text file in the following process:
Parse the file: OceanBase Database reads data from a file based on the file name that you enter and determines whether to perform parallel or serial parsing of data from the input file based on the specified degree of parallelism.
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServers. The
LOAD DATAstatement is used to process the parsed data and determine to which OBServer the data is sent.Insert the data: After the destination OBServer receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
To import data from an external file, you must have the FILE privilege.
Notice
If the
REPLACEorIGNOREclause is used and the degree of parallelism is greater than1, the finally inserted records of conflicting rows may differ from the results of serial execution. If you want to strictly guarantee the insertion of conflicting records, do not specify the degree of parallelism for the statement (or set the degree of parallelism to1).
Syntax
LOAD DATA
[/*+ parallel(N) load_batch_size(M)*/]
INFILE 'file_name'
[REPLACE IGNORE]
INTO TABLE table_name
[{FIELDS COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Specifies the degree of parallelism for loading data. The default value of N is 4. |
| load_batch_size(M) | Specify the batch size of each insertion. The default value of M is 1000. Recommended value range: [100, 1000]. |
| file_name | Specifies the path and name of the input file. Currently, you can load only local files from OBServers. |
| REPLACE IGNORE | If a unique key conflict occurs, REPLACE indicates that conflicting rows are overwritten, and IGNORE indicates that conflicting rows are ignored. The LOAD DATA statement checks whether a table contains duplicate data based on its primary key. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. If duplicate data exists, the LOAD DATA statement records the incorrect data to a log file by default. |
| table_name | Specifies the name of the table from which data is imported. Partitioned and non-partitioned tables are supported. |
| FIELDS COLUMNS | Specifies the format of the field.
|
| LINES STARTING BY | Specifies the start character of the line. |
| LINES TERMINATED BY | Specifies the end character of the line. |
| IGNORE number { LINES ROWS } | Ignore the first few lines. LINES indicates the first few lines of the file. ROWS indicates the first few rows of data specified by the field delimiter. By default, fields in the input file are mapped to columns in the destination table one by one. If the input file does not contain all the columns, the missing columns are filled based on the following rules:
|
| column_name_var | Specifies the name of the imported column. |
Examples
Import data from an external file.
Set a global security path.
obclient> SET GLOBAL secure_file_priv = "" Query OK, 0 rows affected (0.03 sec) obclinet> \q ByeImport data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected (0.01 sec)