Description
The statement is used to import data from an external source.
In OceanBase Database, the LOAD DATA statement can load input files from OceanBase Database and Alibaba Cloud OSS, where the files are stored on OBServer nodes.
Note
- The current version of OceanBase Database does not support using the
LOCAL INFILEsyntax to load data files. - Tables with triggers are not supported.
The LOAD DATA statement can currently import text files in the CSV format. The entire import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file based on the file name entered by the user and parses the data in the file in parallel or serially based on the specified degree of parallelism.
Distribute data: As OceanBase Database is a distributed database, data in partitions can be distributed across different OBServer nodes. The
LOAD DATAstatement calculates the destination OBServer node to which parsed data should be sent.Insert data: After the data is received by the destination OBServer node, the
INSERTstatement is executed locally to insert the data into the corresponding partition.
To import data from external files, you must have the FILE privilege. You can use the GRANT FILE ON *.* TO $user_name; statement to grant the privilege to the user, such as $user_name, who needs to execute the LOAD DATA statement.
Notice
If the parallelism is greater than 1 and you use the REPLACE or IGNORE clause, the last inserted record for a conflicted record may be different when the statement is executed in parallel and when it is executed serially. If you want to strictly ensure the insertion result for conflicted records, do not specify the parallelism for this statement or set the parallelism to 1.
Syntax
LOAD DATA
[ [/*+ PARALLEL(N) load_batch_size(M) APPEND */]
| [/*+ PARALLEL(N) direct(bool, int) */] ]
[REMOTE_OSS] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. The default value of N is 4. |
| load_batch_size(M) | The number of rows to be inserted in one batch. The default value of M is 100. We recommend that you set this parameter to a value within the range of [100, 1000]. |
| APPEND | Use this hint to enable direct load, which allocates space and writes data directly in the data file. By default, the APPEND hint is equivalent to direct(true, 0). You can also use the GATHER_OPTIMIZER_STATISTICS hint to collect statistics online. |
| direct | Use this hint to enable direct load. The bool parameter in the direct(bool, int) specifies whether to sort the given CSV file, where true indicates that the file needs to be sorted, and the int parameter specifies the maximum number of error rows that can be tolerated. |
| REMOTE_OSS | Specifies whether to read data from the OSS file system.
NoticeIf you specify this parameter, the |
| file_name | The path and name of the input file. The format of the file_name parameter is described as follows:
NoteMake sure that the following information is correct before you import a file from the OSS file system:
|
| REPLACE | IGNORE | Specifies the action to take when a unique key conflict occurs. If you specify REPLACE, the conflicting row is overwritten. If you specify IGNORE, the conflicting row is ignored. By default, the LOAD DATA statement uses the primary key of a table to check for duplicate data. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. By default, when the LOAD DATA statement encounters duplicate data, it records the error in a log file. |
| table_name | The name of the table to which the imported data is loaded. The table can be a partitioned or non-partitioned table. |
| FIELDS | COLUMNS | Specifies the format of fields.
|
| LINES STARTING BY | Specifies the delimiter for the start of lines. |
| LINES TERMINATED BY | Specifies the delimiter for the end of lines. |
| IGNORE number { LINES | ROWS } | Specifies to ignore the first several lines. The keyword LINES indicates several lines in a file, and the keyword ROWS indicates several rows separated by field delimiters. By default, the LOAD DATA statement matches each field in the input file with each column in the table. If the input file does not contain all the columns, the missing columns are filled with default values that are specified in the following rules:
|
| column_name_var | The name of the column to be imported. |
Examples
Example 1: Import data from a file on the server (OBServer node)
Set the global secure path.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.obclient> SET GLOBAL secure_file_priv = "/";Exit.
Note
After you set the
secure_file_privvariable, you must exit and then log in again for the change to take effect, becausesecure_file_privis aGLOBALvariable.obclinet> \qThe return result is as follows:
ByeReconnect to the database and use the
LOAD DATAstatement to import data.Perform a regular import.
obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;Use the
APPENDhint to enable direct load.LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
Example 2: Import data from an OSS file
Notice
When you use an object storage path, make sure that the values of the parameters in the object storage path are composed of uppercase and lowercase letters, digits, and the following symbols: \/-_$+=. Otherwise, the setting may fail.
Use the direct(bool, int) hint to enable direct load for a file that is stored in OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';