Description
The statement is used to import data from an external source.
In OceanBase Database, the LOAD DATA statement can load input files from OceanBase Database and OBServer nodes in Alibaba Cloud OSS.
Note
- The current version of OceanBase Database does not support using the
LOCAL INFILEsyntax to load data files. - Tables with triggers are not supported.
The LOAD DATA statement can currently import text files in the CSV format. The entire import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file with the name specified by the user and parses data in the file in parallel or in series based on the specified degree of parallelism.
Distribute data: As a distributed database, OceanBase Database distributes data across OBServer nodes. After data is parsed, the
LOAD DATAstatement calculates where to send the data.Insert data: After the target OBServer node receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
To import data from external files, you must have the FILE privilege. You can use the GRANT FILE ON *.* TO $user_name; statement to grant the privilege to the user who needs to execute the LOAD DATA statement, where $user_name is the name of the user who needs to execute the LOAD DATA statement.
Syntax
LOAD DATA
[ [/*+ PARALLEL(N) load_batch_size(M) APPEND */]
| [/*+ PARALLEL(N) direct(bool, int) */] ]
[REMOTE_OSS] INFILE 'file_name'
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. If N is specified, it must be an integer greater than 0. The default value is 4. |
| load_batch_size(M) | The number of rows to be inserted in one batch. If M is specified, it must be an integer greater than 0. The default value is 100. We recommend that you set this parameter to a value between 100 and 1000. |
| APPEND | Use this hint to enable direct load, which allocates space and writes data directly in the data file. By default, the APPEND hint is equivalent to the direct(true, 0) hint. It also enables online collection of optimizer statistics using the GATHER_OPTIMIZER_STATISTICS hint. |
| direct | Use this hint to enable direct load. The direct(bool, int) parameter specifies whether to sort the given CSV file and the maximum number of allowed error rows. |
| REMOTE_OSS | Specifies whether to read data from the OSS file system.
NoticeIf you specify this parameter, the |
| file_name | The path and name of the input file. The format of the file_name parameter is as follows:
NoteBefore you import a file from OSS, make sure that:
|
| table_name | The name of the table to which the imported data is loaded. The table can be a partitioned or non-partitioned table. |
| FIELDS | COLUMNS | The formats of fields.
|
| LINES STARTING BY | The starting delimiter of a line. |
| LINES TERMINATED BY | The ending delimiter of a line. |
| IGNORE number { LINES | ROWS } | Specifies to ignore the first several lines. LINES indicates several lines in the file, and ROWS indicates several rows separated by field delimiters. By default, the system matches each field in the input file with the corresponding column in the table. If the input file does not contain all the columns in the table, the missing columns are filled with default values, which are described in the following example.
|
| column_name_var | The name of the column to be imported. |
Examples
Example 1: Import data from a file on the server (OBServer node)
Set the global secure path.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.obclient> SET GLOBAL secure_file_priv = "";Log out.
Note
Since
secure_file_privis aGLOBALvariable, you must execute\qto log out for the modification to take effect.obclinet> \qThe return result is as follows:
ByeReconnect to the database and use the
LOAD DATAstatement to import data.Perform a regular import.
obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;Use the
APPENDhint to enable direct load.LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
Example 2: Import data from an OSS file
Notice
When you use an object storage path, make sure that the values of the parameters in the object storage path are composed of uppercase and lowercase letters, digits, and the following characters: \/-_$+=. Otherwise, the setting may fail.
Use the direct(bool, int) hint to enable direct load, where the imported file can be stored in Alibaba Cloud OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';