Description
The statement is used to import data from an external source.
In OceanBase Database, the LOAD DATA statement can import files from OceanBase Database and Object Storage Service (OSS) to OBServer nodes.
Note
- The current version of OceanBase Database does not support using the
LOCAL INFILEsyntax to load data files. - Tables with triggers are not supported.
The LOAD DATA statement can import text files in CSV format from external sources. The import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file with the name specified by the user and parses data in the file in parallel or serially based on the specified degree of parallelism.
Distribute data: As a distributed database, OceanBase Database distributes data across OBServer nodes. After data is parsed, the
LOAD DATAstatement calculates where the data should be sent based on the distribution rules of OceanBase Database.Insert data: After the data is received by the target OBServer node, the node executes an
INSERTstatement to insert the data into the corresponding partition.
To import data from external files, you must have the FILE privilege. You can use the GRANT FILE ON *.* TO $user_name; statement to grant the privilege to the user who needs to execute the LOAD DATA statement, where $user_name is the name of the user who needs to execute the LOAD DATA statement.
Syntax
LOAD DATA
[ [/*+ PARALLEL(N) load_batch_size(M) APPEND */]
| [/*+ PARALLEL(N) direct(bool, int) */] ]
[REMOTE_OSS] INFILE 'file_name'
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. If N is specified, the default value is 4. |
| load_batch_size(M) | The number of rows to be inserted at a time. If M is specified, the default value is 100. We recommend that you set this parameter to a value within the range of [100,1000]. |
| APPEND | Use the hint to enable direct load, which allocates space and writes data directly in the data file. By default, the APPEND hint is equivalent to direct(true, 0). Additionally, it enables online collection of optimizer statistics (GATHER_OPTIMIZER_STATISTICS hint). |
| direct | Use the hint to enable direct load. direct(bool, int) The bool parameter specifies whether to sort the given CSV file, where true indicates that sorting is required. The int parameter specifies the maximum number of allowed error rows. |
| REMOTE_OSS | Specifies whether to read data from the OSS file system.
NoticeIf you specify this parameter, the |
| file_name | The path and name of the input file. The format of the file_name parameter is as follows:
NoteMake sure that the following information is correct before you import a file from OSS:
|
| table_name | The name of the table to which the imported data is to be loaded. The table can be a partitioned or non-partitioned table. |
| FIELDS | COLUMNS | The formats of fields.
|
| LINES STARTING BY | The starting delimiter of a line. |
| LINES TERMINATED BY | The ending delimiter of a line. |
| IGNORE number { LINES | ROWS } | Specifies to ignore the first several lines. If LINES is specified, the first several lines in the file are ignored. If ROWS is specified, the first several lines specified by the field delimiter are ignored. By default, the system matches each field in the input file with each column in the table. If the input file does not contain all the columns, the missing columns are filled with default values, which are described in the following example:
|
| column_name_var | The name of the imported column. |
Examples
Example 1: Import data from a file on the server (OBServer node)
Set the global security path.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.obclient> SET GLOBAL secure_file_priv = "/";Log out.
Note
After you set the
secure_file_privvariable, you must exit and then log in again for the change to take effect, becausesecure_file_privis aGLOBALvariable.obclinet> \qThe return result is as follows:
ByeReconnect to the database and use the
LOAD DATAstatement to import data.Perform a regular import.
obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;Use the
APPENDhint to enable direct load.LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
Example 2: Import data from an OSS file
Notice
When you use an object storage path, make sure that the values of the parameters in the object storage path are composed of uppercase and lowercase letters, digits, and the following characters: \/-_$+=. Otherwise, the setting may fail.
Use the direct(bool, int) hint to enable direct load for a file that is stored in OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';