Description
This statement is used to import data from an external source.
Notice
- Tables with triggers are not supported.
- To import data from an external file, you must have the
FILEprivilege and meet the following requirements:- When loading a server-side file, you must set the system variable secure_file_priv in advance to specify the path that is accessible for the user when importing or exporting files.
- When loading a local file, you must add the
--local-infile[=1]option when you start the MySQL client or OBClient to enable the data loading feature from the local file system.
The LOAD DATA statement in OceanBase Database supports loading data from the following types of input files:
Server-side files: files that are stored on the OBServer node. You can use the
LOAD DATA INFILEstatement to load data from server-side files into a database table.Client-side files: files that are stored in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from client-side files into a database table.Note
The MySQL mode of OceanBase Database V4.2.2 and later supports loading data from a local data file by using the
LOAD DATA LOCAL INFILEsyntax. When you execute theLOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption.OSS files: files that are stored in an OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files into a database table.
The LOAD DATA statement supports importing CSV files. The process is as follows:
Parse the file: OceanBase Database reads data from the file based on the file name that you entered and parses the data in the file, in parallel or in series, based on the specified degree of parallelism.
Distribute data: As a distributed database, OceanBase Database distributes data over OBServer nodes. The
LOAD DATAstatement calculates where to send the parsed data based on the distribution of data over OBServer nodes.Insert data: After the target OBServer node receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
Syntax
LOAD DATA
[/*+ PARALLEL(N) load_batch_size(M) APPEND */]
|[/*+ PARALLEL(N) direct(bool, int) */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameter description
| Parameter | Description | |||||||
|---|---|---|---|---|---|---|---|---|
| parallel(N) | The degree of parallelism for loading data. If N is not specified, it is set to 4 by default. |
|||||||
| load_batch_size(M) | The number of rows to be inserted at a time. If M is not specified, it is set to 100 by default. We recommend that you set the value to an integer between 100 and 1000. |
|||||||
| APPEND | Enables direct load by using a hint. This allows space to be allocated and data to be written directly in the data file. By default, the APPEND hint is equivalent to direct(true, 0). It also enables online collection of optimizer statistics with the GATHER_OPTIMIZER_STATISTICS hint. |
|||||||
| direct | Enables direct load by using a hint. direct(bool, int) contains the following parameters:
|
|||||||
| REMOTE_OSS | LOCAL | Optional. Specifies whether to read data from the OSS file system or the local file system of the client. If this parameter is not specified, data is read from the file system of the server, that is, the OBServer node.
|
|||||||
| file_name | The path and name of the input file. The format of the file_name parameter is as follows:
NoteWhen you import a file from OSS, make sure of the following:
|
|||||||
| REPLACE | IGNORE | Specifies the action to take when a unique key conflict occurs. If you specify REPLACE, the conflicting row is overwritten. If you specify IGNORE, the conflicting row is ignored. By default, if a row to be loaded is identical to a row in the table, the LOAD DATA statement writes the identical row to a log file.
Notice
|
|||||||
| table_name | The name of the table to which the imported data is loaded. The table can be a partitioned or non-partitioned table. | |||||||
| FIELDS | COLUMNS | Specifies the format of fields.
|
|||||||
| LINES STARTING BY | The line delimiter. | |||||||
| LINES TERMINATED BY | The line terminator. | |||||||
| IGNORE number { LINES | ROWS } | The number of the first few lines to be ignored. LINES indicates the first few lines of the file, and ROWS indicates the first few data rows separated by field separators. By default, the system matches the columns in the input file one by one with the columns in the table. If the input file does not contain all the columns, the missing columns are filled with default values. The default values are described in the following table.
<th.Default value
|
|||||||
| column_name_var | The name of the column to be imported. |
Examples
Import data from a server file
Example 1: Import data from a server file.
Set the global security path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclinet> \q ByeNote
Since
secure_file_privis aGLOBALvariable, you must enter\qto make the setting take effect.Reconnect to the database and import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Use the direct(bool, int) hint to enable direct load. Files for direct load can be stored in Alibaba Cloud OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a local client file
Example 4: Use the following statement to import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt window and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient V2.2.4 or later. If you do not have the required version of OBClient, you can connect to the database using a MySQL client.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Import data from an OSS file
Notice
When you use an object storage path, separate the parameters in the object storage path with &. Make sure that the values of the parameters you entered contain only uppercase and lowercase letters, digits, and the following symbols: \/-_$+=. If you entered other characters, the setting may fail.
Use the direct(bool, int) hint to enable direct load. Files for direct load can be stored in Alibaba Cloud OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about examples that use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about examples of direct load by using the
LOAD DATAstatement, see Import data by using the LOAD DATA statement.