Description
The statement is used to import data from an external source.
Notice
- Tables with triggers are not supported.
- To import data from an external file, you need to have the
FILEprivilege and meet the following requirements:- When you import a server file, you must set the system variable secure_file_priv in advance to specify the path that is accessible for importing or exporting files.
- When you import a local file, you must add the
--local-infile[=1]option when you start the MySQL/[OBClient](https://www.oceanbase.com/softwarecenter-cloud) client to enable the data loading feature from the local file system.
The LOAD DATA statement in OceanBase Database allows you to load data from the following types of input files:
Server (OBServer) node files: files stored on the OBServer nodes of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from server node files to database tables.Client (local) files: files stored in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from client local files to database tables.Note
When OceanBase Database executes the
LOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption.OSS files: files stored in the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files to database tables.
The LOAD DATA statement currently supports importing data from CSV files. The entire import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file based on the specified file name, and then parses the data in the file based on the specified degree of parallelism.
Distribute data: As a distributed database, OceanBase Database distributes data in partitions across OBServer nodes. The
LOAD DATAstatement calculates which OBServer node the parsed data should be sent to.Insert data: After the data is received by the target OBServer node, the node executes an
INSERTstatement to insert the data into the corresponding partition.
Syntax
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[COMPRESSION [=] {AUTO|NONE|GZIP|DEFLATE|ZSTD}]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
load_mode:
'full'
| 'inc_replace'
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. The default value is 4. |
| load_batch_size(M) | The batch size for inserting data. The default value of M is 100. We recommend that you set this value to a number within the range of [100,1000]. |
| APPEND | direct() | The hint to enable direct load.
NoticeDuring a direct load task, we recommend that you do not upgrade OceanBase Database, because this may cause the direct load task to fail.
LOAD DATA statement to enable direct load, see Import data or files by using the LOAD DATA statement. |
| REMOTE_OSS | LOCAL | Optional.
|
| file_name | The path and name of the input file. The format of the file_name parameter is as follows:
NoteTo import a file from Alibaba Cloud OSS, make sure of the following:
|
| REPLACE | IGNORE | REPLACE specifies to overwrite the conflicting rows, and IGNORE specifies to ignore the conflicting rows. When you use the LOAD DATA statement, it determines whether the data is a duplicate based on the primary key of the table. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. By default, when duplicate data is encountered, the LOAD DATA statement writes the data to an error log file.
Notice
|
| table_name | The name of the table to which the imported data is to be imported.
|
| COMPRESSION | The compression format of the file. The valid values are as follows:
|
| FIELDS | COLUMNS | The format of fields.
|
| LINES STARTING BY | The starting delimiter of lines. |
| LINES TERMINATED BY | The ending delimiter of lines. |
| IGNORE number { LINES | ROWS } | The first several lines to be ignored. LINES specifies to ignore the first several lines of the file. ROWS specifies to ignore the first several data rows separated by field separators. By default, the system matches each field in the input file with a column in the table one by one. If the input file does not contain all the columns in the table, the missing columns are filled with default values. The default values are described in the following list:
|
| column_name_var | The name of the column to be imported. |
Wildcard rules for importing multiple files
To facilitate importing multiple files, the wildcard feature is introduced for server-side and OSS file imports. This feature is not applicable to client-side file imports.
Use of wildcards on the server
Matching rules:
File name matching:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Directory matching:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Matching both directories and file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
Considerations:
At least one matching file must exist. Otherwise, error code 4027 is returned.
For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvwill be considered as a whole for matching. If no matching is found, error code 4027 will be returned.Only wildcards supported by the POSIX GLOB function can be used. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be used, but they cannot be matched by the GLOB function. In this case, error code 4027 will be returned, even though these wildcards can be matched by thelscommand.
Use of wildcards in Cloud Object Storage Service (OSS)
Matching rules:
File name matching:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';Considerations:
Directory matching is not supported. For example,
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';will returnOB_NOT_SUPPORTED.Only
*and?are supported for file name wildcards. Other wildcards, although allowed, cannot match any results.
Examples
Note
When you use \N to represent NULL values by using the LOAD DATA statement, the secure_file_priv variable must be set to a global variable. After you set the variable, you must exit MySQL by using the \q command for the setting to take effect.
Import data from a server file
Example 1: Import data from a server file.
Set the global security path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclinet> \q ByeNote
After you set the
secure_file_privvariable, you must exit MySQL by using the\qcommand for the setting to take effect.Reconnect to the database and import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Import data from a local client file
Example 1: Import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Note
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient V2.2.4 or later. If you do not have OBClient of the required version, you can also use a MySQL client to connect to the database.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Example 2: Directly import a compressed file by setting the COMPRESSION parameter.
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz'
INTO TABLE lineitem
COMPRESSION GZIP
FIELDS TERMINATED BY '|';
Import data from an OSS file
Note
When you use an object storage path, separate the parameters of the object storage path with &. Make sure that the values of the parameters contain only uppercase and lowercase letters, numbers, and the following characters: \/-_$+=. If the values contain other characters, the setting may fail.
Example 1: Use the direct(bool, int) hint to enable direct load, where the source file is stored in Alibaba Cloud OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about examples that use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about examples of direct load using the
LOAD DATAstatement, see Import data by using the LOAD DATA statement.