Description
The statement is used to import data from an external source.
Notice
- Tables with triggers are not supported.
- To import data from an external file, you need to have the
FILEprivilege and meet the following requirements:- When you import a server file, you must set the system variable secure_file_priv in advance to specify the path that is accessible for importing or exporting files.
- When you import a local file, you must add the
--local-infile[=1]option when you start the MySQL/[OBClient](https://www.oceanbase.com/softwarecenter-cloud) client to enable the data loading feature from the local file system.
The LOAD DATA statement in OceanBase Database allows you to load data from the following types of input files:
Server (OBServer) node files: files that are stored on the OBServer nodes of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from server node files to database tables.Client (local) files: files that are stored in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from client local files to database tables.Note
When OceanBase Database executes the
LOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption.OSS files: files that are stored in the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files to database tables.
The LOAD DATA statement currently supports importing data from CSV files. The entire import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file based on the specified file name, and then parses the data in the file based on the specified degree of parallelism.
Distribute data: As OceanBase Database is a distributed database, the data in different partitions may be distributed on different OBServer nodes. The
LOAD DATAstatement calculates where the parsed data should be sent to based on the OBServer nodes.Insert data: After the data is received by the target OBServer node, an
INSERTstatement is executed locally to insert the data into the corresponding partition.
Syntax
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
load_mode:
'full'
| 'inc_replace'
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. The default value of N is 4. |
| load_batch_size(M) | The batch size for inserting data. The default value of M is 100. We recommend that you set this parameter to a value within the range of [100,1000]. |
| APPEND | direct() | The hint to enable direct load.
LOAD DATA statement to perform direct load, see Import data or files by using the LOAD DATA statement. |
| REMOTE_OSS | LOCAL | Optional.
|
| file_name | The path and name of the input file. The format of file_name is described as follows:
NoteTo import a file from OSS, make sure of the following:
|
| REPLACE | IGNORE | REPLACE specifies to overwrite the conflicting row in case of a unique key conflict, and IGNORE specifies to ignore the conflicting row. When you use the LOAD DATA statement, it checks for duplicate data based on the primary key of the table. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. By default, if duplicate data is found, the LOAD DATA statement writes the data to an error file.
Notice
|
| table_name | The name of the table to which the imported data is to be imported. The table can be a partitioned or non-partitioned table. |
| FIELDS | COLUMNS | The format of fields.
|
| LINES STARTING BY | The starting delimiter of a line. |
| LINES TERMINATED BY | The ending delimiter of a line. |
| IGNORE number { LINES | ROWS } | The first several lines or rows of a file to be ignored. By default, the system matches the fields in the input file with the columns in the table one by one. If the input file does not contain all the columns in the table, the missing columns are filled with default values. The default values are described in the following table.
|
| column_name_var | The name of the column to be imported. |
Considerations
Wildcard rules for direct load
The Hint direct keyword is used in the LOAD DATA statement to specify whether to enable direct load. When direct load is enabled, only one file can be specified that is stored in Object Storage Service (OSS). To import multiple files, you must execute multiple load data statements. To facilitate importing multiple files in the cluster file system (server_disk), the file wildcard feature is introduced. This feature applies to file sources of the server_disk and OSS types, but not to client disks (client_disk).
Enable direct load for server_disk
Examples:
Matching files:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Matching directories:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Matching both directories and filenames:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
Considerations:
At least one matching file must exist. Otherwise, the error
OB_FILE_NOT_EXISTis returned.For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvis considered as a whole for matching. If no file matches the pattern, the errorOB_FILE_NOT_EXISTis returned.Only wildcard characters supported by the POSIX GLOB function can be used. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be used although they can be listed by using thelscommand. However, the GLOB function cannot match them, and the errorOB_FILE_NOT_EXISTis returned.
Enable direct load for oss
Examples:
- Matching files:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
- Matching files:
Considerations:
Directory matching is not supported. For example, the statement
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';returns the errorOB_NOT_SUPPORTED.Only the wildcard characters
*and?are supported for matching filenames. Other wildcard characters, although allowed, cannot match any files.
For more information, see Import data or files by using the LOAD DATA statement
Examples
Import data from a server file
Example 1: Import data from a server file.
Set the global secure path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclinet> \q ByeNote
Since
secure_file_privis aGLOBALvariable, you need to execute\qto make the setting take effect.Reconnect to the database and import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Use the direct(bool, int) hint to enable direct load. The direct load file can be stored in Alibaba Cloud OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a local client file
Example 4: Use the following statement to import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt window and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient V2.2.4 or later. If you do not have OBClient of the required version, you can also use a MySQL client to connect to the database.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Import data from an OSS file
Notice
When you use object storage path parameters, separate the parameters with & characters. Make sure that the values you enter for the parameters contain only uppercase and lowercase letters, numbers, and the following characters: \/-_$+=. If you enter characters other than the preceding ones, the setup may fail.
Use the direct(bool, int) hint to enable direct load, and the direct load file can be stored in Alibaba Cloud OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about examples that use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about examples of direct load using the
LOAD DATAstatement, see Import data by using the LOAD DATA statement.