Description
The statement is used to import data from an external source.
Notice
- Tables with triggers are not supported.
- To import data from an external file, you need to have the
FILEprivilege and meet the following requirements:- When you import a file from the server, you must set the system variable secure_file_priv in advance to specify the path that is accessible for importing or exporting files.
- When you import a local file from the client, you must add the
--local-infile[=1]option when you start the MySQL/[OBClient](https://www.oceanbase.com/softwarecenter-cloud) client to enable the data loading feature from the local file system.
The LOAD DATA statement in OceanBase Database allows you to load data from the following types of input files:
Server-side (OBServer node) files: files stored on the OBServer nodes of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from server-side files to database tables.Client-side (local) files: files stored in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from client-side files to database tables.Note
When OceanBase Database executes the
LOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption.OSS files: files stored in the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files to database tables.
The LOAD DATA statement currently supports importing data from CSV files. The entire import process consists of the following steps:
Parse the file: OceanBase Database reads data from the file based on the specified file name, and then parses the data in the file in parallel or in series based on the specified parallelism.
Distribute data: As a distributed database, OceanBase Database distributes data in partitions across OBServer nodes. The
LOAD DATAstatement determines which OBServer node the parsed data should be sent to for further processing.Insert data: When the data is received by the target OBServer node, the node executes an
INSERTstatement to insert the data into the corresponding partition on the node.
Syntax
LOAD DATA
[/*+ PARALLEL(N) load_batch_size(M) APPEND */]
|[/*+ PARALLEL(N) direct(bool, int) */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. The default value of N is 4. |
| load_batch_size(M) | The batch size for insertion. The default value of M is 100. We recommend that you set this parameter to a value within the range of [100,1000]. |
| APPEND | Specifies the hint for enabling direct load, which allocates space and writes data directly into data files. By default, the APPEND hint is equivalent to direct(true, 0). It also enables online statistics collection with the GATHER_OPTIMIZER_STATISTICS hint. |
| direct | Specifies the hint for enabling direct load. The direct(bool, int) parameter specifies whether to sort the CSV file and the maximum number of tolerated error lines. |
| REMOTE_OSS | LOCAL | Optional.
|
| file_name | The path and name of the input file. The format of the file_name parameter depends on where the import file is located:
NoteTo import a file from OSS, make sure that the following conditions are met:
|
| REPLACE | IGNORE | Specifies the action to take when a unique key conflict occurs. If you specify REPLACE, conflicting rows are overwritten. If you specify IGNORE, conflicting rows are ignored. LOAD DATA determines whether data is duplicated based on the primary key of the table. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. By default, if a duplicate is found, LOAD DATA writes the error records to a log file.
Notice
|
| table_name | The name of the table to which the imported data is to be loaded. The table can be a partitioned or non-partitioned table. |
| FIELDS | COLUMNS | The format of fields.
|
| LINES STARTING BY | The starting delimiter of lines. |
| LINES TERMINATED BY | The ending delimiter of lines. |
| IGNORE number { LINES | ROWS } | The number of lines to be ignored at the beginning of the file. LINES indicates lines in the file, and ROWS indicates lines of data separated by field separators. By default, LOAD DATA matches each field in the input file with a column in the table. If the input file does not contain all the columns, the missing columns are filled with default values, which are described as follows:
|
| column_name_var | The name of the column to be imported. |
Considerations
Wildcard rules for direct load
The Hint direct keyword is specified in the LOAD DATA statement to indicate whether to perform direct load. When performing a direct load from files stored in the cloud object storage service (OSS), you can specify only one file. To import multiple files, you must use multiple load data statements. To facilitate importing multiple files in the cluster file system (server_disk), the file wildcard feature is introduced for server_disk and OSS file sources, but not for client_disk.
Perform direct load in the server_disk file system
server_disk examples:
Match files:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Match directories:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Match both directories and filenames:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
server_disk considerations:
At least one matching file must exist. Otherwise, the error
OB_FILE_NOT_EXISTis returned.For
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvis considered as a whole for matching. If no file matches the pattern, the errorOB_FILE_NOT_EXISTis returned.Only POSIX GLOB functions supported wildcards are supported. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be matched by the GLOB function because they can be found by using thelscommand. However, if you specify such a pattern that cannot be matched, the errorOB_FILE_NOT_EXISTis returned.
Perform direct load from cloud object storage service (OSS)
oss examples:
- Match files:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
- Match files:
oss considerations:
Directory matching is not supported. For example,
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';returns the errorOB_NOT_SUPPORTED.Only
*and?are supported as wildcard characters in filenames. Other wildcard characters, although allowed, cannot match any files.
For more information, see Import data or files by using the LOAD DATA statement
Examples
Import data from a server file
Example 1: Import data from a server file.
Set the global security path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclinet> \q ByeNote
Since
secure_file_privis aGLOBALvariable, you need to execute\qto exit and make the setting take effect.Reconnect to the database and import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Use the direct(bool, int) hint to enable direct load. The direct load file can be stored in Alibaba Cloud OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a local client file
Example 4: Use the following statement to import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt window and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient V2.2.4 or later. If you do not have OBClient V2.2.4 or later, you can also use a MySQL client to connect to the database.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Import data from an OSS file
Notice
When you use an object storage path, make sure that the values of its parameters are in uppercase or lowercase letters, numbers, and contain only the following characters: \/-_$+=, and wildcard characters. If the characters other than the aforesaid ones are included, the setting may fail.
Use the direct(bool, int) hint to enable direct load, and the imported file can be stored in Alibaba Cloud OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about examples that use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about examples that use direct load, see Import data by using the LOAD DATA statement.