Purpose
This statement is used to import data from an external file.
Notice
- Tables with triggers cannot be loaded using the
LOAD DATAstatement. - To import data from an external file, you must have the
FILEprivilege and the following settings:- When loading server-side files, you must set the system variable secure_file_priv to specify the path for accessing files during import or export.
- When loading client-side files, you must add the
--local-infile[=1]option when starting the MySQL/OBClient client to enable loading data from the local file system.
OceanBase Database supports the following input file types for the LOAD DATA statement:
Server-side (OBServer node) files: These files are stored on the OBServer node of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from these files into a database table.Client-side (local) files: These files are stored on the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from these files into a database table.Note
Starting from OceanBase Database V4.2.2 in MySQL mode, you can use the
LOAD DATA LOCAL INFILEstatement to load data from a local file. When you execute theLOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption.OSS files: These files are stored on the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from these files into a database table.
The LOAD DATA statement can currently import CSV files. The entire import process can be divided into the following steps:
Parse the file: OceanBase Database reads the data from the file based on the file name provided by the user. It then determines whether to parse the data in parallel or sequentially based on the specified parallelism.
Distribute the data: Since OceanBase Database is a distributed database, data in different partitions may be stored on different OBServer nodes. The
LOAD DATAstatement calculates the parsed data and determines which OBServer node the data should be sent to.Insert the data: Once the target OBServer node receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
Syntax
LOAD DATA
[/*+ PARALLEL(N) load_batch_size(M) APPEND */]
|[/*+ PARALLEL(N) direct(bool, int) */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. N defaults to 4. |
| load_batch_size(M) | The batch size for each insert. M defaults to 100. The recommended value range is [100,1000]. |
| APPEND | Enables direct load by using the Hint. This option allows you to directly allocate space in the data file and write data. APPEND Hint is equivalent to using direct(true, 0) by default. It also supports the GATHER_OPTIMIZER_STATISTICS Hint for online statistics collection. |
| direct | Enables direct load by using the Hint. In direct(bool, int), the bool parameter indicates whether the CSV file needs to be sorted. true indicates that sorting is required. The int parameter indicates the maximum number of error rows that can be tolerated. |
| REMOTE_OSS | LOCAL | Optional.
|
| file_name | The path and name of the input file. file_name has the following format:
NoteWhen importing a file from OSS, ensure the following information:
|
| REPLACE | IGNORE | If a unique key conflict occurs, REPLACE indicates to overwrite the conflicting row, and IGNORE indicates to ignore the conflicting row. LOAD DATA determines whether data is duplicated based on the table's primary key. If the table does not have a primary key, the REPLACE and IGNORE options have no effect. By default, when duplicate data is encountered, LOAD DATA records the erroneous data in the log file.
Notice
|
| table_name | The name of the table to which data is imported. Supports partitioned and non-partitioned tables. |
| FIELDS | COLUMNS | Specifies the format of the fields.
|
| LINES STARTING BY | Specifies the row starter. |
| LINES TERMINATED BY | Specifies the row terminator. |
| IGNORE number { LINES | ROWS } | Ignores the specified number of rows. LINES indicates the specified number of rows in the file, and ROWS indicates the specified number of rows of data separated by field delimiters. By default, each field in the input file is matched with a column in the table. If the input file does not contain all the columns, the missing columns are filled in with the following rules:
|
| column_name_var | The name of the imported column. |
Considerations
Wildcard rules for direct load
In the LOAD DATA statement, use the Hint direct option to specify whether to perform a direct load. To facilitate multi-file loading, the wildcard feature is introduced, which applies to file sources of the server_disk and OSS types but not to the client_disk type.
How to use direct load for files on the server disk
server_disk example:
Match a file name:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Match a directory:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Match both a directory and a file name:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
server_disk considerations:
At least one matching file must exist; otherwise, an error
OB_FILE_NOT_EXISTis returned.For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvis considered a whole match. If no match is found, an errorOB_FILE_NOT_EXISTis returned.Only POSIX-compatible wildcard patterns are supported, such as
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv). Although these patterns can be found using thelscommand, the GLOB function cannot match them, resulting in an errorOB_FILE_NOT_EXIST.
How to use direct load for files on the OSS
oss example:
- Match a file name:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
- Match a file name:
oss considerations:
Directory matching is not supported. For example, the statement
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';will returnOB_NOT_SUPPORTED.Only the
*and?wildcards are supported for file names. Other wildcards, although allowed, cannot match any results.
For more information, see Use the LOAD DATA statement to bypass data/file import.
Examples
Note
When you use the LOAD DATA statement to load data, you can use \N to represent NULL.
Import data from a file on the server
Example 1: Import data from a file on the server.
Set the global secure file path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclinet> \q ByeNote
Since
secure_file_privis aGLOBALvariable, you need to execute\qto exit and make the setting effective.After reconnecting to the database, import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Use the direct(bool, int) hint to enable direct load. The direct load file can be on the OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a local file
Example 4: Use the following statement to import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt window and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient of V2.2.4 or later. If you do not have an OBClient client of a specified version, you can also use MySQL client to connect to the database.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Import data from an OSS file
Notice
When you use an object storage path, the parameters of the object storage path are separated by the & symbol. Make sure that the parameter values you enter contain only uppercase and lowercase letters, digits, \/-_$+=, and wildcards. If you enter other characters, the settings may fail.
Use the direct(bool, int) hint to enable direct load. The direct load file can be on the OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to connect to OceanBase Database, see Overview of connection methods.
- For more information about how to use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about how to bypass import data by using the
LOAD DATAstatement, see Bypass import data by using the LOAD DATA statement.