Purpose
You can use this statement to import data from an external file.
Notice
- Do not use the
LOAD DATAstatement on tables with triggers. - To import data from an external file, you must have the
FILEprivilege and configure the following settings:- To load files from an OBServer node, you must configure the system variable secure_file_priv to specify the path that can be accessed during file import or export.
- To load local files on a client, you must add the
--local-infile[=1]option when you start the MySQL client or OceanBase Command-Line Client (OBClient), to enable data loading from the local file system.
OceanBase Database supports the following input files for the LOAD DATA statement:
Files on an OBServer node: You can execute the
LOAD DATA INFILEstatement to load data from files on an OBServer node into database tables.Files in the local file system of the client: You can execute the
LOAD DATA LOCAL INFILEstatement to load data from files in the file system of the local client into database tables.Note
Starting from V4.2.2, OceanBase Database in MySQL mode supports the
LOAD DATA LOCAL INFILEsyntax to load local data files. When you execute theLOAD DATA LOCAL INFILEstatement, the system automatically adds theIGNOREoption to the statement.Files in an object storage service system: You can execute the
LOAD DATA REMOTE_OSS INFILEstatement to load data from files in an object storage service system into database tables.
You can use the LOAD DATA statement to import a CSV text file in the following process:
Parse the file: OceanBase Database reads data from a file based on the file name that you enter and determines whether to perform parallel or serial parsing of data from the input file based on the specified degree of parallelism (DOP).
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServer nodes. The
LOAD DATAstatement is used to process the parsed data and determine to which OBServer node the data is sent.Insert the data: After the destination OBServer node receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
Syntax
LOAD DATA
[/*+ PARALLEL(N) load_batch_size(M) APPEND */]
|[/*+ PARALLEL(N) direct(bool, int) */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The DOP for loading data. The default value of N is 4. |
| load_batch_size(M) | The batch size of each insertion. The default value of M is 100. Recommended value range: [100,1000]. |
| APPEND | A hint for enabling the direct load feature. This feature allows you to directly allocate space and write data to data files. The APPEND hint is equivalent to direct(true, 0), and can collect statistics online like the GATHER_OPTIMIZER_STATISTICS hint does. |
| direct | A hint for enabling the direct load feature. The bool parameter in direct(bool, int) specifies whether the given CSV file is ordered. The value true indicates that the file is ordered. The int parameter specifies the maximum number of erroneous rows allowed. |
| REMOTE_OSS | LOCAL | Optional. Valid values:
|
| file_name | The path and file name of the file to import. You can specify a value in either of the following formats:
NoteWhen you import a file from an object storage service system, make sure that the following conditions are met:
|
| REPLACE | IGNORE | If a unique key conflict occurs, REPLACE specifies to overwrite the conflicting rows, and IGNORE specifies to ignore conflicting rows. The LOAD DATA statement checks whether a table contains duplicate data based on its primary key. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. If duplicate data exists, the LOAD DATA statement records the incorrect data to a log file by default.
Notice
|
| table_name | The name of the table from which data is imported. Partitioned and non-partitioned tables are supported. |
| FIELDS | COLUMNS | The format of the field.
|
| LINES STARTING BY | The start character of the line. |
| LINES TERMINATED BY | The end character of the line. |
| IGNORES number { LINES | ROWS } | Specifies to ignore the first few lines. LINES indicates the first few lines of the file. ROWS indicates the first few rows of data specified by the field delimiter. By default, fields in the input file are mapped to columns in the destination table one by one. If the input file does not contain all the columns, the missing columns are filled based on the following mappings:
|
| column_name_var | The name of the imported column. |
Considerations
Use wildcards for direct load
In the LOAD DATA statement, the direct keyword is used as a hint to specify to use direct load. When you import multiple data files from the file system of a cluster or from an object storage service system, you can use wildcards to specify the file names. This method is inapplicable when the file source is a client disk.
Import data from the file system of a cluster in direct load mode
Here are some examples:
Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';.Use wildcards to match directories:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';.Use wildcards to match directories and file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';.
Take note of the following considerations:
At least one file must be matched. Otherwise, the error
OB_FILE_NOT_EXISTwill be returned.For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvwill be matched as a whole. If no file is matched, the errorOB_FILE_NOT_EXISTwill be returned.Only wildcards compatible with the GLOB function in Portable Operating System Interface (POSIX) are supported. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be added to thelscommand but are not supported by the GLOB function, and therefore the errorOB_FILE_NOT_EXISTwill be returned.
Import data from an object storage service system in direct load mode
Here is an example:
- Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';.
- Use wildcards to match file names:
Take note of the following considerations:
You cannot use wildcards to match a directory. For example, if you execute the statement
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';, the errorOB_NOT_SUPPORTEDwill be returned.Only the asterisk (
*) and question mark (?) are supported as the wildcards for file names. You can enter other wildcards but these wildcards cannot match any result.
For more information, see Import data through direct load by using the LOAD DATA statement.
Examples
Note
When you use the LOAD DATA statement to import data, you can use \N to represent NULL.
Import data from a file on an OBServer node
Example 1: Import data from a file on an OBServer node.
Set a global security path.
obclient> SET GLOBAL secure_file_priv = "/" Query OK, 0 rows affected obclient> \q ByeNote
secure_file_privis aGLOBALvariable. Therefore, you need to run\qto exit for the settings to take effect.After you reconnect to the database, import data from an external file.
obclient> LOAD DATA INFILE 'test.sql' INTO TABLE t1; Query OK, 0 rows affected
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Use the direct(bool, int) hint to enable direct load. Import data from a file in an object storage service system in direct load mode.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a local file on the client
Example 4: Perform the following steps to import data from a local file to a table in OceanBase Database:
Open a terminal or Command Prompt and enter the following command to start the client:
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot@mysql001 -p****** -A -DtestThe return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221719526 Server version: OceanBase 4.2.2.0 (r100000022023121309-f536833402c6efe9364d5a4b61830a858ef24d82) (Built Dec 13 2023 09:58:18) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [test]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, use OBClient V2.2.4 or later. If you do not have OBClient of the required version, you can also use a MySQL client to connect to OceanBase Database.Execute the
LOAD DATA LOCAL INFILEstatement on the client to load data from a local file.obclient [test]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Import data from a file in an object storage service system
Use the direct(bool, int) hint to enable direct load. Import data from a file in an object storage service system in direct load mode.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to connect to OceanBase Database, see Overview.
- For more information about how to use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more examples about using the
LOAD DATAstatement for direct load, see Import data through direct load by using the LOAD DATA statement.