Description
The statement is used to import data from an external source.
Notice
- Tables with triggers are not supported.
- To import data from an external file, you need to have the
FILEprivilege and the following settings:- When you import a server file, you must set the system variable secure_file_priv in advance to specify the path that is accessible for importing or exporting files.
- When you import a local file from the client, you must add the
--local-infile[=1]option when you start the MySQL client or OBClient to enable data loading from the local file system.
The LOAD DATA statement in OceanBase Database supports the following input files:
Server (OBServer) node files: files stored on the OBServer nodes of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from server node files to database tables.Client (local) files: files stored in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from client local files to database tables.OSS files: files stored in the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files to database tables.
The LOAD DATA statement currently supports importing CSV text files. The entire import process can be divided into the following phases:
Parse the file: OceanBase Database reads data from the file based on the input filename and determines whether to parse the input file in parallel or in series based on the specified degree of parallelism.
Distribute data: As OceanBase Database is a distributed database, the data in different partitions may be distributed on different OBServer nodes. The
LOAD DATAstatement calculates the data parsed from the file and determines which OBServer node the data should be sent to.Insert data: After the data is received by the target OBServer node, the
INSERTstatement is executed locally to insert the data into the corresponding partition.
To import data from an external file, you must have the FILE privilege. You can execute the GRANT FILE ON *.* TO $user_name; statement to grant the privilege to the user who needs to execute the LOAD DATA statement, for example, $user_name.
Syntax
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
INTO TABLE table_name
[COMPRESSION [=] {AUTO|NONE|GZIP|DEFLATE|ZSTD}]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
load_mode:
'full'
| 'inc_replace'
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for loading data. The default value is 4. |
| load_batch_size(M) | The batch size for inserting data. The default value of M is 100. We recommend that you set M in the range of [100,1000]. |
| APPEND | direct() | The hint to enable direct load.
NoticeDuring a direct load task, we recommend that you do not upgrade OceanBase Database, because this may cause the direct load task to fail.
LOAD DATA statement for direct load, see Import data or files by using the LOAD DATA statement. |
| REMOTE_OSS | LOCAL | Optional.
|
| file_name | The path and name of the input file. The format of file_name is described as follows:
NoteWhen you import a file from OSS, make sure of the following:
|
| table_name | The name of the table to which the imported data is to be loaded. The table can be a partitioned or non-partitioned table. |
| COMPRESSION | The file compression format. Valid values:
|
| FIELDS | COLUMNS | The format of fields.
|
| LINES STARTING BY | The starting delimiter of lines. |
| LINES TERMINATED BY | The ending delimiter of lines. |
| IGNORE number { LINES | ROWS } | The system will ignore the first number lines. The keyword LINES specifies to ignore the first number lines in the file, and the keyword ROWS specifies to ignore the first number rows of data separated by the field delimiter. By default, the system matches the input file's fields with the table's columns one by one. If the input file does not contain all the columns, the missing columns will be filled with default values, which are described as follows:
|
| column_name_var | The name of the column to be imported. |
Wildcard rules for importing multiple files in parallel
To facilitate the import of multiple files, the wildcard feature is introduced for server-side and OSS file imports. This feature is not applicable to client-side file imports.
Use of wildcards on the server
Matching rules:
Matching filenames:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Matching directories:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Matching both directories and filenames:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
Considerations:
At least one matching file must exist. Otherwise, error code 4027 is returned.
For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvwill be considered as a whole for matching. If no match is found, error code 4027 will be returned.Only wildcard characters supported by the POSIX GLOB function can be used. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be used, but they cannot be matched by the GLOB function. In this case, error code 4027 will be returned, even though these wildcard characters can be found by using thelscommand.
Use of wildcards in cloud object storage services (OSS)
Matching rules:
Matching filenames:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';Considerations:
Directory matching is not supported. For example,
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';will returnOB_NOT_SUPPORTED.Only
*and?are supported for filename wildcards. Other wildcard characters, although allowed, cannot match any results.
Example
Import data from files on the server (OBServer node)
Example 1: Import data from a server file.
Set the global security path.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.obclient> SET GLOBAL secure_file_priv = "/";Log out from the database.
Note
After you set the
secure_file_privvariable, you must execute\qto log out from the database for the change to take effect.obclinet> \qThe return result is as follows:
ByeReconnect to the database and use the
LOAD DATAstatement to import data.Perform regular import.
obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;Use the
APPENDhint to enable direct load.LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
Example 2: Use the APPEND hint to enable direct load.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Import data from a local file on the client
Example 1: Import data from a local file to a table in OceanBase Database.
Open the terminal or command prompt and enter the following command to start the client.
obclient --local-infile -hxxx.xxx.xxx.xxx -P2881 -usys@oracle001 -p******The return result is as follows:
Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221548006 Server version: OceanBase 4.2.2.0 (r100000032024010510-75c47d4be18a399e13c5309de1a81da5caf4e7c0) (Built Jan 5 2024 10:17:55) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [SYS]>Notice
To use the
LOAD DATA LOCAL INFILEfeature, you must use OBClient V2.2.4 or later.In the client, execute the
LOAD DATA LOCAL INFILEstatement to load the local data file.obclient [SYS]> LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';The return result is as follows:
Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Example 2: Directly import a compressed file by setting the COMPRESSION parameter.
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz'
INTO TABLE lineitem
COMPRESSION GZIP
FIELDS TERMINATED BY '|';
Import data from an OSS file
Notice
When you use an object storage path, the parameters in the object storage path are separated by & characters. Make sure that the values of the parameters contain only uppercase and lowercase letters, numbers, \/-_$+=, and wildcard characters. If you enter characters other than those mentioned, the settings may fail.
Example 1: Use the direct(bool, int) hint to enable direct load for a file stored in OSS.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more information about how to use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more information about how to perform direct load by using the
LOAD DATAstatement, see Import data by using the LOAD DATA statement