You can use the `APPEND` or `direct()` hint in the `LOAD DATA` statement to specify whether to use direct load.
Full direct load
Limitations
- You cannot execute two statements that write to the same table at the same time during an import. This is because the table is locked during the import, and only read operations are supported.
- The statement is not supported in triggers.
- Tables with generated columns are not supported. Some indexes implicitly create hidden generated columns, for example, `KEY idx_c2(c2(16)) GLOBAL`; see the sketch after this list.
- Data imports of more than 2 MB per row are not supported.
- Liboblog and flashback queries are not supported.
- The `LOAD DATA` statement is supported in multi-row transactions, and it proactively commits each transaction during execution.
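The generated-column limitation can be illustrated with a table definition like the hedged sketch below; the table and column names are placeholders, and only the `KEY idx_c2(c2(16)) GLOBAL` prefix index comes from the limitation above.

```sql
-- Hypothetical table for illustration: the GLOBAL prefix index on c2 creates a
-- hidden generated column, so a full direct load into this table is expected to
-- be rejected according to the limitation above.
CREATE TABLE t_prefix_idx (
  c1 INT PRIMARY KEY,
  c2 VARCHAR(64),
  KEY idx_c2 (c2(16)) GLOBAL
);
```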
Considerations
OceanBase Database optimizes the data import rate of the LOAD DATA statement by using parallel processing technology. The statement divides data into multiple subtasks for parallel execution. Each subtask is an independent transaction, and the execution order is not fixed. Therefore, for a table without a primary key, the data write order may be different from the order in the original file.
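To illustrate this behavior, the hedged sketch below loads a file into a table that has no primary key; the table name, file path, and hint arguments are placeholders rather than values from this topic.

```sql
-- Hypothetical heap table without a primary key.
CREATE TABLE t_no_pk (c1 INT, c2 VARCHAR(32));

-- Parallel direct load: the data is split into subtasks that run as independent
-- transactions, so the stored row order may differ from the row order in the file.
LOAD DATA /*+ direct(true, 0, 'full') parallel(4) */
  INFILE '/home/admin/t_no_pk.csv'
  INTO TABLE t_no_pk FIELDS TERMINATED BY ',';

SELECT * FROM t_no_pk;  -- the result order is not guaranteed to match the file order
```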
Syntax
```sql
LOAD DATA /*+ [APPEND | direct(need_sort,max_error,'full')] parallel(N) */ [REMOTE_OSS | LOCAL] INFILE 'file_name' ...
```
For more information about the syntax of the LOAD DATA statement, see LOAD DATA.
Parameters:
| Parameter | Description |
|---|---|
| APPEND \| direct() | The hint to enable direct load. |
| parallel(N) | The parallelism of data loading. This parameter is required, and the value must be a positive integer greater than 1. |
| REMOTE_OSS \| LOCAL | Optional. |
Note
In this context, OSS refers to Object Storage Service (OSS), Cloud Object Storage (COS), Simple Storage Service (S3), and other object storage services.
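The hedged sketch below shows one way to fill in the syntax above. The file path, table name, and values are placeholders, and the reading of need_sort as a sort flag and max_error as the number of tolerated error rows is an assumption based on the parameter names; see the linked LOAD DATA reference for the authoritative definitions.

```sql
-- Placeholder values (assumptions, not from this topic):
--   true   -> need_sort: whether the data needs to be sorted during the load
--   100    -> max_error: maximum number of tolerated error rows
--   'full' -> full direct load mode
LOAD DATA /*+ direct(true, 100, 'full') parallel(8) */
  INFILE '/home/admin/orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',';
```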
Support for wildcard-based import of multiple files
To facilitate the import of multiple files, a file wildcard feature is provided. It applies to the `server_disk` and OSS file sources, but not to `client_disk`.
Import data from the cluster's file system (server_disk) using wildcards
server_disk examples:
Matching filenames:
```sql
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
```

Matching directories:

```sql
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';
```

Matching both directories and filenames:

```sql
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
```
server_disk considerations:
- At least one matching file must exist. Otherwise, error code 4027 is returned.
- For the input

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';
  ```

  the string `/xxx/test.1*.csv,/xxx/test.6*.csv` is treated as a single overall match pattern. If no match is found, error code 4027 is returned.
- Only wildcard characters supported by the POSIX glob function can be used. For example, `test.6*(6|0).csv` and `test.6*({0.csv,6.csv}|.csv)` can be specified, but if the glob function cannot match them, error code 4027 is returned. Although these wildcard characters can be used in the `ls` command, they will not match any results here.
Import data from Alibaba Cloud Object Storage Service (OSS) using wildcards
OSS examples:
- Matching filenames:
  ```sql
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  ```
OSS considerations:
- Directory matching is not supported. For example, the following statement returns `OB_NOT_SUPPORTED`:

  ```sql
  load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  ```

- The only wildcard characters supported for filenames are `*` and `?`. Other wildcard characters, although allowed, will not match any results.
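As a hedged illustration of the single-character wildcard, the sketch below uses `?` in the object name; the bucket, host, and credentials are placeholders that mirror the examples above.

```sql
-- Hypothetical example: test.?.csv matches object names such as test.1.csv or
-- test.9.csv, but not test.10.csv, because ? matches exactly one character.
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.?.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
```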
Examples
Note
The following example describes how to import data from a file on the server. OceanBase Database also supports importing data from a local client file by using the `LOAD DATA` statement with the `LOCAL INFILE` option. For more information about `LOAD DATA LOCAL INFILE`, see Import data by using the LOAD DATA statement.
1. Log in to the server where the OBServer node to connect to resides, and create the test data in the `/home/admin` directory.

   Note: The `LOAD DATA` statement in OceanBase Database can only import input files stored locally on an OBServer node. Therefore, you need to copy the file to be imported to an OBServer node before the import.

   ```shell
   [xxx@xxx /home/admin]# ssh admin@10.10.10.1
   [admin@xxx /home/admin]# vi tbl1.csv
   1,11
   2,22
   3,33
   ```

2. Set the import file path.

   Set the system variable `secure_file_priv` to specify the path where the file to be imported or exported can be accessed.

   Notice: For security reasons, you can modify the global `secure_file_priv` variable only when you are connected to the database through a local socket. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to connect to resides.

      ```shell
      [xxx@xxx /home/admin]# ssh admin@10.10.10.1
      ```

   2. Connect to the `mysql001` tenant through a local Unix socket.

      ```shell
      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      ```

   3. Set the import path to `/home/admin`.

      ```sql
      obclient [(none)]> SET GLOBAL secure_file_priv = "/home/admin";
      Query OK, 0 rows affected
      ```

3. Reconnect to the database and execute the `LOAD DATA` statement with the direct load hint to import the data.

   ```sql
   obclient [test]> CREATE TABLE tbl1(col1 INT PRIMARY KEY, col2 INT);
   Query OK, 0 rows affected

   obclient [test]> SELECT * FROM tbl1;
   Empty set

   obclient [test]> LOAD DATA /*+ direct(true,1024,'full') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
   Query OK, 3 rows affected
   Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

   obclient [test]> SELECT * FROM tbl1;
   +------+------+
   | col1 | col2 |
   +------+------+
   |    1 |   11 |
   |    2 |   22 |
   |    3 |   33 |
   +------+------+
   3 rows in set
   ```
Incremental direct load
Notice
The incremental direct load feature is currently in the experimental stage. To avoid impacting system stability, do not use it in a production environment.
Limitations
- Tables with indexes (excluding primary keys) are not supported.
- Tables with foreign keys are not supported.
- Tables or data that contain LOB data whose size exceeds the value of `OB_INROW_THRESHOLD` are not supported.
Considerations
Data imported by using incremental direct load triggers a minor compaction. If the amount of data is small and can be imported within minutes, we recommend that you do not use incremental direct load.
No more than 8,000 partitions can be loaded at a time for incremental direct load.
Syntax
```sql
LOAD DATA /*+ [APPEND | direct(need_sort,max_error,'inc_replace')] parallel(N) */ [REMOTE_OSS | LOCAL] INFILE 'file_name' ...
```
For more information about the syntax of the LOAD DATA statement, see LOAD DATA.
Parameters:
| Parameter | Description |
|---|---|
| APPEND \| direct() | A hint to enable direct load. Notice: If you set the load mode to inc_replace, the REPLACE and IGNORE keywords are not supported in the statement. |
| parallel(N) | The degree of parallelism for loading data. This parameter is required. The value is a positive integer greater than 1. |
| REMOTE_OSS \| LOCAL | Optional. |
Note
Like full direct load, incremental direct load by using the LOAD DATA statement supports multi-file import by using wildcard characters.
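For example, the hedged sketch below combines the `inc_replace` mode with a wildcard file pattern; the file pattern and degree of parallelism are illustrative, and the `tbl1` table is reused from the earlier example.

```sql
-- All files on the OBServer node that match /home/admin/tbl1.*.csv are imported
-- with incremental direct load (illustrative pattern; at least one file must match).
LOAD DATA /*+ direct(true, 0, 'inc_replace') parallel(8) */
  INFILE '/home/admin/tbl1.*.csv'
  INTO TABLE tbl1 FIELDS TERMINATED BY ',';
```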
Examples
The procedure for incremental direct load by using the `LOAD DATA` statement is the same as that for full direct load, except that you replace the `'full'` argument in the `direct()` hint with `'inc_replace'`.
```sql
obclient [test]> LOAD DATA /*+ direct(true,1024,'inc_replace') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
```
The execution result is as follows:
```
Query OK, 3 rows affected
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
```
References
- For more information about the `LOAD DATA` statement, see Import data by using the LOAD DATA statement.
- For more information about how to connect to a database, see Overview of connection methods.
- For more information about how to drop a table, see Drop a table.