You can use the direct hint in the LOAD DATA statement to enable direct load.
Limitations
- You cannot execute two statements to concurrently write data to the same table. This is because the table is locked during data import and only read operations are supported.
- This feature is not supported for tables with triggers.
- This feature is not supported for tables with generated columns. Some indexes generate hidden generated columns, for example, KEY
idx_c2(c2(16)) GLOBAL). - This feature is not supported if the data in a single row exceeds 2 MB in size.
- Liboblog and flashback queries are not supported.
- This feature supports loading LOBs but the loading performance is poor. Therefore, LOBs are still loaded to the corresponding paths by using transactions.
- Direct load is a DDL operation and cannot be executed in a multi-row transaction that contains multiple operations.
Considerations
OceanBase Database uses parallel processing to optimize the data import rate of the LOAD DATA statement. In parallel processing, data is split into multiple subtasks for parallel execution. Each subtask is considered an independent transaction. The execution sequence of the subtasks is not fixed. For a table without a primary key, data may be written in a sequence different from that in the original file.
Syntax
LOAD DATA /*+ direct(need_sort,max_error) parallel(N) */ [REMOTE_OSS | LOCAL] INFILE 'file_name' ...
For more information about the syntax of the LOAD DATA statement, see LOAD DATA.
The following table describes the parameters in the syntax.
| Parameter | Description |
|---|---|
| direct | Specifies to enable direct load. |
| need_sort | Specifies whether OceanBase Database needs to sort the data. The value is of the bool type. Valid values:
|
| max_error | The maximum number of erroneous rows allowed. The value is of the INT type. If this value is exceeded, the LOAD DATA statement returns an error. |
| parallel(N) | Required. The degree of parallelism (DOP) for loading data. The value is an integer greater than 1. |
| REMOTE_OSS | LOCAL | Optional. Valid values:
|
Note
The following object storage services are supported: Alibaba Cloud Object Storage Service (OSS, denoted as oss://), Tencent Cloud Object Storage (COS, denoted as cos://), and Amazon Simple Storage Service (S3, denoted as s3://). The same applies throughout subsequent content in this topic.
Use wildcards to import data from multiple files
When you import multiple data files from the file system of a cluster or from an object storage service system, you can use wildcards to specify the file names. This method is inapplicable when the file source is a client disk.
Import data from the file system of a cluster in direct load mode
Here are some examples:
Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';.Use wildcards to match directories:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';.Use wildcards to match directories and file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';.
Take note of the following considerations:
At least one file must be matched. Otherwise, the error code 4027 will be returned.
For the input
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvwill be matched as a whole. If no file is matched, the error code 4027 will be returned.Only wildcards compatible with the GLOB function in Portable Operating System Interface (POSIX) are supported. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be added to thelscommand but are not supported by the GLOB function, and therefore the error code 4027 will be returned.
Import data from an object storage service system in direct load mode
Taking OSS as an example:
- Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';.
- Use wildcards to match file names:
Take note of the following considerations:
You cannot use wildcards to match directories. For example, if you execute the statement
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';, the errorOB_NOT_SUPPORTEDwill be returned.Only the asterisk (
*) and question mark (?) are supported as the wildcards for file names. You can enter other wildcards but these wildcards cannot match any result.
Examples
Note
The following example shows how to import data from a file on a server. In OceanBase Database, you can also use the LOCAL INFILE clause in the LOAD DATA statement to import data from a local file in direct load mode. For more information about how to use LOAD DATA LOCAL INFILE, see Use the LOAD DATA statement to import data.
Log in to the server where the target OBServer node resides and create test data in the
/home/admindirectory.Note
In OceanBase Database, the
LOAD DATAstatement can import data only from a local input file on an OBServer node. Therefore, you must copy the file to import to an OBServer node before the import.[xxx@xxx /home/admin]# ssh admin@10.10.10.1 [admin@xxx /home/admin]# vi tbl1.csv 1,11 2,22 3,33Specify the path of the file to import.
Set the system variable
secure_file_privto specify the path that can be accessed for file import or export.Notice
For security reasons, you can only connect to the database through a local socket to execute the SQL statement for setting
secure_file_priv. For more information, see secure_file_priv.Log in to the server where the OBServer node to connect to resides.
[xxx@xxx /home/admin]# ssh admin@10.10.10.1Run the following command to connect to the
mysql001tenant through a local Unix socket:obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Specify the path as
/home/admin.obclient [(none)]> SET GLOBAL secure_file_priv = "/home/admin"; Query OK, 0 rows affected
After you reconnect to the database, execute the
LOAD /*+ APPEND */ DATAstatement to import data.obclient [test]> CREATE TABLE tbl1(col1 INT PRIMARY KEY,col2 INT); Query OK, 0 rows affected obclient [test]> SELECT * FROM tbl1; Empty set obclient [test]> LOAD DATA /*+ direct(true,1024) parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ','; Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0 obclient [test]> SELECT * FROM tbl1; +------+------+ | col1 | col2 | +------+------+ | 1 | 11 | | 2 | 22 | | 3 | 33 | +------+------+ 3 rows in set
References
For more information about how to use the
LOAD DATAstatement, see Use the LOAD DATA statement to import data.For more information about how to connect to a database, see Overview.
For more information about how to drop a table, see Drop a table.