In the LOAD DATA statement, the direct keyword is used as a hint to indicate whether bypass import is used.
Note
- Columnstore tables support bypass import.
LOBcolumns support bypass import.
Limitations
- Only one
LOAD DATAstatement can be executed on a table at a time because the statement will lock the table. - The
LOAD DATAstatement is not supported in triggers. - The
LOAD DATAstatement is not supported in a multi-row transaction that contains multiple operations.
Considerations
OceanBase Database uses parallel processing to optimize the data import rate of the LOAD DATA statement. In parallel processing, data is split into multiple subtasks for parallel execution. Each subtask is considered an independent transaction. The execution sequence of the subtasks is not fixed. Therefore:
- Global atomicity cannot be ensured during the data import.
- For a table without a primary key, data may be written in a sequence different from that in the original file.
Syntax
The syntax for importing data from a single file is as follows:
LOAD DATA /*+ direct(need_sort,max_error) parallel(N) */ [REMOTE_OSS | LOCAL] INFILE 'file_name' ...The syntax for importing data from multiple files is as follows:
LOAD DATA /*+ direct(need_sort,max_error) parallel(N) */ [LOCAL] INFILE 'file_name1,file_name2,...' INTO TABLE tbl_name FIELDS [TERMINATED BY 'string'] ...
For more information about the syntax of the LOAD DATA statement, see LOAD DATA.
The parameters are described as follows:
| Parameter | Description |
|---|---|
| direct | Specifies to use bypass import. |
| REMOTE_OSS | LOCAL | This parameter is optional.
|
| need_sort | Specifies whether OceanBase Database needs to sort the data. The value is of the Boolean type.
|
| max_error | The maximum number of erroneous rows allowed. The value is of the INT type. If this value is exceeded, the LOAD DATA statement returns an error. |
| parallel(N) | The degree of parallelism (DOP) for loading data. The default value of N is 4. |
Use wildcards to import data from multiple files
When you import data in bypass mode, you can specify only one file in Alibaba Cloud Object Storage Service (OSS) as the file source. To import multiple files, you must execute the LOAD DATA statement repeatedly. Data files stored in the file system of a cluster must be separated with commas (,). When you import multiple data files from the file system of a cluster or from Alibaba Cloud OSS, you can use wildcards to specify the file names. This method is inapplicable when the file source is a client disk.
Import data from the file system of a cluster in bypass mode.
Here are some examples:
Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';Use wildcards to match a directory:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';Use wildcards to match a directory and file names:
load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
Take note of the following considerations:
At least one file must be matched. Otherwise, the error
OB_FILE_NOT_EXISTwill be returned.For
load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';,/xxx/test.1*.csv,/xxx/test.6*.csvis matched as a whole. If no file or directory is matched, the errorOB_FILE_NOT_EXISTwill be returned.Only wildcards compatible with the GLOB function in Portable Operating System Interface (POSIX) are supported. For example,
test.6*(6|0).csvandtest.6*({0.csv,6.csv}|.csv)can be added to thelscommand but are not supported by the GLOB function, and therefore the errorOB_FILE_NOT_EXISTwill be returned.
Import data from Alibaba Cloud OSS in bypass mode:
Here is an example:
- Use wildcards to match file names:
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
- Use wildcards to match file names:
Take note of the following considerations:
You cannot use wildcards to match a directory. For example, if you execute the statement
load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';, the errorOB_NOT_SUPPORTEDwill be returned.Only the asterisk (
*) and question mark (?) are supported as the wildcards for file names. You can enter other wildcards but these wildcards cannot match any result.
Examples
Note
The following example shows how to import data from a server file. In OceanBase Database, you can also use the LOCAL INFILE clause in the LOAD DATA statement to import local files in bypass mode. For an example of the LOAD DATA LOCAL INFILE statement, see Import data by using the LOAD DATA statement.
Log on to the server where the target OBServer node resides and create test data in the
/home/admindirectory.Note
In OceanBase Database, the
LOAD DATAstatement can load only local input files on OBServer nodes. Therefore, you must copy the to-be-imported files to an OBServer node before the import.[xxx@xxx /home/admin]# ssh admin@10.10.10.1 [admin@xxx /home/admin]# vi tbl1.csv 1,11 2,22 3,33Set the path where the file to be imported is located.
Set the system variable
secure_file_privto specify the path that can be accessed for file import or export.Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.Log on to the server where the OBServer node to connect to resides.
[xxx@xxx /home/admin]# ssh admin@10.10.10.1Execute the following statement to connect to the
mysql001tenant through a local Unix Socket:obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Set the import path to
/home/admin.obclient [(none)]> SET GLOBAL secure_file_priv = "/home/admin"; Query OK, 0 rows affected
After you reconnect to the database, execute the
LOAD /*+ APPEND */ DATAstatement to import data.obclient [test]> CREATE TABLE tbl1(col1 INT PRIMARY KEY,col2 INT); Query OK, 0 rows affected obclient [test]> SELECT * FROM tbl1; Empty set obclient [test]> LOAD DATA /*+ direct(true,1024) parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ','; Query OK, 3 rows affected Records: 3 Deleted: 0 Skipped: 0 Warnings: 0 obclient [test]> SELECT * FROM tbl1; +------+------+ | col1 | col2 | +------+------+ | 1 | 11 | | 2 | 22 | | 3 | 33 | +------+------+ 3 rows in set
References
For more information about how to use the
LOAD DATAstatement, see Import data by using the LOAD DATA statement.For more information about how to connect to a database, see Overview.
For more information about how to drop a table, see Drop a table.