This topic describes how to use the LOAD DATA and INSERT INTO SELECT statements for full direct load.
Considerations
Take note of the following considerations before you use the full direct load feature:
- OceanBase Database supports direct load of large object (LOB) data in V4.3.0 and later.
- During direct load, the table involved is locked and no data can be written into it. In other words, the table is read-only for the whole direct load process.
- Full direct load applies to scenarios where data is imported to a large table for the first time, where 10 GB to 1 TB of data is to be migrated, or where CPU and memory resources are insufficient. This is because direct load has a short execution path and reduces the CPU overhead.
- The `LOAD DATA` statement can be executed in a multi-row transaction. However, when the `LOAD DATA` statement, which is a DDL operation, is executed, the previous transaction is automatically committed.
- When you use the `INSERT INTO SELECT` statement to import data, only Parallel Data Manipulation Language (PDML) data can be imported in direct load mode.
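As a hedged illustration of the transaction behavior described above, the following sketch shows how an open transaction is implicitly committed when a direct load `LOAD DATA` statement runs. The table name `tbl_demo` and file path are hypothetical placeholders.

```sql
-- Hypothetical sketch: tbl_demo and /home/admin/demo.csv are placeholders.
BEGIN;
INSERT INTO tbl_demo VALUES (1, 100);   -- part of an open transaction

-- LOAD DATA in direct load mode is a DDL operation: executing it
-- implicitly commits the INSERT above before the import starts.
LOAD DATA /*+ direct(true, 0, 'full') parallel(4) */
    INFILE '/home/admin/demo.csv'
    INTO TABLE tbl_demo FIELDS TERMINATED BY ',';
```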
Import data in direct load mode by using the LOAD DATA statement
You can use the LOAD DATA statement in combination with the `APPEND` or `DIRECT()` hint to import data in direct load mode.
**Notice**

OceanBase Database uses parallel processing to increase the data import speed of the `LOAD DATA` statement. In parallel processing, data is split into multiple subtasks that are executed in parallel, each as an independent transaction. The execution order of the subtasks is not fixed. For a table without a primary key, data may therefore be written in an order different from that in the original file.
Limitations
- You cannot execute two statements to concurrently write data to the same table. This is because the table is locked during data import and only read operations are supported.
- Full direct load is not supported for tables with triggers.
- Full direct load is not supported for tables with generated columns. Note that some indexes create hidden generated columns, for example, `KEY idx_c2(c2(16)) GLOBAL`.
- Full direct load is not supported if the data in a single row exceeds 2 MB in size.
- Liboblog and flashback queries are not supported.
Syntax
```sql
LOAD DATA /*+ [APPEND | DIRECT(need_sort, max_error, 'full')] parallel(N) */ [REMOTE_OSS | LOCAL] INFILE 'file_name' INTO TABLE table_name [COMPRESSION] ...
```
For more information about the syntax of the LOAD DATA statement, see LOAD DATA.
The following table describes the parameters in the syntax.
| Parameter | Description |
|---|---|
| `APPEND` \| `DIRECT()` | Specifies to enable direct load. |
| `parallel(N)` | Required. The degree of parallelism (DOP) for loading data. The default value of `N` is `4`. |
| `REMOTE_OSS` \| `LOCAL` | Optional. Specifies the location of the input file: `REMOTE_OSS` for a file stored in OSS, or `LOCAL` for a local file on the client. If neither is specified, the file must reside on the OBServer node. |
| `file_name` | The path and name of the file to import. **Note**: When you import a file from OSS, make sure that the required conditions for OSS access are met. |
| `table_name` | The name of the table into which data is imported. You can specify any number of columns of the table. |
| `COMPRESSION` | Optional. The compression format of the input file. For the valid values, see LOAD DATA. |
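For reference, a `REMOTE_OSS` import might look like the following sketch. The bucket name, object path, and credential placeholders are hypothetical; check the exact OSS URL parameters against the LOAD DATA reference.

```sql
-- Hypothetical sketch of importing a CSV file from OSS in direct load mode.
-- The oss:// URL below is a placeholder; the host, access_id, and access_key
-- parameters must be replaced with real values per the LOAD DATA reference.
LOAD DATA /*+ direct(true, 1024, 'full') parallel(16) */
    REMOTE_OSS INFILE 'oss://mybucket/data/tbl1.csv?host=xxx&access_id=xxx&access_key=xxx'
    INTO TABLE tbl1 FIELDS TERMINATED BY ',';
```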
Examples
**Note**

The following examples import data from a file on a server. In OceanBase Database, you can also use the `LOCAL INFILE` clause of the `LOAD DATA` statement to import a local file in direct load mode. For more information about how to use `LOAD DATA LOCAL INFILE`, see Import data by using the LOAD DATA statement.
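A client-side import in direct load mode can be sketched as follows; the file path and table name are hypothetical, and the client must be started with local-infile support.

```sql
-- Hypothetical sketch: import a file that resides on the client machine
-- (rather than on the OBServer node) in direct load mode.
LOAD DATA /*+ direct(true, 1024, 'full') parallel(4) */
    LOCAL INFILE '/path/on/client/tbl1.csv'
    INTO TABLE tbl1 FIELDS TERMINATED BY ',';
```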
**MySQL mode**

1. Log in to the server where the target OBServer node resides and create test data in the `/home/admin` directory.

    **Note**

    In OceanBase Database, the `LOAD DATA` statement can import data only from an input file that is local to an OBServer node. Therefore, you must copy the file to import to an OBServer node before the import.

    1. Log in to the server where the OBServer node resides.

        ```shell
        [xxx@xxx /home/admin]# ssh admin@10.10.10.1
        ```

    2. Create a test data file named `tbl1.csv`.

        ```shell
        [admin@xxx /home/admin]# vi tbl1.csv

        1,11
        2,22
        3,33
        ```

2. Specify the path of the file to import. Set the system variable `secure_file_priv` to specify the path that can be accessed for file import or export.

    **Notice**

    For security reasons, the SQL statement that sets `secure_file_priv` can be executed only over a local socket connection to the database. For more information, see secure_file_priv.

    1. Log in to the server where the target OBServer node resides.

        ```shell
        [xxx@xxx /home/admin]# ssh admin@10.10.10.1
        ```

    2. Connect to the `mysql001` tenant through a local Unix socket.

        ```shell
        obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
        ```

    3. Set the accessible path to `/home/admin`.

        ```sql
        obclient [(none)]> SET GLOBAL secure_file_priv = "/home/admin";
        Query OK, 0 rows affected
        ```

3. Reconnect to the database, and then execute the `LOAD DATA` statement with the direct load hint to import data.

    1. Create a table named `tbl2`.

        ```sql
        obclient [test]> CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 INT);
        Query OK, 0 rows affected
        ```

    2. Query whether the `tbl2` table contains data. At this point, the query result is an empty set.

        ```sql
        obclient [test]> SELECT * FROM tbl2;
        Empty set
        ```

    3. Import the data in the `tbl1.csv` file to the `tbl2` table in direct load mode.

        - Import data to all columns of the `tbl2` table:

            ```sql
            obclient [test]> LOAD DATA /*+ direct(true,1024,'full') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl2 FIELDS TERMINATED BY ',';
            Query OK, 3 rows affected
            Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
            ```

        - Import data to specified columns of the `tbl2` table, for example, the `col1` and `col2` columns:

            ```sql
            obclient [test]> LOAD DATA /*+ direct(true,1024,'full') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl2 FIELDS TERMINATED BY ',' (col1, col2);
            Query OK, 3 rows affected
            Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
            ```

4. Verify that the data has been imported to the `tbl2` table.

    ```sql
    obclient [test]> SELECT * FROM tbl2;
    ```

    The query result is as follows:

    ```shell
    +------+------+
    | col1 | col2 |
    +------+------+
    |    1 |   11 |
    |    2 |   22 |
    |    3 |   33 |
    +------+------+
    3 rows in set
    ```

    The query result shows that the data has been imported to the `tbl2` table.
**Oracle mode**

1. Log in to the server where the target OBServer node resides and create test data in the `/home/admin` directory.

    **Note**

    In OceanBase Database, the `LOAD DATA` statement can import data only from an input file that is local to an OBServer node. Therefore, you must copy the file to import to an OBServer node before the import.

    1. Log in to the server where the OBServer node resides.

        ```shell
        [xxx@xxx /home/admin]# ssh admin@10.10.10.1
        ```

    2. Create a test data file named `tbl1.csv`.

        ```shell
        [admin@xxx /home/admin]# vi tbl1.csv

        1,11
        2,22
        3,33
        ```

2. Specify the path of the file to import. Set the system variable `secure_file_priv` to specify the path that can be accessed for file import or export.

    **Notice**

    For security reasons, the SQL statement that sets `secure_file_priv` can be executed only over a local socket connection to the database. For more information, see secure_file_priv.

    1. Log in to the server where the target OBServer node resides.

        ```shell
        [xxx@xxx /home/admin]# ssh admin@10.10.10.1
        ```

    2. Connect to the `oracle001` tenant through a local Unix socket.

        ```shell
        obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******
        ```

    3. Set the accessible path to `/home/admin`.

        ```sql
        obclient [(none)]> SET GLOBAL secure_file_priv = "/home/admin";
        Query OK, 0 rows affected
        ```

3. Reconnect to the database, and then execute the `LOAD DATA` statement with the direct load hint to import data.

    1. Create a table named `tbl2`.

        ```sql
        obclient [test]> CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 INT);
        Query OK, 0 rows affected
        ```

    2. Query whether the `tbl2` table contains data. At this point, the query result is an empty set.

        ```sql
        obclient [test]> SELECT * FROM tbl2;
        Empty set
        ```

    3. Import the data in the `tbl1.csv` file to the `tbl2` table in direct load mode.

        - Import data to all columns of the `tbl2` table:

            ```sql
            obclient [test]> LOAD DATA /*+ direct(true,1024,'full') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl2 FIELDS TERMINATED BY ',';
            Query OK, 3 rows affected
            Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
            ```

        - Import data to specified columns of the `tbl2` table, for example, the `col1` and `col2` columns:

            ```sql
            obclient [test]> LOAD DATA /*+ direct(true,1024,'full') parallel(16) */ INFILE '/home/admin/tbl1.csv' INTO TABLE tbl2 FIELDS TERMINATED BY ',' (col1, col2);
            Query OK, 3 rows affected
            Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
            ```

4. Verify that the data has been imported to the `tbl2` table.

    ```sql
    obclient [test]> SELECT * FROM tbl2;
    ```

    The query result is as follows:

    ```shell
    +------+------+
    | col1 | col2 |
    +------+------+
    |    1 |   11 |
    |    2 |   22 |
    |    3 |   33 |
    +------+------+
    3 rows in set
    ```

    The query result shows that the data has been imported to the `tbl2` table.
Import data in direct load mode by using the INSERT INTO SELECT statement
You can use the INSERT INTO SELECT statement in combination with the `APPEND` or `DIRECT()` hint and the `enable_parallel_dml` hint to import data in direct load mode.
Limitations
- Only PDML data can be imported in direct load mode. For more information about PDML, see Parallel DML.
- You cannot execute two statements to concurrently write data to the same table. This is because the table is locked during data import and only read operations are supported.
- Direct load is a DDL operation and cannot be executed in a multi-statement transaction that contains multiple operations.
- Direct load cannot be performed within an explicit transaction started by `BEGIN`; `autocommit` must be set to `1`.
- Full direct load is not supported for tables with triggers.
- Full direct load is not supported for tables with generated columns. Note that some indexes create hidden generated columns, for example, `KEY idx_c2(c2(16)) GLOBAL`.
- Liboblog and flashback queries are not supported.
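Given the transaction limitations above, a minimal sketch of preparing a session for a direct load `INSERT INTO SELECT` looks as follows. The table names `target_tbl` and `source_tbl` are hypothetical.

```sql
-- Ensure no explicit transaction is open and autocommit is enabled,
-- because a direct load INSERT is a DDL operation.
SET autocommit = 1;

-- Direct load INSERT with the hint combination described in this topic.
INSERT /*+ DIRECT(true, 0, 'full') enable_parallel_dml parallel(8) */
    INTO target_tbl
    SELECT * FROM source_tbl;
```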
Syntax
```sql
INSERT /*+ [APPEND | DIRECT(need_sort, max_error, 'full')] enable_parallel_dml parallel(N) */ INTO table_name select_sentence
```
For more information about the syntax of the INSERT INTO statement, see INSERT (MySQL mode) and INSERT (Oracle mode).
The following table describes the parameters in the syntax.
| Parameter | Description |
|---|---|
| `APPEND` \| `DIRECT()` | Specifies to enable direct load. |
| `enable_parallel_dml` | Specifies whether to enable PDML. **Note**: Generally, this hint takes effect only when it is used together with the `parallel(N)` hint. |
| `parallel(N)` | Required. The DOP for loading data. The value is an integer greater than 1. |
Examples
**MySQL mode**

Import partial data from the `tbl2` table to the `tbl1` table in direct load mode.
1. Query whether the `tbl1` table contains data. At this point, the query result is an empty set.

    ```sql
    obclient [test]> SELECT * FROM tbl1;
    Empty set
    ```

2. Query whether the `tbl2` table contains data.

    ```sql
    obclient [test]> SELECT * FROM tbl2;
    ```

    The query result shows that the `tbl2` table contains data:

    ```shell
    +------+------+------+
    | col1 | col2 | col3 |
    +------+------+------+
    |    1 | a1   |   11 |
    |    2 | a2   |   22 |
    |    3 | a3   |   33 |
    +------+------+------+
    3 rows in set
    ```

3. Execute the `INSERT INTO SELECT` statement with the direct load hint to import data.

    ```sql
    obclient [test]> INSERT /*+ DIRECT(true, 0, 'full') enable_parallel_dml parallel(16) */ INTO tbl1 SELECT t2.col1, t2.col3 FROM tbl2 t2;
    Query OK, 3 rows affected
    Records: 3  Duplicates: 0  Warnings: 0
    ```

4. Verify that the data has been imported to the `tbl1` table.

    ```sql
    obclient [test]> SELECT * FROM tbl1;
    ```

    The query result is as follows:

    ```shell
    +------+------+
    | col1 | col2 |
    +------+------+
    |    1 |   11 |
    |    2 |   22 |
    |    3 |   33 |
    +------+------+
    3 rows in set
    ```

    The query result shows that the data has been imported to the `tbl1` table.

5. (Optional) In the `Note` section of the return result of the `EXPLAIN EXTENDED` statement, check whether the data is written in direct load mode.

    ```sql
    obclient [test]> EXPLAIN EXTENDED INSERT /*+ direct(true, 0, 'full') enable_parallel_dml parallel(16) */ INTO tbl1 SELECT t2.col1, t2.col3 FROM tbl2 t2;
    ```

    The return result is as follows:

    ```shell
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan                                                                                                                                                   |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | ============================================================================== |
    | |ID|OPERATOR                           |NAME           |EST.ROWS|EST.TIME(us)| |
    | ------------------------------------------------------------------------------ |
    | |0 |PX COORDINATOR                     |               |3       |27          | |
    | |1 |└─EXCHANGE OUT DISTR               |:EX10001       |3       |27          | |
    | |2 |  └─INSERT                         |               |3       |26          | |
    | |3 |    └─EXCHANGE IN DISTR            |               |3       |1           | |
    | |4 |      └─EXCHANGE OUT DISTR (RANDOM)|:EX10000       |3       |1           | |
    | |5 |        └─SUBPLAN SCAN             |ANONYMOUS_VIEW1|3       |1           | |
    | |6 |          └─PX BLOCK ITERATOR      |               |3       |1           | |
    | |7 |            └─TABLE FULL SCAN      |t2             |3       |1           | |
    | ============================================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    |   0 - output(nil), filter(nil), rowset=16 |
    |   1 - output(nil), filter(nil), rowset=16 |
    |       dop=16 |
    |   2 - output(nil), filter(nil) |
    |       columns([{tbl1: ({tbl1: (tbl1.__pk_increment(0x7efa63627790), tbl1.col1(0x7efa63611980), tbl1.col3(0x7efa63611dc0))})}]), partitions(p0), |
    |       column_values([T_HIDDEN_PK(0x7efa63627bd0)], [column_conv(INT,PS: (11,0),NULL,ANONYMOUS_VIEW1.col1(0x7efa63626f10))(0x7efa63627ff0)], [column_conv(INT, |
    |       PS: (11,0),NULL,ANONYMOUS_VIEW1.col3(0x7efa63627350))(0x7efa6362ff20)]) |
    |   3 - output([T_HIDDEN_PK(0x7efa63627bd0)], [ANONYMOUS_VIEW1.col1(0x7efa63626f10)], [ANONYMOUS_VIEW1.col3(0x7efa63627350)]), filter(nil), rowset=16 |
    |   4 - output([T_HIDDEN_PK(0x7efa63627bd0)], [ANONYMOUS_VIEW1.col1(0x7efa63626f10)], [ANONYMOUS_VIEW1.col3(0x7efa63627350)]), filter(nil), rowset=16 |
    |       dop=16 |
    |   5 - output([ANONYMOUS_VIEW1.col1(0x7efa63626f10)], [ANONYMOUS_VIEW1.col3(0x7efa63627350)]), filter(nil), rowset=16 |
    |       access([ANONYMOUS_VIEW1.col1(0x7efa63626f10)], [ANONYMOUS_VIEW1.col3(0x7efa63627350)]) |
    |   6 - output([t2.col1(0x7efa63625ed0)], [t2.col3(0x7efa63626780)]), filter(nil), rowset=16 |
    |   7 - output([t2.col1(0x7efa63625ed0)], [t2.col3(0x7efa63626780)]), filter(nil), rowset=16 |
    |       access([t2.col1(0x7efa63625ed0)], [t2.col3(0x7efa63626780)]), partitions(p0) |
    |       is_index_back=false, is_global_index=false, |
    |       range_key([t2.__pk_increment(0x7efa63656410)]), range(MIN ; MAX)always true |
    | Used Hint: |
    | ------------------------------------- |
    | /*+ |
    |  |
    | USE_PLAN_CACHE( NONE ) |
    | PARALLEL(16) |
    | ENABLE_PARALLEL_DML |
    | DIRECT(TRUE, 0, 'FULL') |
    | */ |
    | Qb name trace: |
    | ------------------------------------- |
    | stmt_id:0, stmt_type:T_EXPLAIN |
    | stmt_id:1, INS$1 |
    | stmt_id:2, SEL$1 |
    | Outline Data: |
    | ------------------------------------- |
    | /*+ |
    | BEGIN_OUTLINE_DATA |
    | PARALLEL(@"SEL$1" "t2"@"SEL$1" 16) |
    | FULL(@"SEL$1" "t2"@"SEL$1") |
    | USE_PLAN_CACHE( NONE ) |
    | PARALLEL(16) |
    | ENABLE_PARALLEL_DML |
    | OPTIMIZER_FEATURES_ENABLE('4.3.3.0') |
    | DIRECT(TRUE, 0, 'FULL') |
    | END_OUTLINE_DATA |
    | */ |
    | Optimization Info: |
    | ------------------------------------- |
    | t2: |
    |     table_rows:3 |
    |     physical_range_rows:3 |
    |     logical_range_rows:3 |
    |     index_back_rows:0 |
    |     output_rows:3 |
    |     table_dop:16 |
    |     dop_method:Global DOP |
    |     avaiable_index_name: [tbl2] |
    |     stats info: [version=0, is_locked=0, is_expired=0] |
    |     dynamic sampling level:0 |
    |     estimation method: [DEFAULT, STORAGE] |
    | Plan Type: |
    | DISTRIBUTED |
    | Note: |
    | Degree of Parallelism is 16 because of hint |
    | Direct-mode is enabled in insert into select |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    77 rows in set (0.009 sec)
    ```
**Oracle mode**

Import partial data from the `tbl4` table to the `tbl3` table in direct load mode.
1. Query whether the `tbl3` table contains data. At this point, the query result is an empty set.

    ```sql
    obclient [test]> SELECT * FROM tbl3;
    Empty set
    ```

2. Query whether the `tbl4` table contains data.

    ```sql
    obclient [test]> SELECT * FROM tbl4;
    ```

    The query result shows that the `tbl4` table contains data:

    ```shell
    +------+------+------+
    | COL1 | COL2 | COL3 |
    +------+------+------+
    |    1 | a1   |   11 |
    |    2 | a2   |   22 |
    |    3 | a3   |   33 |
    +------+------+------+
    3 rows in set (0.000 sec)
    ```

3. Execute the `INSERT INTO SELECT` statement with the direct load hint to import data.

    ```sql
    obclient [test]> INSERT /*+ direct(true, 0, 'full') enable_parallel_dml parallel(16) */ INTO tbl3 SELECT t2.col1, t2.col3 FROM tbl4 t2 WHERE ROWNUM <= 10000;
    Query OK, 3 rows affected
    Records: 3  Duplicates: 0  Warnings: 0
    ```

4. Verify that the data has been imported to the `tbl3` table.

    ```sql
    obclient [test]> SELECT * FROM tbl3;
    ```

    The query result is as follows:

    ```shell
    +------+------+
    | col1 | col3 |
    +------+------+
    |    1 |   11 |
    |    2 |   22 |
    |    3 |   33 |
    +------+------+
    3 rows in set
    ```

    The query result shows that the data has been imported to the `tbl3` table.

5. (Optional) In the `Note` section of the return result of the `EXPLAIN EXTENDED` statement, check whether the data is written in direct load mode.

    ```sql
    obclient [test]> EXPLAIN EXTENDED INSERT /*+ direct(true, 0, 'full') enable_parallel_dml parallel(16) */ INTO tbl3 SELECT t2.col1, t2.col3 FROM tbl4 t2;
    ```

    The return result is as follows:

    ```shell
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan                                                                                                                                                                                                                 |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | =================================================================================== |
    | |ID|OPERATOR                                |NAME           |EST.ROWS|EST.TIME(us)| |
    | ----------------------------------------------------------------------------------- |
    | |0 |OPTIMIZER STATS MERGE                   |               |3       |29          | |
    | |1 |└─PX COORDINATOR                        |               |3       |29          | |
    | |2 |  └─EXCHANGE OUT DISTR                  |:EX10001       |3       |27          | |
    | |3 |    └─INSERT                            |               |3       |27          | |
    | |4 |      └─OPTIMIZER STATS GATHER          |               |3       |1           | |
    | |5 |        └─EXCHANGE IN DISTR             |               |3       |1           | |
    | |6 |          └─EXCHANGE OUT DISTR (RANDOM) |:EX10000       |3       |1           | |
    | |7 |            └─SUBPLAN SCAN              |ANONYMOUS_VIEW1|3       |1           | |
    | |8 |              └─PX BLOCK ITERATOR       |               |3       |1           | |
    | |9 |                └─COLUMN TABLE FULL SCAN|T2             |3       |1           | |
    | =================================================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    |   0 - output(nil), filter(nil), rowset=16 |
    |   1 - output([column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL1(0x7ef8f6027720))(0x7ef8f60287f0)], [column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60))(0x7ef8f6030710)]), filter(nil), rowset=16 |
    |   2 - output([column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL1(0x7ef8f6027720))(0x7ef8f60287f0)], [column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60))(0x7ef8f6030710)]), filter(nil), rowset=16 |
    |       dop=16 |
    |   3 - output([column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL1(0x7ef8f6027720))(0x7ef8f60287f0)], [column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60))(0x7ef8f6030710)]), filter(nil) |
    |       columns([{TBL3: ({TBL3: (TBL3.__pk_increment(0x7ef8f6027fa0), TBL3.COL1(0x7ef8f6011d50), TBL3.COL3(0x7ef8f6012190))})}]), partitions(p0), |
    |       column_values([T_HIDDEN_PK(0x7ef8f60283e0)], [column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL1(0x7ef8f6027720))(0x7ef8f60287f0)], [column_conv(NUMBER, |
    |       PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60))(0x7ef8f6030710)]) |
    |   4 - output([column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL1(0x7ef8f6027720))(0x7ef8f60287f0)], [column_conv(NUMBER,PS: (-1,0),NULL,ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60))(0x7ef8f6030710)], |
    |       [T_HIDDEN_PK(0x7ef8f60283e0)]), filter(nil), rowset=16 |
    |   5 - output([T_HIDDEN_PK(0x7ef8f60283e0)], [ANONYMOUS_VIEW1.COL1(0x7ef8f6027720)], [ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60)]), filter(nil), rowset=16 |
    |   6 - output([T_HIDDEN_PK(0x7ef8f60283e0)], [ANONYMOUS_VIEW1.COL1(0x7ef8f6027720)], [ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60)]), filter(nil), rowset=16 |
    |       dop=16 |
    |   7 - output([ANONYMOUS_VIEW1.COL1(0x7ef8f6027720)], [ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60)]), filter(nil), rowset=16 |
    |       access([ANONYMOUS_VIEW1.COL1(0x7ef8f6027720)], [ANONYMOUS_VIEW1.COL3(0x7ef8f6027b60)]) |
    |   8 - output([T2.COL1(0x7ef8f60264c0)], [T2.COL3(0x7ef8f6026f90)]), filter(nil), rowset=16 |
    |   9 - output([T2.COL1(0x7ef8f60264c0)], [T2.COL3(0x7ef8f6026f90)]), filter(nil), rowset=16 |
    |       access([T2.COL1(0x7ef8f60264c0)], [T2.COL3(0x7ef8f6026f90)]), partitions(p0) |
    |       is_index_back=false, is_global_index=false, |
    |       range_key([T2.__pk_increment(0x7ef8f60584a0)]), range(MIN ; MAX)always true |
    | Used Hint: |
    | ------------------------------------- |
    | /*+ |
    |  |
    | USE_PLAN_CACHE( NONE ) |
    | PARALLEL(16) |
    | ENABLE_PARALLEL_DML |
    | DIRECT(TRUE, 0, 'FULL') |
    | */ |
    | Qb name trace: |
    | ------------------------------------- |
    | stmt_id:0, stmt_type:T_EXPLAIN |
    | stmt_id:1, INS$1 |
    | stmt_id:2, SEL$1 |
    | Outline Data: |
    | ------------------------------------- |
    | /*+ |
    | BEGIN_OUTLINE_DATA |
    | PARALLEL(@"SEL$1" "T2"@"SEL$1" 16) |
    | FULL(@"SEL$1" "T2"@"SEL$1") |
    | USE_COLUMN_TABLE(@"SEL$1" "T2"@"SEL$1") |
    | USE_PLAN_CACHE( NONE ) |
    | PARALLEL(16) |
    | ENABLE_PARALLEL_DML |
    | OPTIMIZER_FEATURES_ENABLE('4.3.3.0') |
    | DIRECT(TRUE, 0, 'FULL') |
    | END_OUTLINE_DATA |
    | */ |
    | Optimization Info: |
    | ------------------------------------- |
    | T2: |
    |     table_rows:3 |
    |     physical_range_rows:3 |
    |     logical_range_rows:3 |
    |     index_back_rows:0 |
    |     output_rows:3 |
    |     table_dop:16 |
    |     dop_method:Global DOP |
    |     avaiable_index_name: [TBL4] |
    |     stats info: [version=0, is_locked=0, is_expired=0] |
    |     dynamic sampling level:0 |
    |     estimation method: [DEFAULT, STORAGE] |
    | Plan Type: |
    | DISTRIBUTED |
    | Note: |
    | Degree of Parallelism is 16 because of hint |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    82 rows in set (0.006 sec)
    ```