After you import data to OceanBase Database in a regular way, you must perform a major compaction to improve query performance. However, after you import data to OceanBase Database in full direct load mode, you can achieve high query performance without a major compaction.
OceanBase Database supports full direct load and incremental direct load. Full direct load directly generates a major SSTable. The major SSTable for columnstore tables is directly generated in the columnar storage format to quickly respond to analytical processing (AP) queries. Incremental direct load writes data to SSTables in the row-based storage format. To improve the performance of AP queries, you must perform a major compaction to convert the data into the columnar storage format.
Examples
Create a table.
OceanBase(root@test)>create table lineitem( l_orderkey bigint(20) NOT NULL , l_partkey bigint(20) NOT NULL , l_suppkey bigint(20) NOT NULL , l_linenumber bigint(20) NOT NULL , l_quantity bigint(20) NOT NULL , l_extendedprice decimal(10,2) NOT NULL , l_discount decimal(10,2) NOT NULL , l_tax decimal(10,2) NOT NULL , l_returnflag char(1) , l_linestatus char(1) , l_shipdate date , l_commitdate date , l_receiptdate date , l_shipinstruct char(25) , l_shipmode char(10) , l_comment varchar(44), primary key(l_orderkey, l_linenumber) ) with column group (each column);The return result is as follows:
Query OK, 0 row affected (0.08 sec)Import data in incremental direct load mode.
OceanBase(root@test)>load data /*+ query_timeout(10000000000) parallel(64) direct(true,0,'inc_replace') */ infile '/path/to/lineitem.tbl' into table lineitem FIELDS TERMINATED BY '|' enclosed BY '' ESCAPED BY '';The return result is as follows:
Query OK, 59986052 rows affected (45.66 sec) Records: 59986052 Delected: 0 Skipped: 0 Warnings: 0Query data.
OceanBase(root@test)>select /*+ query_timeout(10000000000) */ count(*) from lineitem;The query result is as follows:
+----------+ | count(*)| +----------+ | 59986052 | +----------+ 1 row in set (45.42 sec)Truncate data.
OceanBase(root@test)>truncate lineitem;The return result is as follows:
Query OK, 0 rows affected (0.05 sec)Import data in full direct load mode.
OceanBase(root@test)>load data /*+ query_timeout(10000000000) parallel(64) direct(true,0) */ infile '/path/to/lineitem.tbl' into table lineitem FIELDS TERMINATED BY '|' enclosed BY '' ESCAPED BY '';The return result is as follows:
Query OK, 59986052 rows affected (1 min 3.83 sec) Records: 50086052 Deleted: 0 Skipped: 0 Warnings: 0Query data.
OceanBase(root@test)>select /*+ query_timeout(10000000000) */ count(*) from lineitem;The query result is as follows:
+----------+ | count(*)| +----------+ | 59986052 | +----------+ 1 row in set (0.01 sec)
After you import data in incremental direct load mode, the data query speed is lower than that achieved with columnstore tables. After you import data in full direct load mode, the data query speed can reach that achieved with columnstore tables.
References
For more information about direct load, see Overview.