Online statistics collection refers to the process where the database optimizer collects statistics during INSERT operations, without requiring manual system package calls. This allows for real-time statistics collection during large-scale data imports, without the need to initiate separate collection tasks, reducing operational overhead and improving statistics collection performance.
OceanBase Database supports online statistics collection using the GATHER_OPTIMIZER_STATISTICS/NO_GATHER_OPTIMIZER_STATISTICS hints and the system variable _optimizer_gather_stats_on_load (which is enabled by default). This feature can also be achieved using the APPEND hint with the direct load feature.
Note
The _optimizer_gather_stats_on_load is an internal system variable that is enabled by default.
The primary scenarios for online statistics collection include:
CREATE TABLE...AS SELECT(CTAS)When the system variable
_optimizer_gather_stats_on_loadis set to True, online statistics collection is enabled by default for CTAS operations. To disable this feature, use theNO_GATHER_OPTIMIZER_STATISTICShint, as shown below:CREATE TABLE table_name AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/* FROM...INSERT INTO SELECTWhen the system variable
_optimizer_gather_stats_on_loadis set to True, online statistics collection is enabled by using theGATHER_OPTIMIZER_STATISTICShint or theAPPENDhint. An example is provided below:INSERT /*+GATHER_OPTIMIZER_STATISTICS*/ INTO table_name SELECT * FROM table_nameNotice
OceanBase Database currently does not support online statistics collection through the
INSERT /*+GATHER_OPTIMIZER_STATISTICS*/ INTO table_name VALUES()...statement.PDML
Online statistics collection can be enabled using the
ENABLE_PARALLEL_DMLhint or by setting the system to parallel mode with theALTER SESSION ENABLE PARALLEL DML;command, allowing for parallel insert operations.An example is provided below:
CREATE TABLE t1(c1 INT) PARTITION BY HASH(c1) PARTITIONS 4; CREATE TABLE t2(c1 INT) PARTITION BY HASH(c1) PARTITIONS 4; INSERT INTO t1 VALUES (1),(2),(3),(4); obclient> EXPLAIN INSERT /*+ENABLE_PARALLEL_DML PARALLEL(4)*/ INTO t2 SELECT * FROM t1; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ================================================================================ | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | -------------------------------------------------------------------------------- | | |0 |OPTIMIZER STATS MERGE | |1 |16 | | | |1 | PX COORDINATOR | |1 |16 | | | |2 | EXCHANGE OUT DISTR |:EX10001 |1 |16 | | | |3 | INSERT | |1 |15 | | | |4 | EXCHANGE IN DISTR | |1 |2 | | | |5 | EXCHANGE OUT DISTR (PKEY RANDOM)|:EX10000 |1 |2 | | | |6 | OPTIMIZER STATS GATHER | |1 |2 | | | |7 | SUBPLAN SCAN |ANONYMOUS_VIEW1|1 |2 | | | |8 | PX BLOCK ITERATOR | |1 |2 | | | |9 | TABLE SCAN |T1 |1 |2 | | | ================================================================================ | | Outputs & filters: | | ------------------------------------- | | 0 - output(nil), filter(nil), rowset=256 | | 1 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)]), filter(nil), rowset=256 | | 2 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)]), filter(nil), rowset=256 | | dop=4 | | 3 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)]), filter(nil) | | columns([{T2: ({T2: (T2.__pk_increment, T2.C1)})}]), partitions(p[0-3]), | | column_values([T_HIDDEN_PK], [column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)]) | | 4 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)], [PARTITION_ID], [T_HIDDEN_PK]), filter(nil), rowset=256 | | 5 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)], [PARTITION_ID], [T_HIDDEN_PK]), filter(nil), rowset=256 | | (#keys=1, [column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)]), dop=4 | | 6 - output([column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)], [calc_tablet_id(PARTITION_HASH(column_conv(NUMBER,PS:(-1,0),NULL,ANONYMOUS_VIEW1.C1)))]), | | filter(nil), rowset=256 | | 7 - output([ANONYMOUS_VIEW1.C1]), filter(nil), rowset=256 | | access([ANONYMOUS_VIEW1.C1]) | | 8 - output([T1.C1]), filter(nil), rowset=256 | | 9 - output([T1.C1]), filter(nil), rowset=256 | | access([T1.C1]), partitions(p[0-3]) | | is_index_back=false, is_global_index=false, | | range_key([T1.__pk_increment]), range(MIN ; MAX)always true | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ 35 rows in setIn the above example, the generation of the
OPTIMIZER STATS MERGEoperator indicates that online statistics collection was executed.
The collected statistics can be viewed in the corresponding views. For more information, see Overview of statistics collection methods.
An example is provided below:
CREATE TABLE t1(c1 INT) PARTITION BY HASH(c1) PARTITIONS 4;
CREATE TABLE t2(c1 INT) PARTITION BY HASH(c1) PARTITIONS 4;
INSERT INTO t1 VALUES (1),(2),(3),(4);
INSERT /*+ENABLE_PARALLEL_DML PARALLEL(4)*/ INTO t2 SELECT * FROM t1;
-- Oracle mode
OceanBase(SYS@SYS)> SELECT table_name, partition_name, num_rows, avg_row_len FROM user_tab_statistics WHERE table_name='T2';
+------------+----------------+----------+-------------+
| TABLE_NAME | PARTITION_NAME | NUM_ROWS | AVG_ROW_LEN |
+------------+----------------+----------+-------------+
| T2 | NULL | 4 | 20 |
| T2 | P0 | 1 | 20 |
| T2 | P1 | 2 | 20 |
| T2 | P2 | 1 | 20 |
| T2 | P3 | 0 | 0 |
+------------+----------------+----------+-------------+
-- MySQL mode
OceanBase(root@test)> SELECT table_name, partition_name, num_rows, avg_row_len FROM OCEANBASE.DBA_TAB_STATISTICS WHERE table_name='T2';
+------------+----------------+----------+-------------+
| table_name | partition_name | num_rows | avg_row_len |
+------------+----------------+----------+-------------+
| t2 | NULL | 4 | 20 |
| t2 | p0 | 1 | 20 |
| t2 | p1 | 1 | 20 |
| t2 | p2 | 1 | 20 |
| t2 | p3 | 1 | 20 |
+------------+----------------+----------+-------------+