The OceanBase Database optimizer allows you to manually collect statistics by using the DBMS_STATS package or the ANALYZE statement.
Note
OceanBase Database does not support the estimation of the statistics collection time. Therefore, we recommend that you set ob_query_timeout long enough to complete the collection of statistics. Especially, if schema-level statistics collection is interrupted due to timeout, the collected statistics are retained. In addition, we recommend that you increase the value of the ob_query_timeout variable at the session level, which affects only the current session without interfering with other operations.
Collect statistics by using the DBMS_STATS package
The OceanBase Database optimizer allows you to use the DBMS_STATS package to collect table-level statistics, schema-level statistics, and index statistics by calling stored procedures gather_table_stats, gather_schema_stats, and gather_index_stats respectively.
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_index_stats (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL
);
The following table describes the parameters.
| Parameter | Description |
|---|---|
| ownname |
|
| tabname | The name of the table. |
| indname | The name of the index. |
| partname | The name of the partition. The default value is NULL. |
| estimate_percent | The percentage of data to be used to compute distribution features. Value range: [0.000001,100]. If you specify this parameter to NULL, all data is used. The default value is AUTO_SAMPLE_SIZE, which indicates that the optimizer determines the percentage of data used for computation. |
| block_sample | Specifies whether to use block sampling instead of row sampling. Default value: FALSE. |
| method_opt | The statistics collection method at the column level. For more information, see method_opt. |
| degree | The degree of parallelism (DOP) in statistics collection. Default value: NULL. |
| granularity | The granularity in statistics collection. Valid values:
|
| cascade | Specifies whether to collect index statistics on the table at the same time. Default value: TRUE. |
| stattab | This parameter is not available. |
| statid | This parameter is not available. |
| statown | This parameter is not available. |
| no_invalidate | This parameter is not available. |
| stattype | This parameter is not available. |
| force | Specifies whether to collect statistics by force and ignore the lock status. Default value: FALSE. This parameter can be executed only in the BEGIN ... END clause of a PL statement. |
method_opt
method_opt can be set in the following syntax:
method_opt:
FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [, column_name])
The following table describes the parameters.
| Parameter | Description |
|---|---|
| SIZE integer | The number of histogram buckets for the column. Value range: [1, 2048]. |
| REPEAT | Histograms are collected only for columns whose histograms have been collected. The previous number of histogram buckets set for collection is used. |
| AUTO | The OceanBase Database optimizer determines whether to collect column histograms based on column usage. The default number of histogram buckets is 254. |
| SKEWONLY | Histograms are collected only for columns with uneven data distribution. The default number of histogram buckets is 254. |
In addition, the DBMS_STATS package also provides GATHER_DATABASE_STATS_JOB_PROC to collect statistics on all tables in the entire database. The default preference settings are used for all collection strategies. For more information about how to set preferences, see Manage statistics preferences.
PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();
Examples
The following examples use the DBMS_STATS package to collect statistics:
Collect global statistics on table
tbl1under useruser.CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');Collect partition-level statistics on table
t_part1under useruser, with a DOP of 64. Only histograms of columns with uneven data distribution are collected.CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');Collect all statistics on table
t_subpart1under useruser, with a DOP of 128. Only 50% of data is collected, with histograms of columns determined by the optimizer.CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
GRANULARITY association between table-level and index statistics collection strategies
By default, all index statistics on a table are collected along with the table statistics. Some statistics, including the number of rows and the average row length (which is calculated based on the corresponding columns), on an index table can be obtained from the statistics on the base table. This accelerates statistics collection and also avoids repeated full table scans. The following table lists the differences in GRANULARITY between local indexes and global indexes.
| Specified partition | GRANULARITY | Global index | Local index |
|---|---|---|---|
| No | ALL | GLOBAL | ALL |
| No | AUTO | GLOBAL | AUTO |
| No | DEFAULT | GLOBAL | DEFAULT |
| No | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| No | APPROX_GLOBAL AND PARTITION | GLOBAL | APPROX_GLOBAL AND PARTITION |
| No | GLOBAL | GLOBAL | GLOBAL |
| No | PARTITION | GLOBAL | PARTITION |
| No | SUBPARTITION | Not collected | SUBPARTITION |
| Yes | ALL | GLOBAL | ALL |
| Yes | AUTO | GLOBAL | AUTO |
| Yes | DEFAULT | GLOBAL | DEFAULT |
| Yes | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| Yes | APPROX_GLOBAL AND PARTITION | Not collected | APPROX_GLOBAL AND PARTITION |
| Yes | GLOBAL | GLOBAL | GLOBAL |
| Yes | PARTITION | Not collected | PARTITION |
| Yes | SUBPARTITION | Not collected | SUBPARTITION |
Collect statistics by using the ANALYZE statement
OceanBase Database allows you to use the ANALYZE statement to collect statistics in both Oracle and MySQL modes.
Syntax of the ANALYZE statement in Oracle mode
The syntax of the ANALYZE statement in Oracle mode is as follows:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (partition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [,column_name])
In the preceding syntax, the histogram collection method for columns is specified in the same way as in the syntax of method_opt in the DBMS_STATS package. For more information, see method_opt.
The ANALYZE statement does not provide as many strategy settings as the DBMS_STATS package does. To implement statistics collection at the GLOBAL level only in an ANALYZE statement, you can set partition_name in the use_partition syntax to the table name.
The following examples collect statistics using the ANALYZE statement in Oracle mode:
Collect statistics on table
tbl1under useruser.ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;Collect global statistics on table
t_part1under useruser. Only histograms of columns with uneven data distribution are collected.ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;Collect statistics on partitions
p0sp0andp1sp2in tablet_subpart1under useruser, with histograms of columns determined by the optimizer.ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
Syntax of the ANALYZE statement in MySQL mode
The syntax of the ANALYZE statement in MySQL mode is as follows:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
When the session variable enable_sql_extension is TRUE, you can use the syntax in Oracle mode, which is the extended syntax for the MySQL mode of OceanBase Database.
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
The following examples collect statistics using the ANALYZE statement in MySQL mode:
Collect statistics on table
tbl1, with the number of buckets being 30 for columns.ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;In MySQL mode, collect statistics on table
tbl1under usertestby using the syntax in Oracle mode.ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE; ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
Practice scenarios for manual statistics collection
Statistics collection on non-partitioned tables
When the product of the data volume and the number of columns in a non-partitioned table is within 10 million, we recommend that you run the following command to collect statistics on the table. In the following example, table test.t1 has 10 columns and a data volume of 1 million rows.
CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
# Scenarios where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');
# Scenarios where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');
When the product of the data volume and the number of columns in a non-partitioned table exceeds 10 million, we recommend that you set an appropriate DOP based on the business requirements and system resources to speed up statistics collection. In the following example, the data volume in table test.t1 is increased to 10 million rows. The DOP is set to 8.
# Scenarios where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');
# Scenarios where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);
Statistics collection on partitioned tables
Different from statistics collection strategies for non-partitioned tables, statistics collection strategies for partitioned tables must cover the collection of partition statistics. When system resources are sufficient, we recommend that you double the DOP for statistics collection on partitioned tables.
In the following example, table test.t_part has 10 columns and a data volume of 1 million rows. Because partition statistics need to be collected additionally, the DOP is increased to 2.
CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
For partitioned tables, you can also deduce global statistics from partition-level statistics to speed up statistics collection. For example, in the same scenario, you can adjust the granularity in statistics collection rather than increasing the DOP. Here are some examples:
# Scenarios where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');
# Scenarios where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');
Statistics collection at the schema (database) level
In addition to manual statistics collection on a single table, OceanBase Database provides the DBMS_STATS system package for you to collect statistics on all tables under a user.
Note
This feature is supported only in OceanBase Database V4.x in Oracle and MySQL modes and OceanBase Database V3.2.x in Oracle mode.
Statistics collection on all tables under a user takes time. Therefore, we recommend that you use this feature during off-peak hours.
If the tables under the user are all small tables with no more than 1 million rows, you can use the command in the following example to collect statistics.
# Scenarios where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');
# Scenarios where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');
If the user has large tables with tens of millions of rows, you can increase the DOP and collect statistics during off-peak hours.
# Scenarios where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');
# Scenarios where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');
If the user has massive tables with more than 100 million rows, you can collect statistics on the massive tables separately at a high DOP. Then, you can lock the statistics on the massive tables and use the preceding command to collect statistics on all tables of the user. After the statistics are collected, you can unlock the statistics on the massive tables and then collect subsequent statistics in incremental mode. Here are some examples:
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');
CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');
CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');
CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');