The OceanBase Database optimizer allows you to manually collect statistics by using the DBMS_STATS package or executing the ANALYZE statement. We recommend that you use the DBMS_STATS package because it provides more features.
Use the DBMS_STATS package to collect statistics
You can use the DBMS_STATS package for manual statistics collection in both Oracle mode and MySQL mode of OceanBase Database V4.0. The package provides the GATHER_TABLE_STATS procedure to collect statistics of one table and the GATHER_SCHEMA_STATS procedure to collect statistics of all tables in a database. The basic definitions of the two procedures are described as follows:
Note
For more information about the DBMS_STATS package, see Overview (MySQL mode) and Overview (Oracle mode).
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
The parameters are described as follows:
ownname: the username. If the username is set toNULL, the current login username is used by default.tabname: the name of the table.partname: the name of the partition. The default value isNULL.estimate_percent: the percentage of data to be used to compute distribution features. The value range is [0.000001, 100]. If you set this parameter toNULL, all data is used. The default value isAUTO_SAMPLE_SIZE, which indicates that the optimizer determines the percentage of data used for computation. You can use the default value.block_sample: specifies whether to use block sampling instead of row sampling. The default value isFALSE.method_opt: the statistics collection method at the column level. You can use the following syntax to set this parameter:method_opt: FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause] | FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...] size_clause: SIZE integer | SIZE REPEAT | SIZE AUTO | SIZE SKEWONLY column: column_name | (column_name [, column_name])size_clausespecifies how to collect histograms from column statistics. The options ofsize_clauseare described as follows:integer: the number of histogram buckets for the column. The value range is [1, 2048].REPEAT: specifies to collect histograms only for columns whose histograms have been collected. The previous number of histogram buckets set for collection is used.AUTO: The OceanBase Database optimizer determines whether to collect column histograms based on column usage. The default number of histogram buckets is 254.SKEWONLY: specifies to collect histograms only for columns with uneven data distribution. The default number of histogram buckets is 254.
degree: the degree of parallelism (DOP) in statistics collection. The default value isNULL. You can set the DOP in preferences, and the default value is1.granularity: the granularity in statistics collection. Valid values:'GLOBAL': collects global statistics.'PARTITION': collects partition-level statistics.'SUBPARTITION': collects subpartition-level statistics.'ALL': collects global, partition-level, and subpartition-level statistics.'AUTO': uses the default method to collect statistics at all levels, including GLOBAL, PARTITION, and SUBPARTITION. It is the default value.'DEFAULT': collects global and partition-level statistics.'GLOBAL AND PARTITION': collects global and partition-level statistics.'APPROX_GLOBAL AND PARTITION': collects partition-level statistics and deduces global statistics based on the partition-level statistics.
cascade: specifies whether to collect index statistics on the table at the same time. The default value isTRUE.stattab: This parameter is unavailable.statid: This parameter is unavailable.statown: This parameter is unavailable.no_invalidate: This parameter is unavailable.stattype: This parameter is unavailable.force: specifies whether to perform forced collection and ignore the lock status. The default value isFALSE.
Examples
Example 1: Collect global statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns
call dbms_stats.gather_table_stats('TEST', 'T1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE 128');
Example 2: Collect partition-level statistics on the T_PART1 table under the TEST user, with a DOP of 64. Only histograms of columns with uneven data distribution are collected
call dbms_stats.gather_table_stats('TEST', 'T_PART1', degree=>64, granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
Example 3: Collect all statistics on the T_SUBPART1 table under the TEST user, with a DOP of 128. Only 50% of data is collected, with histograms of columns determined by the optimizer
call dbms_stats.gather_table_stats('TEST', 'T_SUBPART1', degree=>128, estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
Example 4: Collect statistics on all tables under the TEST user, with a DOP of 128
call dbms_stats.gather_schema_stats('TEST', degree=>128);
Use the ANALYZE statement to collect statistics
You can execute the ANALYZE statement to collect statistics in both Oracle mode and MySQL mode of OceanBase Database. The syntax is described as follows:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (partition_name [,partition_name,....])
| SUBPARTITION(subpartition_name, [,subpartition_name,...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [, column_name])
Examples
Example 1: In Oracle mode, collect statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
Example 2: In Oracle mode, collect global statistics on the T_PART1 table under the TEST user. Only histograms of columns with uneven data distribution are collected
obclient> ANALYZE TABLE T_PART1 PARTITION('T_PART1') COMPUTE STATISTICS FOR ALL COLUMNS SIZE skewonly;
Example 3: In Oracle mode, collect all statistics on partitions p0sp0 and p1ps2 of the T_SUBPART1 table under the TEST user, with histograms of columns determined by the optimizer
obclient> ANALYZE TABLE T_SUBPART1 SUBPARTITION('p0sp0','p1ps2') COMPUTE STATISTICS FOR ALL COLUMNS SIZE auto;
Before you execute the preceding ANALYZE syntax in MySQL mode, you must set the enable_sql_extension parameter to TRUE. This is because native MySQL does not support the preceding syntax, and you can execute it only after you enable SQL extension.
Example 4: In MySQL mode, use the preceding syntax to collect statistics on the T1 table under the TEST user, with the number of buckets set to 128 for all columns
obclient> ALTER SYSTEM SET enable_sql_extension = true;
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
The OceanBase Database optimizer supports the following ANALYZE syntax of native MySQL:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
Summary
In addition to the preceding two procedures that are commonly used for statistics collection, the DMBS_STATS package also provides the GATHER_INDEX_STATS procedure for you to collect index statistics and the GATHER_DATABASE_STATS_JOB_PROC procedure to collect statistics on all tables of all databases in a tenant.