At present, the OceanBase Database optimizer provides two options for manually collecting statistics: the DBMS_STATS package and the ANALYZE command line. We recommend that you use the DBMS_STATS package for manual collection, as it offers more features.
Use the DBMS_STATS package to collect statistics
In OceanBase Database V4.0, you can use the DBMS_STATS package for manual statistics collection in both the Oracle-compatible mode and MySQL-compatible mode. This package includes two commonly used procedures: GATHER_TABLE_STATS for collecting statistics on a specific table, and GATHER_SCHEMA_STATS for gathering statistics across all tables in a particular database. Basic definitions of these two procedures are described as follows:
Note
For more information about the DBMS_STATS package, see Overview (MySQL-compatible mode) and Overview (Oracle-compatible mode).
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
The parameters are described as follows:
ownname: The username. If the username is set to NULL, the current login username is used by default.tabname: The name of the table.partname: The name of the partition. The default value is NULL.estimate_percent: The percentage of data used to calculate its distribution characteristics. The value range is [0.000001,100]. If set toNULL, all data is used. The default value isAUTO_SAMPLE_SIZE, which indicates that the optimizer determines the percentage of data used for calculation. If there are no special requirements, you can also use the default value.block_sample: Specifies whether to use block sampling instead of row sampling. The default value isFALSE.method_opt: The statistics collection method at the column level. You can use the following syntax to set this parameter:method_opt: FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause] | FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...] size_clause: SIZE integer | SIZE REPEAT | SIZE AUTO | SIZE SKEWONLY column: column_name | (column_name [, column_name])integer: The number of histogram buckets for the column. The value range is [1,2048].REPEAT: Specifies to collect histograms only for columns whose histograms have been collected. The previous number of histogram buckets set for collection is used.AUTO: Indicates that the OceanBase Database optimizer determines whether to collect column histograms based on column usage. The default number of histogram buckets is 254.SKEWONLY: Specifies to collect histograms only for columns with uneven data distribution. The default number of histogram buckets is 254.
degree: The degree of parallelism (DOP) in statistics collection. The default value isNULL. If not set, the default DOP configured by perfs is used, which is 1.granularity: The granularity in statistics collection. Valid values:'GLOBAL': Collects global statistics.'PARTITION': Collects partition-level statistics.'SUBPARTITION': Collects subpartition-level statistics.'ALL': Collects global, partition-level, and subpartition-level statistics.'AUTO': Uses the default method to collect statistics at all levels, including GLOBAL, PARTITION, and SUBPARTITION.'AUTO'is the default value.'DEFAULT': Collects global and partition-level statistics.'GLOBAL AND PARTITION': Collects global and partition-level statistics.'APPROX_GLOBAL AND PARTITION': Collects partition-level statistics and deduces global statistics based on the partition-level statistics.
cascade: Specifies whether to collect index statistics on the table at the same time. The default value isTRUE.stattab: This parameter is not available for use currently.statid: This parameter is not available for use currently.statown: This parameter is not available for use currently.no_invalidate: This parameter is not available for use currently.stattype: This parameter is not available for use currently.force: Specifies whether to perform forced collection and ignore the lock status. The default value isFALSE.
Examples
Example 1: Collect global statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns.
call dbms_stats.gather_table_stats('TEST', 'T1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE 128');
Example 2: Collect partition-level statistics on the T_PART1 table under the TEST user, with a DOP of 64. Only histograms of columns with uneven data distribution are collected.
call dbms_stats.gather_table_stats('TEST', 'T_PART1', degree=>64, granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
Example 3: Collect all statistics on the T_SUBPART1 table under the TEST user, with a DOP of 128. Only 50% of data is collected, with histograms of columns determined by the optimizer.
call dbms_stats.gather_table_stats('TEST', 'T_SUBPART1', degree=>128, estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
Example 4: Collect statistics on all tables under the TEST user, with a DOP of 128.
call dbms_stats.gather_schema_stats('TEST', degree=>128);
Use the ANALYZE statement to collect statistics
You can execute the ANALYZE statement to collect statistics in both the Oracle-compatible mode and MySQL-compatible mode of OceanBase Database. The syntax is described as follows:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (partition_name [,partition_name,....])
| SUBPARTITION(subpartition_name, [,subpartition_name,...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [, column_name])
Examples
Example 1: In Oracle-compatible mode, collect statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns.
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
Example 2: In Oracle-compatible mode, collect global statistics on the T_PART1 table under the TEST user. Only histograms of columns with uneven data distribution are collected.
obclient> ANALYZE TABLE T_PART1 PARTITION('T_PART1') COMPUTE STATISTICS FOR ALL COLUMNS SIZE skewonly;
Example 3: In Oracle-compatible mode, collect all statistics on partitions p0sp0 and p1ps2 of the T_SUBPART1 table under the TEST user, with histograms of columns determined by the optimizer.
obclient> ANALYZE TABLE T_SUBPART1 SUBPARTITION('p0sp0','p1ps2') COMPUTE STATISTICS FOR ALL COLUMNS SIZE auto;
Note that in MySQL-compatible mode, before you execute the preceding ANALYZE syntax, set the enable_sql_extension parameter to TRUE. This is because native MySQL does not support the preceding syntax, and you can execute it only after you enable SQL extension.
Example 4: In MySQL-compatible mode, use the preceding syntax to collect statistics on the T1 table under the TEST user, with the number of buckets set to 128 for all columns.
obclient> ALTER SYSTEM SET enable_sql_extension = true;
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
The OceanBase Database optimizer now supports the native MySQL ANALYZE syntax as well, which is detailed below:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
Summary
In addition to the preceding two procedures that are commonly used for statistics collection, the DBMS_STATS package also provides the GATHER_INDEX_STATS procedure for collecting index statistics, and the GATHER_DATABASE_STATS_JOB_PROC procedure for collecting statistics on all tables of all databases in a tenant.