At present, the OceanBase Database optimizer provides two options for manually collecting statistics: the DBMS_STATS package and the ANALYZE command line. We recommend that you use the DBMS_STATS package for manual collection, as it offers more features.
Use the DBMS_STATS package to collect statistics
In OceanBase Database V4.0, you can use the DBMS_STATS package for manual statistics collection in both the Oracle mode and MySQL mode. This package includes two commonly used procedures: GATHER_TABLE_STATS for collecting statistics on a specific table, and GATHER_SCHEMA_STATS for gathering statistics across all tables in a particular database. Basic definitions of these two procedures are described as follows:
Note
For more information about the DBMS_STATS package, see Overview (MySQL mode) and Overview (Oracle mode).
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
The parameters are described as follows:
ownname: The username. If the username is set to NULL, the current login username is used by default.tabname: The name of the table.partname: The name of the partition. The default value is NULL.estimate_percent: The percentage of data used to calculate its distribution characteristics. The value range is [0.000001,100]. If set toNULL, all data is used. The default value isAUTO_SAMPLE_SIZE, which indicates that the optimizer determines the percentage of data used for calculation. If there are no special requirements, you can also use the default value.block_sample: Specifies whether to use block sampling instead of row sampling. The default value isFALSE.method_opt: The statistics collection method at the column level. You can use the following syntax to set this parameter:method_opt: FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause] | FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...] size_clause: SIZE integer | SIZE REPEAT | SIZE AUTO | SIZE SKEWONLY column: column_name | (column_name [, column_name])integer: The number of histogram buckets for the column. The value range is [1,2048].REPEAT: Specifies to collect histograms only for columns whose histograms have been collected. The previous number of histogram buckets set for collection is used.AUTO: Indicates that the OceanBase Database optimizer determines whether to collect column histograms based on column usage. The default number of histogram buckets is 254.SKEWONLY: Specifies to collect histograms only for columns with uneven data distribution. The default number of histogram buckets is 254.
degree: The degree of parallelism (DOP) in statistics collection. The default value isNULL. If not set, the default DOP configured by perfs is used, which is 1.granularity: The granularity in statistics collection. Valid values:'GLOBAL': Collects global statistics.'PARTITION': Collects partition-level statistics.'SUBPARTITION': Collects subpartition-level statistics.'ALL': Collects global, partition-level, and subpartition-level statistics.'AUTO': Uses the default method to collect statistics at all levels, including GLOBAL, PARTITION, and SUBPARTITION.'AUTO'is the default value.'DEFAULT': Collects global and partition-level statistics.'GLOBAL AND PARTITION': Collects global and partition-level statistics.'APPROX_GLOBAL AND PARTITION': Collects partition-level statistics and deduces global statistics based on the partition-level statistics.
cascade: Specifies whether to collect index statistics on the table at the same time. The default value isTRUE.stattab: This parameter is not available for use currently.statid: This parameter is not available for use currently.statown: This parameter is not available for use currently.no_invalidate: This parameter is not available for use currently.stattype: This parameter is not available for use currently.force: Specifies whether to perform forced collection and ignore the lock status. The default value isFALSE.
Examples
Example 1: Collect global statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns.
call dbms_stats.gather_table_stats('TEST', 'T1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE 128');
Example 2: Collect partition-level statistics on the T_PART1 table under the TEST user, with a DOP of 64. Only histograms of columns with uneven data distribution are collected.
call dbms_stats.gather_table_stats('TEST', 'T_PART1', degree=>64, granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
Example 3: Collect all statistics on the T_SUBPART1 table under the TEST user, with a DOP of 128. Only 50% of data is collected, with histograms of columns determined by the optimizer.
call dbms_stats.gather_table_stats('TEST', 'T_SUBPART1', degree=>128, estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
Example 4: Collect statistics on all tables under the TEST user, with a DOP of 128.
call dbms_stats.gather_schema_stats('TEST', degree=>128);
Use the ANALYZE statement to collect statistics
You can execute the ANALYZE statement to collect statistics in both the Oracle mode and MySQL mode of OceanBase Database. The syntax is described as follows:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (partition_name [,partition_name,....])
| SUBPARTITION(subpartition_name, [,subpartition_name,...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [, column_name])
Examples
Example 1: In Oracle mode, collect statistics on the T1 table under the TEST user, with the number of buckets being 128 for all columns.
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
Example 2: In Oracle mode, collect global statistics on the T_PART1 table under the TEST user. Only histograms of columns with uneven data distribution are collected.
obclient> ANALYZE TABLE T_PART1 PARTITION('T_PART1') COMPUTE STATISTICS FOR ALL COLUMNS SIZE skewonly;
Example 3: In Oracle mode, collect all statistics on partitions p0sp0 and p1ps2 of the T_SUBPART1 table under the TEST user, with histograms of columns determined by the optimizer.
obclient> ANALYZE TABLE T_SUBPART1 SUBPARTITION('p0sp0','p1ps2') COMPUTE STATISTICS FOR ALL COLUMNS SIZE auto;
Note that in MySQL mode, before you execute the preceding ANALYZE syntax, set the enable_sql_extension parameter to TRUE. This is because native MySQL does not support the preceding syntax, and you can execute it only after you enable SQL extension.
Example 4: In MySQL mode, use the preceding syntax to collect statistics on the T1 table under the TEST user, with the number of buckets set to 128 for all columns.
obclient> ALTER SYSTEM SET enable_sql_extension = true;
obclient> ANALYZE TABLE T1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
The OceanBase Database optimizer now supports the native MySQL ANALYZE syntax as well, which is detailed below:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
Summary
In addition to the preceding two procedures that are commonly used for statistics collection, the DBMS_STATS package also provides the GATHER_INDEX_STATS procedure for collecting index statistics, and the GATHER_DATABASE_STATS_JOB_PROC procedure for collecting statistics on all tables of all databases in a tenant.