Manual statistics collection|V4.1.0| docs|Distributed Database

Manual statistics collection

Last Updated：2023-08-01 06:02:28 Updated

The OceanBase Database optimizer collects statistics by using the DBMS_STATS package or the ANALYZE statement.

Collect statistics by using the DBMS_STATS package

The OceanBase Database optimizer can use the DBMS_STATS package to collect table-level statistics, schema-level statistics, and index statistics by respectively calling stored procedures gather_table_stats, gather_schema_stats, and gather_index_stats.

PROCEDURE gather_table_stats (
  ownname            VARCHAR2,
  tabname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_schema_stats (
  ownname            VARCHAR2,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_index_stats (
  ownname            VARCHAR2,
  indname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  force              BOOLEAN DEFAULT FALSE,
  tabname            VARCHAR2  DEFAULT NULL
);

The following table describes the parameters.

Parameter	Description
ownname	Oracle mode: The username. MySQL mode: The name of the database to which the table belongs. If the username is set to `NULL`, the current logon username is used by default.
tabname	The name of the table.
indname	The name of the index.
partname	The name of the partition. Default value: `NULL`.
estimate_percent	The percentage of data to be used to compute distribution features. The value range is [0.000001,100]. If you specify this parameter to `NULL`, all data is used. The default value is `AUTO_SAMPLE_SIZE`, which indicates that the optimizer determines the percentage of data used for computation.
block_sample	Specifies whether to use block sampling instead of row sampling. Default value: `FALSE`.
method_opt	The statistics collection method at the column level. For more information, see method_opt.
degree	The degree of parallelism in statistics collection. Default value: `NULL`.
granularity	The granularity in statistics collection. Valid values: `'GLOBAL'`: collects global-level statistics. `'PARTITION'`: collects partition-level statistics. `'SUBPARTITION'`: collects subpartition-level statistics. `'ALL'`: collects statistics at all levels, including `GLOBAL`, `PARTITION`, and `SUBPARTITION`. `'AUTO'`: uses the default method to collect statistics at all levels, including `GLOBAL`, `PARTITION`, and `SUBPARTITION`. If `granularity` is not specified, the default value `'AUTO'` is used. `'DEFAULT'`: collects `GLOBAL`-level and `PARTITION`-level statistics. `'GLOBAL AND PARTITION'`: collects `GLOBAL`-level and `PARTITION`-level statistics. `'APPROX_GLOBAL AND PARTITION'`: collects partition-level statistics and deduces global-level statistics based on the partition-level statistics.
cascade	Specifies whether to collect index statistics on the table at the same time. Default value: `TRUE`.
stattab	This parameter is not available.
statid	This parameter is not available.
statown	This parameter is not available.
no_invalidate	This parameter is not available.
stattype	This parameter is not available.
force	Specifies whether to collect statistics by force and ignore the lock status. Default value: `FALSE`. This parameter can be executed only in the `BEGIN ... END` clause of a PL statement.

method_opt

method_opt can be set in the following syntax:

method_opt:
  FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]

size_clause:
  SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY

column:
   column_name
| (column_name [, column_name])

The following table describes the parameters.

Parameter	Description
SIZE integer	The number of histogram buckets for the column. Value range: [1, 2048].
REPEAT	Histograms are collected only for columns whose histograms have been collected. Use the previous number of histogram buckets set for collection.
AUTO	The OceanBase Database optimizer determines whether to collect column histograms based on column usage. The default number of histogram buckets is 254.
SKEWONLY	Histograms are collected only for columns with uneven data distribution. The default number of histogram buckets is 254.

In addition, the DBMS_STATS package also provides GATHER_DATABASE_STATS_JOB_PROC to collect statistics on all tables in the entire database. The default preference settings (prefs) are used for all collection strategies. For more information about how to set preferences, see Collection strategy prefs management.

PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();

Examples

The following examples use the DBMS_STATS package to collect statistics:

Collect global-level statistics of table tbl1 under user user, with the number of buckets being 128 for all columns.

CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE 128');

Collect partition-level statistics for table t_part1 under user user, with a DOP of 64. Only histograms of columns with uneven data distribution are collected.
```
CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
```
Collect all statistics for table t_subpart1 under user user, with a DOP of 128. Only 50% of data is collected, with histograms of columns determined by the optimizer.
```
CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
```

GRANULARITY association between table-level and index statistics collection strategies

By default, all index statistics on a table are collected along with the table statistics. Some statistics, including the number of rows and the average row length (which is calculated based on the corresponding columns), on an index table can be obtained from the statistics on the base table. This accelerates statistics collection and also avoids repeated whole table data scanning. The following table lists the differences in GRANULARITY between local indexes and global indexes.

Specified partition	GRANULARITY	Global partitioned indexes	Local index
No	ALL	GLOBAL	ALL
No	AUTO	GLOBAL	AUTO
No	DEFAULT	GLOBAL	DEFAULT
No	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
No	APPROX_GLOBAL AND PARTITION	GLOBAL	APPROX_GLOBAL AND PARTITION
No	Global	GLOBAL	Global
No	PARTITION	GLOBAL	PARTITION
No	SUBPARTITION	Not collected	SUBPARTITION
Yes	ALL	GLOBAL	ALL
Yes	AUTO	GLOBAL	AUTO
Yes	DEFAULT	GLOBAL	DEFAULT
Yes	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
Yes	APPROX_GLOBAL AND PARTITION	Not collected	APPROX_GLOBAL AND PARTITION
Yes	Global	GLOBAL	Global
Yes	PARTITION	Not collected	PARTITION
Yes	SUBPARTITION	Not collected	SUBPARTITION

Use the ANALYZE statement to collect statistics

OceanBase Database allows you to use the ANALYZE statement to collect statistics in both Oracle and MySQL modes.

Syntax of the ANALYZE statement in Oracle mode

The syntax of the ANALYZE statement in Oracle mode is as follows:

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

use_partition:
  PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])

analyze_statistics_clause:
  COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]

analyze_for_clause:
  FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]

size_clause:
  SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY

column:
  column_name
| (column_name [,column_name])

The column histogram collection method in the preceding syntax is the same as the syntax of method_opt in the DBMS_STATS package. For more information, see method_opt.

The ANALYZE statement does not provide as many strategy settings as the DBMS_STATS package does. To implement statistics collection at the GLOBAL level only in an ANALYZE statement, you can set partition_name in the use_partition syntax to the table name.

The following examples collect statistics using the ANALYZE statement in Oracle mode:

Collect statistics of table tbl1 under user user, with the number of buckets being 128 for all columns.
```
ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
```
Collect GLOBAL-level statistics for table t_part1 under user user. Only histograms of columns with uneven data distribution are collected.
```
ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;
```
Collect statistics for partitions p0sp0 and p1sp2 in table t_subpart1 under user user, with histograms of columns determined by the optimizer.
```
ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
```

Syntax of the ANALYZE statement in MySQL mode

The syntax of the ANALYZE statement in MySQL mode is as follows:

analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS

When the session variable enable_sql_extension is TRUE, you can use the syntax in Oracle mode, which is the extended syntax for OceanBase Database in MySQL mode.

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

The following examples collect statistics using the ANALYZE statement in MySQL mode:

Collect statistics of table tbl1, with the number of buckets being 30 for columns.
```
ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;
```
Collect statistics of table tbl1 under user test in MySQL mode by using the syntax in Oracle mode, with the number of buckets being 128 for all columns.
```
ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE;
ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE 128;
```