Manually collect statistics|V4.4.2|OceanBase Database| docs|Distributed Database

The OceanBase Database optimizer mainly collects statistics manually by using the DBMS_STATS package and the ANALYZE statement.

Note

OceanBase Database does not support estimating the time required to collect statistics. We recommend that you set the ob_query_timeout parameter to a sufficiently large value to complete the statistics collection. In particular, if the statistics collection for schema-level statistics is interrupted due to a timeout, the statistics that have been collected will be retained.

Usage notes and limitations

We recommend that you increase the ob_query_timeout variable at the session level. This setting only affects the current session and does not interfere with other operations.

Procedure

Step	Action	Command/View
Prepare	Increase `ob_query_timeout`	`SET SESSION ob_query_timeout = 36000000000;`
Execute	Call `DBMS_STATS` or `ANALYZE`	`CALL dbms_stats.gather_table_stats(...);`
Monitor	View the progress and status of the collection	`(G)V$OB_OPT_STAT_GATHER_MONITOR`
Verify	Check whether the statistics are written or expired	`DBA_TAB_STATISTICS`、`DBA_TAB_COL_STATISTICS` (and histogram-related views)
Troubleshoot	Handle failures or exceptions	`DBA_OB_TASK_OPT_STAT_GATHER_HISTORY`、`DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY` + Plan cache (including refresh and eviction)

Collect statistics using the DBMS_STATS package

OceanBase Database supports collecting table-level, schema-level, and index-level statistics by using the DBMS_STATS (MySQL mode)/DBMS_STATS (Oracle mode) package. You can call the gather_table_stats, gather_schema_stats, and gather_index_stats stored procedures respectively.

MySQL mode

Oracle mode

PROCEDURE gather_table_stats (
  ownname            VARCHAR2,
  tabname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_schema_stats (
  ownname            VARCHAR2,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_index_stats (
  ownname            VARCHAR2,
  indname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  force              BOOLEAN DEFAULT FALSE,
  tabname            VARCHAR2  DEFAULT NULL
);

DBMS_STATS.GATHER_TABLE_STATS (
ownname             VARCHAR2,
tabname             VARCHAR2,
partname            VARCHAR2 DEFAULT NULL,
estimate_percent    NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample        BOOLEAN  DEFAULT FALSE,
method_opt          VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree              NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
granularity         VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade             BOOLEAN DEFAULT NULL,
no_invalidate       BOOLEAN DEFAULT FALSE,
stattype            VARCHAR2 DEFAULT 'DATA',
force               BOOLEAN DEFAULT FALSE,
hist_est_percent    NUMBER DEFAULT AUTO_SAMPLE_SIZE,
hist_block_sample   BOOLEAN DEFAULT NULL
);

DBMS_STATS.GATHER_SCHEMA_STATS (
ownname            VARCHAR2,
estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample       BOOLEAN DEFAULT FALSE,
method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree             NUMBER DEFAULT NULL,
granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade            BOOLEAN DEFAULT NULL,
no_invalidate      BOOLEAN DEFAULT FALSE,
stattype           VARCHAR2 DEFAULT 'DATA',
force              BOOLEAN DEFAULT FALSE);

DBMS_STATS.GATHER_INDEX_STATS (
ownname            VARCHAR2,
indname            VARCHAR2,
partname           VARCHAR2 DEFAULT NULL,
estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab            VARCHAR2 DEFAULT NULL,
statown            VARCHAR2 DEFAULT NULL,
degree             NUMBER DEFAULT NULL,
granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate      BOOLEAN DEFAULT FALSE,
force              BOOLEAN DEFAULT FALSE,
tabname            VARCHAR2 DEFAULT NULL);

For more information about the parameters, see:

The DBMS_STATS package provides a default configuration for Prefs. For more information about how to set Prefs, see Manage collection policy configuration items (Prefs).

PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();

Examples

The following examples show how to collect statistics by using the DBMS_STATS package:

Collect global-level statistics for the tbl1 table of the user user.

CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');

Collect partition-level statistics for the t_part1 table of the user user. The parallelism is 64, and histograms are collected only for columns with uneven data distribution.
```
CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
```
Collect all statistics for the t_subpart1 table of the user user. The parallelism is 128, and 50% of the data is collected. The optimizer decides whether to collect histograms for all columns.
```
CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
```

After running gather_table_stats, you can query (G)V$OB_OPT_STAT_GATHER_MONITOR to confirm whether the task has started.

GRANULARITY and Table / Index statistics collection

By default, when you collect table-level statistics, all index-level statistics on the table are also collected. This is because some index-level statistics can be obtained from the base table's statistics, such as the number of rows and average row length (calculated from several columns). This can accelerate the statistics collection and avoid a full table scan. The GRANULARITY parameter differs between global and local indexes, as shown in the following table.

Partition specified	GRANULARITY	Global index	Local index
No	ALL	GLOBAL	ALL
No	AUTO	GLOBAL	AUTO
No	DEFAULT	GLOBAL	DEFAULT
No	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
No	APPROX_GLOBAL AND PARTITION	GLOBAL	APPROX_GLOBAL AND PARTITION
No	GLOBAL	GLOBAL	GLOBAL
No	PARTITION	GLOBAL	PARTITION
No	SUBPARTITION	None	SUBPARTITION
Yes	ALL	GLOBAL	ALL
Yes	AUTO	GLOBAL	AUTO
Yes	DEFAULT	GLOBAL	DEFAULT
Yes	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
Yes	APPROX_GLOBAL AND PARTITION	None	APPROX_GLOBAL AND PARTITION
Yes	GLOBAL	GLOBAL	GLOBAL
Yes	PARTITION	None	PARTITION
Yes	SUBPARTITION	None	SUBPARTITION

Collect statistics by using the ANALYZE statement

OceanBase Database supports the ANALYZE statement for collecting statistics in both Oracle mode and MySQL mode.

MySQL mode

Oracle mode

The ANALYZE statement in MySQL mode has the following syntax:

analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS

If the tenant-level parameter enable_sql_extension is set to TRUE, you can use the syntax provided for Oracle mode in OceanBase Database for MySQL mode.

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

Here are some examples of using the ANALYZE statement in MySQL mode to collect statistics:

Collect statistics for the tbl1 table, with 30 buckets for each column.

ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;

Use the syntax provided for Oracle mode in MySQL mode to collect statistics for the tbl1 table of the test user.

ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE;
ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;

The ANALYZE statement in Oracle mode has the following syntax:

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

use_partition:
  PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])

analyze_statistics_clause:
  COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]

analyze_for_clause:
  FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]

size_clause:
  SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY

column:
  column_name
| (column_name [,column_name])

In the preceding syntax, the histogram collection method for columns is the same as the method_opt syntax of the DBMS_STATS package. For more information, see method_opt.

Compared with the statistics collection methods of the DBMS_STATS package, the ANALYZE statement provides fewer strategy settings. To support the ANALYZE statement in collecting only GLOBAL statistics, the rule is to set the partition_name parameter to the table name in the use_partition syntax.

Here are some examples of using the ANALYZE statement in Oracle mode to collect statistics:

Collect statistics for the tbl1 table of the user user.

ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;

Collect GLOBAL statistics for the t_part1 table of the user user and collect histograms for columns with uneven data distribution.
```
ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;
```
Collect statistics for the p0sp0 and p1sp2 partitions of the t_subpart1 table of the user user. The histograms for all columns are determined internally by the optimizer.
```
ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
```

Practice scenarios for manual statistics collection

Statistics collection for non-partitioned tables

If the product of the number of rows and columns in a table does not exceed 10 million, we recommend that you use the following statements to collect statistics for the table. In the following example, the test.t1 table has 10 columns and contains 1 million rows.

CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');

If the product of the number of rows and columns in a table exceeds 10 million, we recommend that you set a parallelism level based on the business requirements and system resources to accelerate statistics collection. In the following example, the test.t1 table contains 10 million rows. The parallelism level is set to 8.

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);

Statistics collection for partitioned tables

For partitioned tables, you need to consider the collection of partition statistics in addition to the collection of non-partitioned table statistics. If system resources are available, we recommend that you double the parallelism level used for non-partitioned tables for partitioned tables.

In the following example, the test.t_part table has 10 columns and contains 1 million rows. The parallelism level is set to 2 because partition statistics are collected.

CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t_part SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);

For partitioned tables, you can also consider the partition derivation method to collect statistics. In this method, you collect partition statistics and derive global statistics based on the partition statistics to accelerate statistics collection. For example, in the same scenario, you can adjust the partition collection method without increasing the parallelism level. Here is an example:

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');

For more information about the collection and update strategies for partitioned tables, see Collection and update strategies for partitioned tables.

Statistics collection at the schema (database) level

In addition to manually collecting statistics for individual tables, OceanBase Database provides the DBMS_STATS package to collect statistics for all tables in a user schema.

Note

This feature is supported only in OceanBase Database V4.x in dual mode and OceanBase Database V3.2.x in Oracle mode.

Because collecting statistics for all tables in a user schema is time-consuming, we recommend that you use this feature during off-peak hours.

If all tables in the user schema are small tables (with a number of rows not exceeding 1 million), you can directly use the following statements to collect statistics for the test user schema.

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');

If some large tables (with a number of rows in the millions) exist in the user schema, you can increase the parallelism level during off-peak hours to accelerate statistics collection:

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');

If the user schema contains a very large table (with more than 100 million rows), you can separately collect statistics for the very large table with a higher parallelism level, lock the statistics of the very large table, and then use the preceding statements to collect statistics for the entire user schema. After statistics collection is completed, unlock the statistics of the very large table and collect statistics incrementally. Here is an example:

CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');

CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');

CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');

CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');

After statistics collection, perform a health check:

-- Example: Verify whether statistics for the test.t1 table are collected
SELECT table_name, num_rows, last_analyzed
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';

Monitor the progress and health of statistics collection

Why is monitoring necessary? Statistics collection can take several minutes to several hours. You need to:

Know in real time whether the task is still running
Determine if it is stuck or failed
Verify the validity of the collected results
Evaluate if the statistics are "stale" or "distorted"

Query the progress of statistics collection

OceanBase Database supports statistics collection monitoring and diagnostics starting from V4.2.0. After manually or automatically initiating statistics collection, you can query the real-time collection status and progress from the (G)V$OB_OPT_STAT_GATHER_MONITOR view.

For more information about the fields, examples, and other diagnostic methods of this view, see Monitor and diagnose statistics collection.

View the running statistics collection tasks

Note

In MySQL mode, you typically use oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR (or oceanbase.V$OB_OPT_STAT_GATHER_MONITOR).
In Oracle mode, you typically use SYS.GV$OB_OPT_STAT_GATHER_MONITOR (or SYS.V$OB_OPT_STAT_GATHER_MONITOR).

SELECT
  tenant_id,
  svr_ip,
  svr_port,
  session_id,
  trace_id,
  task_id,
  type,
  task_start_time,
  task_duration_time,
  task_table_count,
  completed_table_count,
  running_table_owner,
  running_table_name,
  running_table_duration_time,
  running_table_progress
FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR
WHERE type = 'MANUAL GATHER';

Field	Description
`TASK_TABLE_COUNT`	The number of tables to be collected by the task
`COMPLETED_TABLE_COUNT`	The number of tables completed by the task
`RUNNING_TABLE_OWNER` / `RUNNING_TABLE_NAME`	The table currently being collected
`RUNNING_TABLE_PROGRESS`	The progress of the table currently being collected (in string format)

If you want to view the real-time status of the current tenant on the current OBServer node, use V$OB_OPT_STAT_GATHER_MONITOR.

View the historical task records (task-level)

-- Query the historical execution of manual collection tasks
SELECT
  tenant_id,
  task_id,
  type,
  status,
  table_count,
  failed_count,
  start_time,
  end_time
FROM oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY
WHERE type = 'MANUAL GATHER'
ORDER BY start_time DESC;

View the historical task records (table-level)

-- Query the failed tables in manual collection tasks (no results returned if no failures)
SELECT
  owner,
  table_name,
  task_id,
  status,
  start_time,
  end_time,
  stat_refresh_failed_list,
  properties
FROM oceanbase.DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY
WHERE status = 'FAILED'
ORDER BY start_time DESC;

STAT_REFRESH_FAILED_LIST indicates the list of nodes where statistics collection is completed but the statistics cache refresh fails. When this field appears, it is recommended to check the status of the related nodes based on the view information and refresh the Plan Cache when necessary.

Verify if the statistics are effective or healthy

After the collection is completed, you need to confirm that the statistics have been written and are available.

Check if table-level statistics exist

Note

In MySQL mode, you can use oceanbase.DBA_TAB_STATISTICS and oceanbase.DBA_TAB_COL_STATISTICS.
In Oracle mode, you can use SYS.DBA_TAB_STATISTICS and SYS.DBA_TAB_COL_STATISTICS.

SELECT
  table_name,
  num_rows,        -- Number of rows
  avg_row_len,     -- Average row length
  sample_size,     -- Sample size
  last_analyzed,   -- Last collection time
  stale_stats,     -- Whether the statistics are stale
  stattype_locked  -- Whether the statistics are locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';

Health indicators:

last_analyzed is a recent time (consistent with the current collection time)
Key fields such as num_rows and sample_size are non-null and approximately match the business scale
stale_stats = 'NO' (statistics are not stale)

Check if column histograms are generated (if SIZE > 1 is specified)

SELECT
  column_name,
  num_distinct,    -- Number of distinct values
  num_nulls,       -- Number of NULL values
  num_buckets,     -- Number of buckets
  histogram,       -- Histogram type
  last_analyzed
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE owner = 'test' AND table_name = 'T1';

Histogram health assessment:

If the histogram field shows the histogram type (e.g., HYBRID) and num_buckets > 1, it typically indicates that the histogram has been generated.
The statistical values such as num_distinct and num_nulls should be consistent with business expectations to avoid distortion caused by insufficient sampling or improper strategies.

Check if statistics are locked (to prevent automatic updates from overwriting them)

SELECT stattype_locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'BIG_TABLE' AND object_type = 'TABLE';

stattype_locked = 'ALL': Locked
NULL: Not locked

Common issues and diagnostics

Issue	Possible Cause	Solution
The collection task makes no progress for a long time	Large table size, low parallelism, resource contention	Increase the `degree` and execute during off-peak hours
`num_rows = 0` or `NULL`	The collection is incomplete or failed	Check `DBA_OB_TASK_OPT_STAT_GATHER_HISTORY` and `DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY`
Histogram not generated	`method_opt` does not specify `SIZE > 1`	Explicitly set `FOR COLUMNS col SIZE 254`
The query still uses the wrong plan	The statistics have not been refreshed in the Plan Cache	Execute `ALTER SYSTEM FLUSH PLAN CACHE;`

The OceanBase Database optimizer mainly collects statistics manually by using the DBMS_STATS package and the ANALYZE statement.

Note

Usage notes and limitations

We recommend that you increase the ob_query_timeout variable at the session level. This setting only affects the current session and does not interfere with other operations.

Procedure

Step	Action	Command/View
Prepare	Increase `ob_query_timeout`	`SET SESSION ob_query_timeout = 36000000000;`
Execute	Call `DBMS_STATS` or `ANALYZE`	`CALL dbms_stats.gather_table_stats(...);`
Monitor	View the progress and status of the collection	`(G)V$OB_OPT_STAT_GATHER_MONITOR`
Verify	Check whether the statistics are written or expired	`DBA_TAB_STATISTICS`、`DBA_TAB_COL_STATISTICS` (and histogram-related views)
Troubleshoot	Handle failures or exceptions	`DBA_OB_TASK_OPT_STAT_GATHER_HISTORY`、`DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY` + Plan cache (including refresh and eviction)

Collect statistics using the DBMS_STATS package

MySQL mode

Oracle mode

PROCEDURE gather_table_stats (
  ownname            VARCHAR2,
  tabname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_schema_stats (
  ownname            VARCHAR2,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  block_sample       BOOLEAN DEFAULT FALSE,
  method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  cascade            BOOLEAN DEFAULT NULL,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  stattype           VARCHAR2 DEFAULT 'DATA',
  force              BOOLEAN DEFAULT FALSE
);

PROCEDURE gather_index_stats (
  ownname            VARCHAR2,
  indname            VARCHAR2,
  partname           VARCHAR2 DEFAULT NULL,
  estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
  stattab            VARCHAR2 DEFAULT NULL,
  statid             VARCHAR2 DEFAULT NULL,
  statown            VARCHAR2 DEFAULT NULL,
  degree             NUMBER DEFAULT NULL,
  granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
  no_invalidate      BOOLEAN DEFAULT FALSE,
  force              BOOLEAN DEFAULT FALSE,
  tabname            VARCHAR2  DEFAULT NULL
);

DBMS_STATS.GATHER_TABLE_STATS (
ownname             VARCHAR2,
tabname             VARCHAR2,
partname            VARCHAR2 DEFAULT NULL,
estimate_percent    NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample        BOOLEAN  DEFAULT FALSE,
method_opt          VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree              NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
granularity         VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade             BOOLEAN DEFAULT NULL,
no_invalidate       BOOLEAN DEFAULT FALSE,
stattype            VARCHAR2 DEFAULT 'DATA',
force               BOOLEAN DEFAULT FALSE,
hist_est_percent    NUMBER DEFAULT AUTO_SAMPLE_SIZE,
hist_block_sample   BOOLEAN DEFAULT NULL
);

DBMS_STATS.GATHER_SCHEMA_STATS (
ownname            VARCHAR2,
estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample       BOOLEAN DEFAULT FALSE,
method_opt         VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree             NUMBER DEFAULT NULL,
granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade            BOOLEAN DEFAULT NULL,
no_invalidate      BOOLEAN DEFAULT FALSE,
stattype           VARCHAR2 DEFAULT 'DATA',
force              BOOLEAN DEFAULT FALSE);

DBMS_STATS.GATHER_INDEX_STATS (
ownname            VARCHAR2,
indname            VARCHAR2,
partname           VARCHAR2 DEFAULT NULL,
estimate_percent   NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab            VARCHAR2 DEFAULT NULL,
statown            VARCHAR2 DEFAULT NULL,
degree             NUMBER DEFAULT NULL,
granularity        VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate      BOOLEAN DEFAULT FALSE,
force              BOOLEAN DEFAULT FALSE,
tabname            VARCHAR2 DEFAULT NULL);

For more information about the parameters, see:

The DBMS_STATS package provides a default configuration for Prefs. For more information about how to set Prefs, see Manage collection policy configuration items (Prefs).

PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();

Examples

The following examples show how to collect statistics by using the DBMS_STATS package:

Collect global-level statistics for the tbl1 table of the user user.

CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');

Collect partition-level statistics for the t_part1 table of the user user. The parallelism is 64, and histograms are collected only for columns with uneven data distribution.
```
CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
```
Collect all statistics for the t_subpart1 table of the user user. The parallelism is 128, and 50% of the data is collected. The optimizer decides whether to collect histograms for all columns.
```
CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
```

After running gather_table_stats, you can query (G)V$OB_OPT_STAT_GATHER_MONITOR to confirm whether the task has started.

GRANULARITY and Table / Index statistics collection

Partition specified	GRANULARITY	Global index	Local index
No	ALL	GLOBAL	ALL
No	AUTO	GLOBAL	AUTO
No	DEFAULT	GLOBAL	DEFAULT
No	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
No	APPROX_GLOBAL AND PARTITION	GLOBAL	APPROX_GLOBAL AND PARTITION
No	GLOBAL	GLOBAL	GLOBAL
No	PARTITION	GLOBAL	PARTITION
No	SUBPARTITION	None	SUBPARTITION
Yes	ALL	GLOBAL	ALL
Yes	AUTO	GLOBAL	AUTO
Yes	DEFAULT	GLOBAL	DEFAULT
Yes	GLOBAL AND PARTITION	GLOBAL	GLOBAL AND PARTITION
Yes	APPROX_GLOBAL AND PARTITION	None	APPROX_GLOBAL AND PARTITION
Yes	GLOBAL	GLOBAL	GLOBAL
Yes	PARTITION	None	PARTITION
Yes	SUBPARTITION	None	SUBPARTITION

Collect statistics by using the ANALYZE statement

OceanBase Database supports the ANALYZE statement for collecting statistics in both Oracle mode and MySQL mode.

MySQL mode

Oracle mode

The ANALYZE statement in MySQL mode has the following syntax:

analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS

If the tenant-level parameter enable_sql_extension is set to TRUE, you can use the syntax provided for Oracle mode in OceanBase Database for MySQL mode.

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

Here are some examples of using the ANALYZE statement in MySQL mode to collect statistics:

Collect statistics for the tbl1 table, with 30 buckets for each column.

ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;

Use the syntax provided for Oracle mode in MySQL mode to collect statistics for the tbl1 table of the test user.

ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE;
ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;

The ANALYZE statement in Oracle mode has the following syntax:

analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause

use_partition:
  PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])

analyze_statistics_clause:
  COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]

analyze_for_clause:
  FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]

size_clause:
  SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY

column:
  column_name
| (column_name [,column_name])

In the preceding syntax, the histogram collection method for columns is the same as the method_opt syntax of the DBMS_STATS package. For more information, see method_opt.

Here are some examples of using the ANALYZE statement in Oracle mode to collect statistics:

Collect statistics for the tbl1 table of the user user.

ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;

Collect GLOBAL statistics for the t_part1 table of the user user and collect histograms for columns with uneven data distribution.
```
ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;
```
Collect statistics for the p0sp0 and p1sp2 partitions of the t_subpart1 table of the user user. The histograms for all columns are determined internally by the optimizer.
```
ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
```

Practice scenarios for manual statistics collection

Statistics collection for non-partitioned tables

CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);

Statistics collection for partitioned tables

In the following example, the test.t_part table has 10 columns and contains 1 million rows. The parallelism level is set to 2 because partition statistics are collected.

CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t_part SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');

For more information about the collection and update strategies for partitioned tables, see Collection and update strategies for partitioned tables.

Statistics collection at the schema (database) level

In addition to manually collecting statistics for individual tables, OceanBase Database provides the DBMS_STATS package to collect statistics for all tables in a user schema.

Note

This feature is supported only in OceanBase Database V4.x in dual mode and OceanBase Database V3.2.x in Oracle mode.

Because collecting statistics for all tables in a user schema is time-consuming, we recommend that you use this feature during off-peak hours.

If all tables in the user schema are small tables (with a number of rows not exceeding 1 million), you can directly use the following statements to collect statistics for the test user schema.

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');

If some large tables (with a number of rows in the millions) exist in the user schema, you can increase the parallelism level during off-peak hours to accelerate statistics collection:

# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');

# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');

CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');

CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');

CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');

CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');

After statistics collection, perform a health check:

-- Example: Verify whether statistics for the test.t1 table are collected
SELECT table_name, num_rows, last_analyzed
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';

Monitor the progress and health of statistics collection

Why is monitoring necessary? Statistics collection can take several minutes to several hours. You need to:

Know in real time whether the task is still running
Determine if it is stuck or failed
Verify the validity of the collected results
Evaluate if the statistics are "stale" or "distorted"

Query the progress of statistics collection

For more information about the fields, examples, and other diagnostic methods of this view, see Monitor and diagnose statistics collection.

View the running statistics collection tasks

Note

In MySQL mode, you typically use oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR (or oceanbase.V$OB_OPT_STAT_GATHER_MONITOR).
In Oracle mode, you typically use SYS.GV$OB_OPT_STAT_GATHER_MONITOR (or SYS.V$OB_OPT_STAT_GATHER_MONITOR).

SELECT
  tenant_id,
  svr_ip,
  svr_port,
  session_id,
  trace_id,
  task_id,
  type,
  task_start_time,
  task_duration_time,
  task_table_count,
  completed_table_count,
  running_table_owner,
  running_table_name,
  running_table_duration_time,
  running_table_progress
FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR
WHERE type = 'MANUAL GATHER';

Field	Description
`TASK_TABLE_COUNT`	The number of tables to be collected by the task
`COMPLETED_TABLE_COUNT`	The number of tables completed by the task
`RUNNING_TABLE_OWNER` / `RUNNING_TABLE_NAME`	The table currently being collected
`RUNNING_TABLE_PROGRESS`	The progress of the table currently being collected (in string format)

If you want to view the real-time status of the current tenant on the current OBServer node, use V$OB_OPT_STAT_GATHER_MONITOR.

View the historical task records (task-level)

-- Query the historical execution of manual collection tasks
SELECT
  tenant_id,
  task_id,
  type,
  status,
  table_count,
  failed_count,
  start_time,
  end_time
FROM oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY
WHERE type = 'MANUAL GATHER'
ORDER BY start_time DESC;

View the historical task records (table-level)

-- Query the failed tables in manual collection tasks (no results returned if no failures)
SELECT
  owner,
  table_name,
  task_id,
  status,
  start_time,
  end_time,
  stat_refresh_failed_list,
  properties
FROM oceanbase.DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY
WHERE status = 'FAILED'
ORDER BY start_time DESC;

STAT_REFRESH_FAILED_LIST indicates the list of nodes where statistics collection is completed but the statistics cache refresh fails. When this field appears, it is recommended to check the status of the related nodes based on the view information and refresh the Plan Cache when necessary.

Verify if the statistics are effective or healthy

After the collection is completed, you need to confirm that the statistics have been written and are available.

Check if table-level statistics exist

Note

In MySQL mode, you can use oceanbase.DBA_TAB_STATISTICS and oceanbase.DBA_TAB_COL_STATISTICS.
In Oracle mode, you can use SYS.DBA_TAB_STATISTICS and SYS.DBA_TAB_COL_STATISTICS.

SELECT
  table_name,
  num_rows,        -- Number of rows
  avg_row_len,     -- Average row length
  sample_size,     -- Sample size
  last_analyzed,   -- Last collection time
  stale_stats,     -- Whether the statistics are stale
  stattype_locked  -- Whether the statistics are locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';

Health indicators:

last_analyzed is a recent time (consistent with the current collection time)
Key fields such as num_rows and sample_size are non-null and approximately match the business scale
stale_stats = 'NO' (statistics are not stale)

Check if column histograms are generated (if SIZE > 1 is specified)

SELECT
  column_name,
  num_distinct,    -- Number of distinct values
  num_nulls,       -- Number of NULL values
  num_buckets,     -- Number of buckets
  histogram,       -- Histogram type
  last_analyzed
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE owner = 'test' AND table_name = 'T1';

Histogram health assessment:

If the histogram field shows the histogram type (e.g., HYBRID) and num_buckets > 1, it typically indicates that the histogram has been generated.
The statistical values such as num_distinct and num_nulls should be consistent with business expectations to avoid distortion caused by insufficient sampling or improper strategies.

Check if statistics are locked (to prevent automatic updates from overwriting them)

SELECT stattype_locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'BIG_TABLE' AND object_type = 'TABLE';

stattype_locked = 'ALL': Locked
NULL: Not locked

Common issues and diagnostics

Issue	Possible Cause	Solution
The collection task makes no progress for a long time	Large table size, low parallelism, resource contention	Increase the `degree` and execute during off-peak hours
`num_rows = 0` or `NULL`	The collection is incomplete or failed	Check `DBA_OB_TASK_OPT_STAT_GATHER_HISTORY` and `DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY`
Histogram not generated	`method_opt` does not specify `SIZE > 1`	Explicitly set `FOR COLUMNS col SIZE 254`
The query still uses the wrong plan	The statistics have not been refreshed in the Plan Cache	Execute `ALTER SYSTEM FLUSH PLAN CACHE;`

OceanBase

Customer Stories

Documentation

Manually collect statistics

Note

Usage notes and limitations

Procedure

Collect statistics using the DBMS_STATS package

Examples

GRANULARITY and Table / Index statistics collection

Collect statistics by using the ANALYZE statement

Practice scenarios for manual statistics collection

Statistics collection for non-partitioned tables

Statistics collection for partitioned tables

Statistics collection at the schema (database) level

Note

Monitor the progress and health of statistics collection

Query the progress of statistics collection

View the running statistics collection tasks

Note

View the historical task records (task-level)

View the historical task records (table-level)

Verify if the statistics are effective or healthy

Check if table-level statistics exist

Note

Check if column histograms are generated (if SIZE > 1 is specified)

Check if statistics are locked (to prevent automatic updates from overwriting them)

Common issues and diagnostics

Manually collect statistics

Note

Usage notes and limitations

Procedure

Collect statistics using the DBMS_STATS package

Examples

GRANULARITY and Table / Index statistics collection

Collect statistics by using the ANALYZE statement

Practice scenarios for manual statistics collection

Statistics collection for non-partitioned tables

Statistics collection for partitioned tables

Statistics collection at the schema (database) level

Note

Monitor the progress and health of statistics collection

Query the progress of statistics collection

View the running statistics collection tasks

Note

View the historical task records (task-level)

View the historical task records (table-level)

Verify if the statistics are effective or healthy

Check if table-level statistics exist

Note

Check if column histograms are generated (if SIZE > 1 is specified)

Check if statistics are locked (to prevent automatic updates from overwriting them)

Common issues and diagnostics