The OceanBase Database optimizer mainly collects statistics manually by using the DBMS_STATS package and the ANALYZE statement.
Note
OceanBase Database does not support estimating the time required to collect statistics. We recommend that you set the ob_query_timeout parameter to a sufficiently large value to complete the statistics collection. In particular, if the statistics collection for schema-level statistics is interrupted due to a timeout, the statistics that have been collected will be retained.
Usage notes and limitations
We recommend that you increase the ob_query_timeout variable at the session level. This setting only affects the current session and does not interfere with other operations.
Procedure
| Step | Action | Command/View |
|---|---|---|
| Prepare | Increase ob_query_timeout |
SET SESSION ob_query_timeout = 36000000000; |
| Execute | Call DBMS_STATS or ANALYZE |
CALL dbms_stats.gather_table_stats(...); |
| Monitor | View the progress and status of the collection | (G)V$OB_OPT_STAT_GATHER_MONITOR |
| Verify | Check whether the statistics are written or expired | DBA_TAB_STATISTICS、DBA_TAB_COL_STATISTICS (and histogram-related views) |
| Troubleshoot | Handle failures or exceptions | DBA_OB_TASK_OPT_STAT_GATHER_HISTORY、DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY + Plan cache (including refresh and eviction) |
Collect statistics using the DBMS_STATS package
OceanBase Database supports collecting table-level, schema-level, and index-level statistics by using the DBMS_STATS (MySQL mode)/DBMS_STATS (Oracle mode) package. You can call the gather_table_stats, gather_schema_stats, and gather_index_stats stored procedures respectively.
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_index_stats (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL
);
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE,
hist_est_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
hist_block_sample BOOLEAN DEFAULT NULL
);
DBMS_STATS.GATHER_SCHEMA_STATS (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE);
DBMS_STATS.GATHER_INDEX_STATS (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL);
For more information about the parameters, see:
- GATHER_TABLE_STATS (MySQL mode)/GATHER_TABLE_STATS (Oracle mode)
- GATHER_SCHEMA_STATS (MySQL mode)/GATHER_SCHEMA_STATS (Oracle mode)
- GATHER_INDEX_STATS (MySQL mode)/GATHER_INDEX_STATS (Oracle mode)
The DBMS_STATS package provides a default configuration for Prefs. For more information about how to set Prefs, see Manage collection policy configuration items (Prefs).
PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();
Examples
The following examples show how to collect statistics by using the DBMS_STATS package:
Collect global-level statistics for the
tbl1table of theuseruser.CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');Collect partition-level statistics for the
t_part1table of theuseruser. The parallelism is 64, and histograms are collected only for columns with uneven data distribution.CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');Collect all statistics for the
t_subpart1table of theuseruser. The parallelism is 128, and 50% of the data is collected. The optimizer decides whether to collect histograms for all columns.CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
After running gather_table_stats, you can query (G)V$OB_OPT_STAT_GATHER_MONITOR to confirm whether the task has started.
GRANULARITY and Table / Index statistics collection
By default, when you collect table-level statistics, all index-level statistics on the table are also collected. This is because some index-level statistics can be obtained from the base table's statistics, such as the number of rows and average row length (calculated from several columns). This can accelerate the statistics collection and avoid a full table scan. The GRANULARITY parameter differs between global and local indexes, as shown in the following table.
| Partition specified | GRANULARITY | Global index | Local index |
|---|---|---|---|
| No | ALL | GLOBAL | ALL |
| No | AUTO | GLOBAL | AUTO |
| No | DEFAULT | GLOBAL | DEFAULT |
| No | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| No | APPROX_GLOBAL AND PARTITION | GLOBAL | APPROX_GLOBAL AND PARTITION |
| No | GLOBAL | GLOBAL | GLOBAL |
| No | PARTITION | GLOBAL | PARTITION |
| No | SUBPARTITION | None | SUBPARTITION |
| Yes | ALL | GLOBAL | ALL |
| Yes | AUTO | GLOBAL | AUTO |
| Yes | DEFAULT | GLOBAL | DEFAULT |
| Yes | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| Yes | APPROX_GLOBAL AND PARTITION | None | APPROX_GLOBAL AND PARTITION |
| Yes | GLOBAL | GLOBAL | GLOBAL |
| Yes | PARTITION | None | PARTITION |
| Yes | SUBPARTITION | None | SUBPARTITION |
Collect statistics by using the ANALYZE statement
OceanBase Database supports the ANALYZE statement for collecting statistics in both Oracle mode and MySQL mode.
The ANALYZE statement in MySQL mode has the following syntax:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
If the tenant-level parameter enable_sql_extension is set to TRUE, you can use the syntax provided for Oracle mode in OceanBase Database for MySQL mode.
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
Here are some examples of using the ANALYZE statement in MySQL mode to collect statistics:
Collect statistics for the
tbl1table, with 30 buckets for each column.ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;Use the syntax provided for Oracle mode in MySQL mode to collect statistics for the
tbl1table of thetestuser.ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE; ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
The ANALYZE statement in Oracle mode has the following syntax:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [,column_name])
In the preceding syntax, the histogram collection method for columns is the same as the method_opt syntax of the DBMS_STATS package. For more information, see method_opt.
Compared with the statistics collection methods of the DBMS_STATS package, the ANALYZE statement provides fewer strategy settings. To support the ANALYZE statement in collecting only GLOBAL statistics, the rule is to set the partition_name parameter to the table name in the use_partition syntax.
Here are some examples of using the ANALYZE statement in Oracle mode to collect statistics:
Collect statistics for the
tbl1table of theuseruser.ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;Collect
GLOBALstatistics for thet_part1table of theuseruser and collect histograms for columns with uneven data distribution.ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;Collect statistics for the
p0sp0andp1sp2partitions of thet_subpart1table of theuseruser. The histograms for all columns are determined internally by the optimizer.ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
Practice scenarios for manual statistics collection
Statistics collection for non-partitioned tables
If the product of the number of rows and columns in a table does not exceed 10 million, we recommend that you use the following statements to collect statistics for the table. In the following example, the test.t1 table has 10 columns and contains 1 million rows.
CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');
If the product of the number of rows and columns in a table exceeds 10 million, we recommend that you set a parallelism level based on the business requirements and system resources to accelerate statistics collection. In the following example, the test.t1 table contains 10 million rows. The parallelism level is set to 8.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);
Statistics collection for partitioned tables
For partitioned tables, you need to consider the collection of partition statistics in addition to the collection of non-partitioned table statistics. If system resources are available, we recommend that you double the parallelism level used for non-partitioned tables for partitioned tables.
In the following example, the test.t_part table has 10 columns and contains 1 million rows. The parallelism level is set to 2 because partition statistics are collected.
CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t_part SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
For partitioned tables, you can also consider the partition derivation method to collect statistics. In this method, you collect partition statistics and derive global statistics based on the partition statistics to accelerate statistics collection. For example, in the same scenario, you can adjust the partition collection method without increasing the parallelism level. Here is an example:
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');
For more information about the collection and update strategies for partitioned tables, see Collection and update strategies for partitioned tables.
Statistics collection at the schema (database) level
In addition to manually collecting statistics for individual tables, OceanBase Database provides the DBMS_STATS package to collect statistics for all tables in a user schema.
Note
This feature is supported only in OceanBase Database V4.x in dual mode and OceanBase Database V3.2.x in Oracle mode.
Because collecting statistics for all tables in a user schema is time-consuming, we recommend that you use this feature during off-peak hours.
If all tables in the user schema are small tables (with a number of rows not exceeding 1 million), you can directly use the following statements to collect statistics for the test user schema.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');
If some large tables (with a number of rows in the millions) exist in the user schema, you can increase the parallelism level during off-peak hours to accelerate statistics collection:
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');
If the user schema contains a very large table (with more than 100 million rows), you can separately collect statistics for the very large table with a higher parallelism level, lock the statistics of the very large table, and then use the preceding statements to collect statistics for the entire user schema. After statistics collection is completed, unlock the statistics of the very large table and collect statistics incrementally. Here is an example:
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');
CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');
CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');
CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');
After statistics collection, perform a health check:
-- Example: Verify whether statistics for the test.t1 table are collected
SELECT table_name, num_rows, last_analyzed
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Monitor the progress and health of statistics collection
Why is monitoring necessary? Statistics collection can take several minutes to several hours. You need to:
- Know in real time whether the task is still running
- Determine if it is stuck or failed
- Verify the validity of the collected results
- Evaluate if the statistics are "stale" or "distorted"
Query the progress of statistics collection
OceanBase Database supports statistics collection monitoring and diagnostics starting from V4.2.0. After manually or automatically initiating statistics collection, you can query the real-time collection status and progress from the (G)V$OB_OPT_STAT_GATHER_MONITOR view.
For more information about the fields, examples, and other diagnostic methods of this view, see Monitor and diagnose statistics collection.
View the running statistics collection tasks
Note
- In MySQL mode, you typically use
oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR(oroceanbase.V$OB_OPT_STAT_GATHER_MONITOR). - In Oracle mode, you typically use
SYS.GV$OB_OPT_STAT_GATHER_MONITOR(orSYS.V$OB_OPT_STAT_GATHER_MONITOR).
SELECT
tenant_id,
svr_ip,
svr_port,
session_id,
trace_id,
task_id,
type,
task_start_time,
task_duration_time,
task_table_count,
completed_table_count,
running_table_owner,
running_table_name,
running_table_duration_time,
running_table_progress
FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR
WHERE type = 'MANUAL GATHER';
| Field | Description |
|---|---|
TASK_TABLE_COUNT |
The number of tables to be collected by the task |
COMPLETED_TABLE_COUNT |
The number of tables completed by the task |
RUNNING_TABLE_OWNER / RUNNING_TABLE_NAME |
The table currently being collected |
RUNNING_TABLE_PROGRESS |
The progress of the table currently being collected (in string format) |
If you want to view the real-time status of the current tenant on the current OBServer node, use V$OB_OPT_STAT_GATHER_MONITOR.
View the historical task records (task-level)
-- Query the historical execution of manual collection tasks
SELECT
tenant_id,
task_id,
type,
status,
table_count,
failed_count,
start_time,
end_time
FROM oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY
WHERE type = 'MANUAL GATHER'
ORDER BY start_time DESC;
View the historical task records (table-level)
-- Query the failed tables in manual collection tasks (no results returned if no failures)
SELECT
owner,
table_name,
task_id,
status,
start_time,
end_time,
stat_refresh_failed_list,
properties
FROM oceanbase.DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY
WHERE status = 'FAILED'
ORDER BY start_time DESC;
STAT_REFRESH_FAILED_LISTindicates the list of nodes where statistics collection is completed but the statistics cache refresh fails. When this field appears, it is recommended to check the status of the related nodes based on the view information and refresh the Plan Cache when necessary.
Verify if the statistics are effective or healthy
After the collection is completed, you need to confirm that the statistics have been written and are available.
Check if table-level statistics exist
Note
- In MySQL mode, you can use
oceanbase.DBA_TAB_STATISTICSandoceanbase.DBA_TAB_COL_STATISTICS. - In Oracle mode, you can use
SYS.DBA_TAB_STATISTICSandSYS.DBA_TAB_COL_STATISTICS.
SELECT
table_name,
num_rows, -- Number of rows
avg_row_len, -- Average row length
sample_size, -- Sample size
last_analyzed, -- Last collection time
stale_stats, -- Whether the statistics are stale
stattype_locked -- Whether the statistics are locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Health indicators:
last_analyzedis a recent time (consistent with the current collection time)- Key fields such as
num_rowsandsample_sizeare non-null and approximately match the business scale stale_stats = 'NO'(statistics are not stale)
Check if column histograms are generated (if SIZE > 1 is specified)
SELECT
column_name,
num_distinct, -- Number of distinct values
num_nulls, -- Number of NULL values
num_buckets, -- Number of buckets
histogram, -- Histogram type
last_analyzed
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE owner = 'test' AND table_name = 'T1';
Histogram health assessment:
- If the
histogramfield shows the histogram type (e.g.,HYBRID) andnum_buckets > 1, it typically indicates that the histogram has been generated. - The statistical values such as
num_distinctandnum_nullsshould be consistent with business expectations to avoid distortion caused by insufficient sampling or improper strategies.
Check if statistics are locked (to prevent automatic updates from overwriting them)
SELECT stattype_locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'BIG_TABLE' AND object_type = 'TABLE';
stattype_locked = 'ALL': LockedNULL: Not locked
Common issues and diagnostics
| Issue | Possible Cause | Solution |
|---|---|---|
| The collection task makes no progress for a long time | Large table size, low parallelism, resource contention | Increase the degree and execute during off-peak hours |
num_rows = 0 or NULL |
The collection is incomplete or failed | Check DBA_OB_TASK_OPT_STAT_GATHER_HISTORY and DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY |
| Histogram not generated | method_opt does not specify SIZE > 1 |
Explicitly set FOR COLUMNS col SIZE 254 |
| The query still uses the wrong plan | The statistics have not been refreshed in the Plan Cache | Execute ALTER SYSTEM FLUSH PLAN CACHE; |
