The optimizer of OceanBase Database can manually collect statistics by using the DBMS_STATS package and the ANALYZE statement.
Note
OceanBase Database does not support estimating the time required for statistics collection. We recommend that you set the ob_query_timeout parameter to a sufficiently large value to ensure that statistics collection is completed. If statistics collection is interrupted due to a timeout, the statistics collected before the interruption will be retained. This is especially important when collecting statistics at the schema level.
Usage notes and limitations
We recommend that you increase the ob_query_timeout variable at the session level. This operation only affects the current session and does not interfere with other operations.
Procedure
| Step | Action | Command/View |
|---|---|---|
| Prepare | Increase ob_query_timeout |
SET SESSION ob_query_timeout = 36000000000; |
| Execute | Call DBMS_STATS or ANALYZE |
CALL dbms_stats.gather_table_stats(...); |
| Monitor | View the collection progress and status | (G)V$OB_OPT_STAT_GATHER_MONITOR |
| Verify | Check whether the statistics are written or expired | DBA_TAB_STATISTICS、DBA_TAB_COL_STATISTICS (and histogram-related views) |
| Troubleshoot | Handle failures or exceptions | DBA_OB_TASK_OPT_STAT_GATHER_HISTORY、DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY + Plan Cache (including refresh and eviction instructions) |
Collect statistics using the DBMS_STATS package
OceanBase Database supports collecting table-level, schema-level, and index-level statistics by using the DBMS_STATS (MySQL mode) or DBMS_STATS (Oracle mode) package. You can call the gather_table_stats, gather_schema_stats, and gather_index_stats stored procedures respectively to collect statistics.
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_index_stats (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL
);
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE,
hist_est_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
hist_block_sample BOOLEAN DEFAULT NULL
);
DBMS_STATS.GATHER_SCHEMA_STATS (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE);
DBMS_STATS.GATHER_INDEX_STATS (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL);
For more information about the parameters, see:
- GATHER_TABLE_STATS (MySQL mode) or GATHER_TABLE_STATS (Oracle mode)
- GATHER_SCHEMA_STATS (MySQL mode) or GATHER_SCHEMA_STATS (Oracle mode)
- GATHER_INDEX_STATS (MySQL mode) or GATHER_INDEX_STATS (Oracle mode)
The DBMS_STATS package provides a default configuration for Prefs. For more information about how to set Prefs, see Manage collection policy configuration items (Prefs).
PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();
Examples
Here are some examples of collecting statistics by using the DBMS_STATS package:
Collect global-level statistics for the
tbl1table of theuseruser.CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');Collect partition-level statistics for the
t_part1table of theuseruser. The degree of parallelism is 64. Histograms are collected only for columns with uneven data distribution.CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');Collect all statistics for the
t_subpart1table of theuseruser. The degree of parallelism is 128. Only 50% of the data is collected. The optimizer decides whether to collect histograms for each column.CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
After you execute the gather_table_stats procedure, you can query the (G)V$OB_OPT_STAT_GATHER_MONITOR view to confirm whether the task is started.
GRANULARITY and Table / Index statistics collection
By default, when you collect table-level statistics, statistics for all indexes on the table are also collected. Some statistics for indexes can be obtained from the base table, such as the number of rows and average row length (calculated from several columns). Therefore, you can accelerate the statistics collection and avoid rescanning the entire table. The GRANULARITY parameter is different for local and global indexes, as shown in the following table.
| Partition specified? | GRANULARITY | Global index | Local index |
|---|---|---|---|
| No | ALL | GLOBAL | ALL |
| No | AUTO | GLOBAL | AUTO |
| No | DEFAULT | GLOBAL | DEFAULT |
| No | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| No | APPROX_GLOBAL AND PARTITION | GLOBAL | APPROX_GLOBAL AND PARTITION |
| No | GLOBAL | GLOBAL | GLOBAL |
| No | PARTITION | GLOBAL | PARTITION |
| No | SUBPARTITION | None | SUBPARTITION |
| Yes | ALL | GLOBAL | ALL |
| Yes | AUTO | GLOBAL | AUTO |
| Yes | DEFAULT | GLOBAL | DEFAULT |
| Yes | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| Yes | APPROX_GLOBAL AND PARTITION | None | APPROX_GLOBAL AND PARTITION |
| Yes | GLOBAL | GLOBAL | GLOBAL |
| Yes | PARTITION | None | PARTITION |
| Yes | SUBPARTITION | None | SUBPARTITION |
Collect statistics using the ANALYZE statement
OceanBase Database supports using the ANALYZE statement to collect statistics in Oracle mode and MySQL mode.
In MySQL mode, the ANALYZE statement has the following syntax:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
If the tenant-level parameter enable_sql_extension is set to TRUE, you can use the syntax provided by OceanBase Database for MySQL mode.
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
Here are some examples of using the ANALYZE statement in MySQL mode to collect statistics:
Collect statistics for the
tbl1table, with 30 buckets for each column.ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;Collect statistics for the
tbl1table of thetestuser in MySQL mode by using the syntax provided by OceanBase Database for Oracle mode.ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE; ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
In Oracle mode, the ANALYZE statement has the following syntax:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [,column_name])
In the preceding syntax, the histogram collection method for columns is the same as the method_opt syntax of the DBMS_STATS package. For more information, see method_opt.
Compared with the statistics collection methods of the DBMS_STATS package, the ANALYZE statement does not provide as many strategy settings. To support the ANALYZE statement in collecting only GLOBAL statistics, the rule is to set the partition_name parameter in the use_partition syntax to the table name.
Here are some examples of using the ANALYZE statement in Oracle mode to collect statistics:
Collect statistics for the
tbl1table of theuseruser.ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;Collect
GLOBALstatistics for thet_part1table of theuseruser, and collect histograms for only columns with uneven data distribution.ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;Collect statistics for the
p0sp0andp1sp2partitions of thet_subpart1table of theuseruser. The histograms for all columns are determined by the optimizer.ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
Practice scenarios of manual statistics collection
Statistics collection for non-partitioned tables
If the product of the number of rows and the number of columns in a table is no more than 10 million, we recommend that you use the following statement to collect statistics for the table. In the following example, the test.t1 table has 10 columns and 1 million rows.
CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');
If the product of the number of rows and the number of columns in a table exceeds 10 million, we recommend that you set a parallelism level based on the business scenario and system resources to speed up statistics collection. In the following example, the test.t1 table has 10 million rows. The parallelism level is set to 8.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);
Statistics collection for partitioned tables
For partitioned tables, statistics collection strategies need to consider partition statistics collection. If system resources allow, we recommend that you increase the parallelism level for non-partitioned tables by 100% for partitioned tables.
In the following example, the test.t_part table has 10 columns and 1 million rows. The parallelism level is set to 2 because partition statistics collection is enabled.
CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t_part SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
In addition to increasing the parallelism level, you can also use partition derivation to collect statistics for partitioned tables. This involves collecting statistics for partitions and deriving global statistics from partition statistics to improve the efficiency of statistics collection. For example, you can adjust the partition collection method without increasing the parallelism level, as shown in the following example.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');
For more information about statistics collection and update strategies for partitioned tables, see Collection and update strategies of statistics for partitioned tables.
Statistics collection at schema (database) level
In addition to manually collecting statistics for individual tables, OceanBase Database provides the DBMS_STATS package to collect statistics for all tables in a user.
Note
This feature is supported only in OceanBase Database V4.x in dual mode and OceanBase Database V3.2.x in Oracle mode.
Collecting statistics for all tables in a user can be time-consuming. We recommend that you use this feature during off-peak hours.
If all tables in the user are small tables (with no more than 1 million rows), you can directly use the following statements to collect statistics for the test user.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');
If the user contains large tables (with millions of rows), you can increase the parallelism level during off-peak hours to speed up statistics collection.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');
If the user contains extremely large tables (with more than 100 million rows), you can collect statistics for the extremely large tables separately with a higher parallelism level, lock the statistics of the extremely large tables, and then use the preceding statements to collect statistics for all tables in the user. After the statistics collection is complete, unlock the statistics of the extremely large tables and collect statistics incrementally, as shown in the following example:
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');
CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');
CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');
CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');
After each statistics collection, perform a health check:
-- Example: Verify whether statistics for test.t1 are collected.
SELECT table_name, num_rows, last_analyzed
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Monitor the progress and health status of statistics collection
Why do you need to monitor the progress of statistics collection? Statistics collection can take several minutes to several hours. You need to:
- Determine whether the task is still running.
- Determine whether the task is stuck or failed.
- Verify whether the collected statistics are valid.
- Determine whether the statistics are outdated or distorted.
Query the progress of statistics collection
OceanBase Database V4.2.0 and later support the statistics collection monitoring and diagnostic feature. After you manually or automatically initiate statistics collection, you can query the real-time collection status and progress from the (G)V$OB_OPT_STAT_GATHER_MONITOR view.
For more information about the fields, examples, and other diagnostic methods of this view, see Monitor and diagnose statistics collection.
View running statistics collection task
Note
- In MySQL mode, you typically use
oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR(oroceanbase.V$OB_OPT_STAT_GATHER_MONITOR). - In Oracle mode, you typically use
SYS.GV$OB_OPT_STAT_GATHER_MONITOR(orSYS.V$OB_OPT_STAT_GATHER_MONITOR).
SELECT
tenant_id,
svr_ip,
svr_port,
session_id,
trace_id,
task_id,
type,
task_start_time,
task_duration_time,
task_table_count,
completed_table_count,
running_table_owner,
running_table_name,
running_table_duration_time,
running_table_progress
FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR
WHERE type = 'MANUAL GATHER';
| Field | Description |
|---|---|
TASK_TABLE_COUNT |
The number of tables to be collected by the task. |
COMPLETED_TABLE_COUNT |
The number of tables completed by the task. |
RUNNING_TABLE_OWNER / RUNNING_TABLE_NAME |
The table being collected. |
RUNNING_TABLE_PROGRESS |
The progress of the table being collected (in string format). |
If you want to view the real-time status of the current tenant on the current OBServer node, use V$OB_OPT_STAT_GATHER_MONITOR.
View the historical task records (task level)
-- Query the historical execution of manual collection tasks.
SELECT
tenant_id,
task_id,
type,
status,
table_count,
failed_count,
start_time,
end_time
FROM oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY
WHERE type = 'MANUAL GATHER'
ORDER BY start_time DESC;
View the historical task records (table level)
-- Query the failed tables in the manual collection process (no results returned if no failures)
SELECT
owner,
table_name,
task_id,
status,
start_time,
end_time,
stat_refresh_failed_list,
properties
FROM oceanbase.DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY
WHERE status = 'FAILED'
ORDER BY start_time DESC;
STAT_REFRESH_FAILED_LISTindicates the list of nodes where statistics collection is completed but statistics cache refresh fails. When this field appears, you are advised to check the status of the related nodes based on the view information and refresh the Plan Cache if necessary.
Verify whether the statistics are effective or healthy
After statistics collection is completed, you need to confirm that the statistics are written and available.
Check whether table-level statistics exist
Note
- In MySQL mode, you can use
oceanbase.DBA_TAB_STATISTICSandoceanbase.DBA_TAB_COL_STATISTICS. - In Oracle mode, you can use
SYS.DBA_TAB_STATISTICSandSYS.DBA_TAB_COL_STATISTICS.
SELECT
table_name,
num_rows, -- Number of rows
avg_row_len, -- Average row length
sample_size, -- Sample size
last_analyzed, -- Last collected time
stale_stats, -- Whether the statistics are outdated
stattype_locked -- Whether the statistics are locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Health indicators:
last_analyzedis a recent time (consistent with the current collection time).- Key fields such as
num_rowsandsample_sizeare not empty and match the business scale. stale_stats = 'NO'(the statistics are not outdated).
Check whether column histograms are generated (if SIZE > 1 is specified)
SELECT
column_name,
num_distinct, -- Number of distinct values
num_nulls, -- Number of NULL values
num_buckets, -- Number of buckets
histogram, -- Histogram type
last_analyzed
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE owner = 'test' AND table_name = 'T1';
Histogram health indicators:
- If the
histogramfield displays the histogram type (for example,HYBRID) andnum_buckets > 1, it usually indicates that the histogram is generated. - The statistical values such as
num_distinctandnum_nullsare recommended to be verified for consistency with business expectations to avoid distortion caused by small sampling or improper strategies.
Check whether statistics are locked (to prevent automatic updates from overwriting them)
SELECT stattype_locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'BIG_TABLE' AND object_type = 'TABLE';
stattype_locked = 'ALL': lockedNULL: not locked
Troubleshoot common issues
| Issue | Possible cause | Solution |
|---|---|---|
| The statistics collection task makes no progress for a long time. | The table is extremely large, the parallelism is low, or resources are being contended. | Increase the degree and execute the task during off-peak hours. |
num_rows = 0 or NULL. |
The statistics collection task is not completed or failed. | Check DBA_OB_TASK_OPT_STAT_GATHER_HISTORY and DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY. |
| No histogram is generated. | method_opt does not specify SIZE > 1. |
Explicitly set FOR COLUMNS col SIZE 254. |
| The query still uses an incorrect plan. | The statistics cache is not refreshed. | Execute ALTER SYSTEM FLUSH PLAN CACHE;. |
