OceanBase Database allows you to manually collect statistics by using the DBMS_STATS package and the ANALYZE statement.
Note
OceanBase Database does not support estimating the time required to collect statistics. We recommend that you set the ob_query_timeout parameter to a sufficiently large value to ensure that statistics collection is completed. If statistics collection is interrupted due to a timeout, the statistics collected so far will be retained. This is particularly important when collecting statistics at the schema level.
Usage notes and limitations
We recommend that you increase the ob_query_timeout variable at the session level. This only affects the current session and does not interfere with other operations.
Procedure
| Step | Action | Command/View |
|---|---|---|
| Prepare | Increase ob_query_timeout |
SET SESSION ob_query_timeout = 36000000000; |
| Execute | Call DBMS_STATS or ANALYZE |
CALL dbms_stats.gather_table_stats(...); |
| Monitor | View the collection progress and status | (G)V$OB_OPT_STAT_GATHER_MONITOR |
| Verify | Check whether the statistics are written or expired | DBA_TAB_STATISTICS, DBA_TAB_COL_STATISTICS (and histogram-related views) |
| Troubleshoot | Handle failures or exceptions | DBA_OB_TASK_OPT_STAT_GATHER_HISTORY, DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY + Plan cache (including refresh and eviction instructions) |
Collect statistics using the DBMS_STATS package
OceanBase Database allows you to collect table-level, schema-level, and index-level statistics by using the DBMS_STATS (MySQL mode) or DBMS_STATS (Oracle mode) package. You can call the gather_table_stats, gather_schema_stats, and gather_index_stats stored procedures respectively.
PROCEDURE gather_table_stats (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE
);
PROCEDURE gather_index_stats (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL
);
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE,
hist_est_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
hist_block_sample BOOLEAN DEFAULT NULL
);
DBMS_STATS.GATHER_SCHEMA_STATS (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT DEFAULT_METHOD_OPT,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
cascade BOOLEAN DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT FALSE,
stattype VARCHAR2 DEFAULT 'DATA',
force BOOLEAN DEFAULT FALSE);
DBMS_STATS.GATHER_INDEX_STATS (
ownname VARCHAR2,
indname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT AUTO_SAMPLE_SIZE,
stattab VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT DEFAULT_GRANULARITY,
no_invalidate BOOLEAN DEFAULT FALSE,
force BOOLEAN DEFAULT FALSE,
tabname VARCHAR2 DEFAULT NULL);
For more information about the parameters, see:
- GATHER_TABLE_STATS (MySQL mode) or GATHER_TABLE_STATS (Oracle mode)
- GATHER_SCHEMA_STATS (MySQL mode) or GATHER_SCHEMA_STATS (Oracle mode)
- GATHER_INDEX_STATS (MySQL mode) or GATHER_INDEX_STATS (Oracle mode)
The DBMS_STATS package uses the default Prefs settings. For more information about how to set Prefs, see Manage collection policy configuration items (Prefs).
PROCEDURE GATHER_DATABASE_STATS_JOB_PROC();
Examples
Here are some examples of collecting statistics by using the DBMS_STATS package:
Collect global-level statistics for the
tbl1table of theuseruser.CALL dbms_stats.gather_table_stats('user', 'tbl1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');Collect partition-level statistics for the
t_part1table of theuseruser. Set the degree of parallelism to 64 and collect histograms for only the columns with uneven data distribution.CALL dbms_stats.gather_table_stats('user', 't_part1', degree=>'64', granularity=>'PARTITION', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');Collect all statistics for the
t_subpart1table of theuseruser. Set the degree of parallelism to 128 and collect only 50% of the data. The optimizer decides whether to collect histograms for all columns.CALL dbms_stats.gather_table_stats('user', 't_subpart1', degree=>'128', estimate_percent=> '50', granularity=>'ALL', method_opt=>'FOR ALL COLUMNS SIZE AUTO');
After you execute the gather_table_stats stored procedure, you can query the (G)V$OB_OPT_STAT_GATHER_MONITOR view to confirm whether the task is started.
GRANULARITY and Table / Index statistics collection
By default, when you collect table-level statistics, all index-level statistics on the table are also collected. This is because some index-level statistics can be derived from the table-level statistics, such as the number of rows and average row length. This approach accelerates the collection of statistics and avoids scanning the entire table data twice. The GRANULARITY parameter has different settings for local and global indexes, as shown in the following table.
| Partition specified | GRANULARITY | Global index | Local index |
|---|---|---|---|
| No | ALL | GLOBAL | ALL |
| No | AUTO | GLOBAL | AUTO |
| No | DEFAULT | GLOBAL | DEFAULT |
| No | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| No | APPROX_GLOBAL AND PARTITION | GLOBAL | APPROX_GLOBAL AND PARTITION |
| No | GLOBAL | GLOBAL | GLOBAL |
| No | PARTITION | GLOBAL | PARTITION |
| No | SUBPARTITION | None | SUBPARTITION |
| Yes | ALL | GLOBAL | ALL |
| Yes | AUTO | GLOBAL | AUTO |
| Yes | DEFAULT | GLOBAL | DEFAULT |
| Yes | GLOBAL AND PARTITION | GLOBAL | GLOBAL AND PARTITION |
| Yes | APPROX_GLOBAL AND PARTITION | None | APPROX_GLOBAL AND PARTITION |
| Yes | GLOBAL | GLOBAL | GLOBAL |
| Yes | PARTITION | None | PARTITION |
| Yes | SUBPARTITION | None | SUBPARTITION |
Collect statistics by using the ANALYZE statement
In OceanBase Database in Oracle mode and MySQL mode, you can use the ANALYZE statement to collect statistics.
In MySQL mode, the syntax of the ANALYZE statement is as follows:
analyze_stmt:
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name_list WITH INTNUM BUCKETS
If the tenant-level parameter enable_sql_extension is set to TRUE, you can use the Oracle mode syntax, which is an extended syntax provided by OceanBase Database for MySQL mode.
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
Here is an example of using the ANALYZE statement in MySQL mode to collect statistics:
Collect statistics for the
tbl1table with 30 buckets per column.ANALYZE TABLE tbl1 UPDATE HISTOGRAM ON a, b, c, d WITH 30 BUCKETS;Collect statistics for the
tbl1table of thetestuser in MySQL mode by using the Oracle mode syntax.ALTER SYSTEM SET ENABLE_SQL_EXTENSION = TRUE; ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
In Oracle mode, the syntax of the ANALYZE statement is as follows:
analyze_stmt:
ANALYZE TABLE table_name [use_partition] analyze_statistics_clause
use_partition:
PARTITION (parition_name [,partition_name...])
| SUBPARTITION(subpartition_name [,subpartition_name...])
analyze_statistics_clause:
COMPUTE STATISTICS [analyze_for_clause]
| ESTIMATE STATISTICS [analyze_for_clause] [SAMPLE INTNUM {ROWS | PERCENTAGE}]
analyze_for_clause:
FOR TABLE
| FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
| FOR COLUMNS [size_clause] column [size_clause] [,column [size_clause]...]
size_clause:
SIZE integer
| SIZE REPEAT
| SIZE AUTO
| SIZE SKEWONLY
column:
column_name
| (column_name [,column_name])
In the preceding syntax, the histogram collection method for columns is the same as the method_opt syntax of the DBMS_STATS package. For more information, see method_opt.
Compared with the statistics collection methods of the DBMS_STATS package, the ANALYZE statement does not provide as many strategy settings. To support the collection of GLOBAL statistics by using the ANALYZE statement, the rule is to set the partition_name parameter in the use_partition syntax to the table name.
Here is an example of using the ANALYZE statement in Oracle mode to collect statistics:
Collect statistics for the
tbl1table of theuseruser.ANALYZE TABLE tbl1 COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;Collect
GLOBALstatistics for thet_part1table of theuseruser and collect histograms for columns with uneven data distribution.ANALYZE TABLE t_part1 PARTITION (t_part1) COMPUTE STATISTICS FOR ALL COLUMNS SIZE SKEWONLY;Collect statistics for the
p0sp0andp1sp2partitions of thet_subpart1table of theuseruser. The histograms for all columns are determined by the optimizer.ANALYZE TABLE t_subpart1 SUBPARTITION(p0sp0,p1sp2) COMPUTE STATISTICS FOR ALL COLUMNS SIZE AUTO;
Practice scenarios for manual statistics collection
Statistics collection for non-partitioned tables
When the product of the number of rows and columns in a table does not exceed 10 million, you can use the following commands to collect statistics for non-partitioned tables. In the following example, the test.t1 table has 10 columns and contains 1 million rows.
CREATE TABLE test.t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT);
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t1 SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1');
When the product of the number of rows and columns in a table exceeds 10 million, you can set a parallel degree based on the data business scenario and system resources to accelerate statistics collection. In the following example, the test.t1 table contains 10 million rows and uses a parallel degree of 8.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8, method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't1', degree=>8);
Statistics collection for partitioned tables
For partitioned tables, you need to consider statistics collection for partitions. In systems with sufficient resources, we recommend that you set the parallel degree for statistics collection for partitioned tables to twice that for non-partitioned tables.
In the following example, the test.t_part table has 10 columns and contains 1 million rows. The parallel degree is set to 2 because statistics collection for partitions is enabled.
CREATE TABLE t_part(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT, c8 INT, c9 INT, c10 INT) PARTITION BY HASH(c1) PARTITIONS 128;
CREATE TABLE t2 (c1 INT PRIMARY KEY);
INSERT INTO t2 VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT /*+APPEND*/ INTO t_part SELECT * FROM (SELECT s0.c1 c1, s1.c1 c2, s2.c1 c3, s3.c1 c4, s4.c1 c5, s5.c1 c6, s0.c1 * 10 c7, s1.c1 * 10 c8, s2.c1 * 10 c9, s3.c1 * 10 c10 FROM t2 s0, t2 s1, t2 s2, t2 s3, t2 s4, t2 s5);
In addition to increasing the parallel degree, you can also use partition derivation to collect statistics for partitions and derive global statistics from partition statistics to accelerate statistics collection. For example, you can collect statistics for partitions without increasing the parallel degree. Here is an example:
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 't_part', granularity=>'APPROX_GLOBAL AND PARTITION');
Statistics collection at the schema (database) level
In addition to manually collecting statistics for individual tables, OceanBase Database provides the DBMS_STATS package to collect statistics for all tables in a user.
Note
This feature is supported only in OceanBase Database V4.x in dual mode and OceanBase Database V3.2.x in Oracle mode.
Collecting statistics for all tables in a user is time-consuming. We recommend that you perform this operation during off-peak hours.
If all tables in the user are small tables (containing no more than 1 million rows), you can use the following commands to collect statistics for the test user.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test');
If the user contains large tables (containing millions of rows), you can increase the parallel degree during off-peak hours to accelerate statistics collection.
# Scenario where histograms are not collected
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16', method_opt=>'for all columns size 1');
# Scenario where histograms are collected and the default strategy is used
CALL DBMS_STATS.GATHER_SCHEMA_STATS('test', degree=>'16');
If the user contains a super-large table (containing more than 100 million rows), you can collect statistics for the super-large table separately with a large parallel degree, lock the statistics for the super-large table, and then use the preceding commands to collect statistics for all tables in the user. After the statistics are collected, you can unlock the statistics for the super-large table and collect statistics incrementally.
CALL DBMS_STATS.GATHER_TABLE_STATS('test', 'big_table', degree=>128, method_opt=>'for all columns size 1');
CALL DBMS_STATS.LOCK_TABLE_STATS('test','big_table');
CALL DBMS_STATS.GATHER_SCHEMA_STATS('TEST', degree=>'16', method_opt=>'for all columns size 1');
CALL DBMS_STATS.UNLOCK_TABLE_STATS('test','big_table');
After statistics are collected, you must perform a health check:
-- Example: Verify whether statistics are collected for the test.t1 table.
SELECT table_name, num_rows, last_analyzed
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Monitor the progress and health status of statistics collection
Why monitor? Statistics collection can take from several minutes to several hours. You need to:
- Know whether the task is still running in real time
- Determine whether it is stuck or failed
- Verify the validity of the collected results
- Assess whether the statistics are "stale" or "distorted"
Query the progress of statistics collection
OceanBase Database V4.2.0 and later support the statistics collection monitoring and diagnostic feature. After you manually or automatically initiate statistics collection, you can query the real-time collection status and progress from the (G)V$OB_OPT_STAT_GATHER_MONITOR view.
For more information about the fields, examples, and other diagnostic methods of this view, see Monitor statistics collection.
View the running statistics collection tasks
Note
- In MySQL mode, use
oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR(oroceanbase.V$OB_OPT_STAT_GATHER_MONITOR). - In Oracle mode, use
SYS.GV$OB_OPT_STAT_GATHER_MONITOR(orSYS.V$OB_OPT_STAT_GATHER_MONITOR).
SELECT
tenant_id,
svr_ip,
svr_port,
session_id,
trace_id,
task_id,
type,
task_start_time,
task_duration_time,
task_table_count,
completed_table_count,
running_table_owner,
running_table_name,
running_table_duration_time,
running_table_progress
FROM oceanbase.GV$OB_OPT_STAT_GATHER_MONITOR
WHERE type = 'MANUAL GATHER';
| Field | Description |
|---|---|
TASK_TABLE_COUNT |
The number of tables to be collected in the task. |
COMPLETED_TABLE_COUNT |
The number of tables completed in the task. |
RUNNING_TABLE_OWNER / RUNNING_TABLE_NAME |
The table being collected. |
RUNNING_TABLE_PROGRESS |
The progress of the table being collected (in string format). |
To view the real-time status of the current tenant on the current OBServer node, use V$OB_OPT_STAT_GATHER_MONITOR.
View historical task records (task-level)
-- Query the historical execution of manual collection tasks.
SELECT
tenant_id,
task_id,
type,
status,
table_count,
failed_count,
start_time,
end_time
FROM oceanbase.DBA_OB_TASK_OPT_STAT_GATHER_HISTORY
WHERE type = 'MANUAL GATHER'
ORDER BY start_time DESC;
View historical task records (table-level)
-- Query the failed tables in manual collection tasks. If there are no failed tables, no records will be returned.
SELECT
owner,
table_name,
task_id,
status,
start_time,
end_time,
stat_refresh_failed_list,
properties
FROM oceanbase.DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY
WHERE status = 'FAILED'
ORDER BY start_time DESC;
STAT_REFRESH_FAILED_LISTindicates the list of nodes where statistics collection is completed but the statistics cache refresh fails. When this field appears, you are advised to check the status of the relevant nodes based on the view information and refresh the Plan Cache if necessary.
Verify whether the statistics are effective and healthy
After statistics collection is completed, you need to confirm that the statistics have been written and are available.
Check whether table-level statistics exist
Note
- In MySQL mode, you can use the
oceanbase.DBA_TAB_STATISTICSandoceanbase.DBA_TAB_COL_STATISTICSviews. - In Oracle mode, you can use the
SYS.DBA_TAB_STATISTICSandSYS.DBA_TAB_COL_STATISTICSviews.
SELECT
table_name,
num_rows, -- Number of rows
avg_row_len, -- Average row length
sample_size, -- Sampling size
last_analyzed, -- Last collected time
stale_stats, -- Whether the statistics are stale
stattype_locked -- Whether the statistics are locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'T1' AND object_type = 'TABLE';
Health indicators:
last_analyzedis a recent time (consistent with the current collection time)num_rows/sample_sizeand other key fields are not empty and are approximately consistent with the business scalestale_stats = 'NO'(statistics are not stale)
Check whether column histograms are generated (if SIZE > 1 is specified)
SELECT
column_name,
num_distinct, -- Number of distinct values
num_nulls, -- Number of NULL values
num_buckets, -- Number of buckets
histogram, -- Histogram type
last_analyzed
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE owner = 'test' AND table_name = 'T1';
Histogram health indicators:
- If the
histogramfield displays the histogram type (for example,HYBRID) andnum_buckets > 1, it usually indicates that the histogram has been generated. - The statistical values such as
num_distinctandnum_nullsare recommended to be verified for consistency with business expectations (to avoid distortion caused by insufficient sampling or inappropriate strategies).
Check whether statistics are locked (to prevent automatic updates from overwriting them)
SELECT stattype_locked
FROM oceanbase.DBA_TAB_STATISTICS
WHERE owner = 'test' AND table_name = 'BIG_TABLE' AND object_type = 'TABLE';
stattype_locked = 'ALL': LockedNULL: Not locked
Troubleshoot common issues
| Issue | Possible cause | Solution |
|---|---|---|
| The statistics collection task makes no progress for a long time. | The table is extremely large, the parallelism is low, and resources are contended. | Increase the degree and execute the task during off-peak hours. |
num_rows = 0 or NULL. |
The statistics collection is not completed or has failed. | Check the DBA_OB_TASK_OPT_STAT_GATHER_HISTORY and DBA_OB_TABLE_OPT_STAT_GATHER_HISTORY views. |
| Histograms are not generated. | The method_opt parameter does not specify SIZE > 1. |
Explicitly set FOR COLUMNS col SIZE 254. |
| The query still uses the wrong plan. | The statistics cache has not been refreshed. | Execute ALTER SYSTEM FLUSH PLAN CACHE;. |
