Statistics are essential for the optimizer to generate execution plans. Accurate and timely statistics can help the optimizer generate better-performing plans. In most systems, you do not need to manually collect statistics. The optimizer performs a scheduled task every day to collect statistics on tables to be updated. However, analytical processing (AP) systems usually contain ultra-large tables. In this case, statistics cannot be normally collected based on default statistics collection strategies, which affects the plan generation. Therefore, you must configure statistics collection.
Modify the statistics collection window
By default, the OceanBase Database optimizer automatically collects statistics every day based on the maintenance windows to ensure iterative updates of statistics. By default, the optimizer starts to collect statistics at 22:00 on Monday to Friday with the maximum collection duration of 4 hours and at 6:00 on Saturday and Sunday with the maximum collection duration of 20 hours. The following table describes the maintenance windows.
| Maintenance window | Start time/Frequency | Maximum collection duration |
|---|---|---|
| MONDAY_WINDOW | 22:00/Per week | 4 hours |
| TUESDAY_WINDOW | 22:00/Per week | 4 hours |
| WEDNESDAY_WINDOW | 22:00/Per week | 4 hours |
| THURSDAY_WINDOW | 22:00/Per week | 4 hours |
| FRIDAY_WINDOW | 22:00/Per week | 4 hours |
| SATURDAY_WINDOW | 6:00/Per week | 20 hours |
| SUNDAY_WINDOW | 6:00/Per week | 20 hours |
You can configure the maintenance windows based on your business needs. For example, when a maintenance window coincides with peak hours, you can modify the start time of the maintenance window or specify not to collect statistics on specific dates. When your business environment contains a large number of tables or many ultra-large tables, you can modify the maximum collection duration of the maintenance window.
Here are some examples.
-- Disable automatic statistics collection on Mondays
call dbms_scheduler.disable('MONDAY_WINDOW');
-- Enable automatic statistics collection on Mondays
call dbms_scheduler.enable('MONDAY_WINDOW');
-- Set the start time of the automatic statistics collection job on Mondays to 8 p.m.
call dbms_scheduler.set_attribute('MONDAY_WINDOW', 'NEXT_DATE', '2022-09-12 20:00:00');
-- Set the duration of the automatic statistics collection job on Mondays to 6 hours
-- 6 hours <=> 6 * 60 * 60 * 1000 * 1000 <=> 21600000000 us
call dbms_scheduler.set_attribute('WEDNESDAY_WINDOW', 'JOB_ACTION', 'DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC(21600000000)');
Statistics collection strategies for ultra-large tables
The optimizer may be unable to complete statistics collection on ultra-large tables within one maintenance window based on default statistics collection strategies. Therefore, you must configure proper statistics collection strategies for ultra-large tables. The reasons why statistics collection on ultra-large tables is time-consuming are as follows:
- The tables contain a large amount of data. Therefore, full-table scans during statistics collection take a long time.
- Histogram collection involves complex computations, which causes extra costs.
- By default, statistics and histograms are collected on subpartitions, partitions, and whole tables of large partitioned tables. The cost is equal to 3 × (full-table scan cost + histogram collection cost).
In view of the preceding reasons, you can optimize statistics collection configurations based on your query needs and characteristics of your tables. We recommend that you:
Set an appropriate default degree of parallelism (DOP) for statistics collection. Note that after you set the DOP, you must schedule related automatic statistics collection jobs to run during off-peak hours to avoid impact on your business. We recommend that you set the DOP to 8 or a smaller value. A sample statement is as follows:
-- The following statement is applicable to both Oracle and MySQL business tenants. call dbms_stats.set_table_prefs('database_name', 'table_name', 'degree', '8');Set default histogram collection methods for columns. We recommend that you specify not to collect histograms on columns where data is evenly distributed.
-- The following statements are applicable to both Oracle and MySQL business tenants. -- 1. If data is evenly distributed in all columns of the table, you can specify to skip histogram collection on all columns. A sample statement is as follows: call dbms_stats.set_table_prefs('database_name', 'table_name', 'method_opt', 'for all columns size 1'); -- 2. If data is unevenly distributed in a few columns of the table, you can specify to collect histograms on these columns and skip histogram collection on other columns. The following sample statement specifies to collect histograms on the `c1` and `c2` columns and not on the `c3`, `c4`, or c5 column. call dbms_stats.set_table_prefs('database_name', 'table_name', 'method_opt', 'for columns c1 size 254, c2 size 254, c3 size 1, c4 size 1, c5 size 1');Set default statistics collection granularities for partitioned tables. For HASH-partitioned tables and KEY-partitioned partitions, you can specify to collect only global statistics or specify to deduce global statistics based on partition-level statistics.
-- The following statements are applicable to both Oracle and MySQL business tenants. -- 1. Specify to collect only global statistics. call dbms_stats.set_table_prefs('database_name', 'table_name', 'granularity', 'GLOBAL'); -- 2. Specify to deduce global statistics based on partition-level statistics. call dbms_stats.set_table_prefs('database_name', 'table_name', 'granularity', 'APPROX_GLOBAL AND PARTITION');Use large-table sampling with caution. When you use large-table sampling to collect statistics, the number of samples for a histogram becomes very large, which affects collection efficiency. Large-table sampling is suitable for collecting only basic statistics but not histograms.
-- The following statements are applicable to both Oracle and MySQL business tenants. -- 1. Specify to skip histogram collection on all columns. call dbms_stats.set_table_prefs('database_name', 'table_name', 'method_opt', 'for all columns size 1'); -- 2. Set the sampling ratio to 10%. call dbms_stats.set_table_prefs('database_name', 'table_name', 'estimate_percent', '10');
If you want to clear or delete a default statistics collection strategy that you specified, you need to specify only the attribute to be deleted. A sample statement is as follows:
-- The following statement is applicable to both Oracle and MySQL business tenants. It deletes the `granularity` strategy.
call dbms_stats.delete_table_prefs('database_name', 'table_name', 'granularity');
After you set a statistics collection strategy, you can query whether it has taken effect. A sample statement is as follows:
-- The following statement is applicable to both Oracle and MySQL business tenants. It queries the specified DOP.
select dbms_stats.get_prefs('degree', 'database_name','table_name') from dual;
You can also lock statistics after you manually collect statistics on large tables. Note that the locked statistics will not be automatically updated. This is suitable for scenarios with slight changes of data characteristics and insensitive to data values. If you want to recollect the locked statistics, you must unlock them first.
-- The following statement is applicable to both Oracle and MySQL business tenants. It locks the statistics on a specified table.
call dbms_stats.lock_table_stats('database_name', 'table_name');
-- The following statement is applicable to both Oracle and MySQL business tenants. It unlocks the statistics on a specified table.
call dbms_stats.unlock_table_stats('database_name', 'table_name');
References
- Statistics include table-level statistics and column-level statistics. For more information, see Overview.
- The OceanBase Database optimizer supports manual and automatic statistics collection. For more information, see Overview.
- For information about how to manage statistics, see Statistics management.
- For examples of statistics collection, see Examples.