Best practices for collecting statistics to generate an efficient execution plan

Last Updated: 2025-01-16 06:17:34
What is on this page
Why collect statistics?
How it works
Scenarios
Considerations
Default statistics collection strategy
Configure a statistics collection strategy
Business peaks overlapping with the statistics maintenance window
Incomplete statistics collected for ultra-large tables
Query after a batch execution task
Query on tables in which partitions are pre-created by date
References


Why collect statistics?

When generating and selecting an execution plan, the optimizer evaluates and compares the execution costs of all available plans. Accurate cost evaluation is essential for this process. The optimizer relies on a cost model and operator-estimated row counts to assess execution plan costs, with statistics playing a pivotal role. Accurate statistics directly impact cardinality estimation, which in turn influences cost evaluation and execution plan selection. Therefore, maintaining accurate statistics is critical for generating efficient execution plans.
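To see why estimated row counts matter, consider a deliberately simplified cost comparison (the cost formula and all numbers below are illustrative only, not OceanBase's actual cost model):

```python
# Toy cost model: the cost of a nested-loop join is roughly
# rows(outer) * cost_per_probe(inner). The optimizer picks the
# cheaper join order based on *estimated* row counts.
def nlj_cost(outer_rows: int, probe_cost: float = 1.0) -> float:
    return outer_rows * probe_cost

# True row counts: t1 is small, t2 is large.
true_t1, true_t2 = 1_000, 1_000_000

# With accurate statistics, the optimizer drives the join from t1.
print(nlj_cost(true_t1) < nlj_cost(true_t2))  # True

# With stale statistics that still report t1 as huge (e.g. before a
# bulk delete was reflected), the comparison flips and the optimizer
# would pick the worse join order.
stale_t1 = 10_000_000
print(nlj_cost(stale_t1) > nlj_cost(true_t2))  # True: wrong order chosen
```

The same cost model produces opposite plan choices depending only on whether the row-count estimates are accurate, which is exactly why stale statistics lead to inefficient plans.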

How it works

Statistics are stored as regular data in internal tables, and the optimizer maintains a local cache of these statistics to speed up access. Using the statistics, the optimizer evaluates the costs of various query paths and selects the most efficient plan. Regular updates to statistics ensure they reflect changes in the data, helping to maintain optimal query performance.

Scenarios

  • Complex query optimization: When a complex multi-table join or subquery is executed, accurate statistics enable the optimizer to choose a better join order and join method, significantly improving query performance.
  • Index selection: The optimizer determines whether to use a specific index to accelerate access based on statistics.
  • Data distribution change: When the data in a table is significantly changed due to operations such as batch insert, delete, or update, it is particularly important to update statistics.

Considerations

  • Collection frequency: You need to set an appropriate statistics collection frequency based on the data change pattern.
  • Resource usage: Statistics collection consumes system resources. Therefore, we recommend that you collect statistics when the system load is low.
  • Accuracy: It is essential to ensure the accuracy of statistics. Otherwise, the optimizer may fail to select the optimal execution plan.

Default statistics collection strategy

By default, the system starts to collect statistics from 10:00 p.m. on each workday and from 6:00 a.m. on weekends. The collection lasts at most 4 hours on workdays and 20 hours on weekends. Such a period of time is called a statistics maintenance window. In each maintenance window, the optimizer re-collects all outdated statistics of tables or partitions.

Statistics are re-collected for a table or partition when either of the following conditions is met:

  • Statistics have not been successfully collected for the table or partition since the last collection time.
  • The number of added, deleted, or modified rows exceeds 10% of the table's or partition's row count at the time of the last statistics collection.
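The re-collection trigger can be sketched as a small check (an illustration of the documented 10% rule; the function and parameter names are hypothetical, not OceanBase internals):

```python
def stats_are_stale(rows_at_last_collection: int,
                    inserted: int, deleted: int, updated: int,
                    threshold: float = 0.10) -> bool:
    """Return True if statistics should be re-collected for a table
    or partition under the default strategy: the number of added,
    deleted, or modified rows exceeds 10% of the row count at the
    time of the last successful collection."""
    if rows_at_last_collection == 0:
        return True  # no successful collection yet: always collect
    changed = inserted + deleted + updated
    return changed > threshold * rows_at_last_collection

# 50,000 changed rows against a 1,000,000-row baseline is 5%: fresh.
print(stats_are_stale(1_000_000, 20_000, 10_000, 20_000))  # False
# 150,000 changed rows is 15%: stale, re-collect in the next window.
print(stats_are_stale(1_000_000, 100_000, 30_000, 20_000))  # True
```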

The following list describes the default statistics collection strategy.

  • degree: the degree of parallelism (DOP). Default: 1 (single-threaded scan).
  • method_opt: the strategy for collecting column-level statistics. Default: collect basic statistics for all columns and histograms for columns with data skew.
  • granularity: the collection granularity. Default: for a non-partitioned table, collect global statistics directly; for a partitioned table, collect partition-level statistics and derive global statistics from them.
  • estimate_percent: the sampling ratio. Default: collect statistics through a full-table scan without sampling.
  • block_sample: whether to use block sampling. Default: use row sampling instead of block sampling.

Configure a statistics collection strategy

The default statistics collection strategy can meet the requirements of most tables. In certain scenarios, you may need to adjust the collection strategy based on business characteristics. This section describes some common scenarios and corresponding collection strategies.

Business peaks overlapping with the statistics maintenance window

The default statistics maintenance window may overlap with business peak hours, such as workday nights. During such a period, the statistics collection task may contend for resources with business SQL queries, degrading business performance. In this case, you can adjust the start time of the statistics maintenance window so that it no longer coincides with business peak hours.

-- For a MySQL business tenant, if it is now 11:00 a.m. on Thursday, March 7, 2024, you need to adjust the start time of the statistics maintenance window to 2:00 a.m. on Friday in the current tenant.
call dbms_scheduler.set_attribute('MONDAY_WINDOW', 'NEXT_DATE', '2024-03-11 02:00:00');
call dbms_scheduler.set_attribute('TUESDAY_WINDOW', 'NEXT_DATE', '2024-03-12 02:00:00');
call dbms_scheduler.set_attribute('WEDNESDAY_WINDOW', 'NEXT_DATE', '2024-03-13 02:00:00');
call dbms_scheduler.set_attribute('THURSDAY_WINDOW', 'NEXT_DATE', '2024-03-14 02:00:00');
call dbms_scheduler.set_attribute('FRIDAY_WINDOW', 'NEXT_DATE', '2024-03-08 02:00:00');
call dbms_scheduler.set_attribute('SATURDAY_WINDOW', 'NEXT_DATE', '2024-03-09 02:00:00');
call dbms_scheduler.set_attribute('SUNDAY_WINDOW', 'NEXT_DATE', '2024-03-10 02:00:00');


-- For an Oracle business tenant, if it is now 11:00 a.m. on Thursday, March 7, 2024, you need to adjust the start time of the statistics maintenance window to 2:00 a.m. on Friday in the corresponding sys tenant.
SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF';
SET NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD';
call dbms_scheduler.set_attribute('MONDAY_WINDOW', 'NEXT_DATE', '2024-03-11 02:00:00');
call dbms_scheduler.set_attribute('TUESDAY_WINDOW', 'NEXT_DATE', '2024-03-12 02:00:00');
call dbms_scheduler.set_attribute('WEDNESDAY_WINDOW', 'NEXT_DATE', '2024-03-13 02:00:00');
call dbms_scheduler.set_attribute('THURSDAY_WINDOW', 'NEXT_DATE', '2024-03-14 02:00:00');
call dbms_scheduler.set_attribute('FRIDAY_WINDOW', 'NEXT_DATE', '2024-03-08 02:00:00');
call dbms_scheduler.set_attribute('SATURDAY_WINDOW', 'NEXT_DATE', '2024-03-09 02:00:00');
call dbms_scheduler.set_attribute('SUNDAY_WINDOW', 'NEXT_DATE', '2024-03-10 02:00:00');
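The NEXT_DATE values above are hard-coded for March 2024. The dates for any current time can be computed with a small helper (an illustrative script, not part of OceanBase; it only prints the statements for you to review and run):

```python
from datetime import datetime, timedelta

WINDOWS = ['MONDAY_WINDOW', 'TUESDAY_WINDOW', 'WEDNESDAY_WINDOW',
           'THURSDAY_WINDOW', 'FRIDAY_WINDOW', 'SATURDAY_WINDOW',
           'SUNDAY_WINDOW']

def next_window_dates(now: datetime, hour: int = 2):
    """For each weekday window, compute the next occurrence of that
    weekday at the given hour, strictly after `now`. Returns
    (window_name, 'YYYY-MM-DD HH:MM:SS') pairs for use with
    dbms_scheduler.set_attribute(..., 'NEXT_DATE', ...)."""
    out = []
    for weekday, name in enumerate(WINDOWS):  # Monday == 0
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)  # already passed this week
        out.append((name, candidate.strftime('%Y-%m-%d %H:%M:%S')))
    return out

# Reproduces the dates in the example above: starting from 11:00 a.m.
# on Thursday, March 7, 2024, move every window to 2:00 a.m.
for name, date in next_window_dates(datetime(2024, 3, 7, 11, 0)):
    print(f"call dbms_scheduler.set_attribute('{name}', 'NEXT_DATE', '{date}');")
```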

Incomplete statistics collected for ultra-large tables

Under the default statistics collection strategy, a full-table scan is performed on the table or partition whose statistics are to be collected. If that table or partition contains a large volume of data or occupies a lot of disk space, collection takes a long time, delays statistics collection for other tables, and may even time out. As a rule of thumb, if a table contains more than 100 million rows or occupies more than 20 GB of disk space, we recommend that you configure the statistics collection strategy as follows:

In OceanBase Database of a version earlier than V4.2.1, the estimate_percent parameter specifies the sampling ratio of both basic statistics and histograms. When you use the estimate_percent parameter to adjust the sampling ratio of basic statistics, the sampling ratio of histograms may increase accordingly. Therefore, we recommend that you do not use this parameter in combination with a histogram collection command.

In OceanBase Database V4.2.1 and later, the estimate_percent parameter specifies the sampling ratio only of basic statistics. You can directly adjust this parameter as needed without worrying about affecting the histogram sampling ratio.

Non-partitioned table

  1. Skip large objects (LOBs): Do not collect statistics of LOBs such as TEXT columns. Collect statistics of only non-LOB columns. In MySQL mode, statistics of TEXT columns are collected by default. If all TEXT columns store LOBs, the statistics collection can be very slow. In the following example, the fourth parameter specifies the columns whose statistics are to be collected. You need to add all columns except LOB columns.

    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'method_opt',
      'for columns col1,col2,col3,... size auto');
    
  2. Increase the DOP or configure block sampling: You can increase the DOP so that more threads collect statistics concurrently, speeding up collection at the cost of more resources. Alternatively, you can configure block sampling, which speeds up collection by reducing the amount of data processed. Both methods improve collection efficiency: the first spends extra resources to keep statistics fully accurate, whereas the second sacrifices some accuracy to reduce resource consumption. Choose whichever suits your workload.

    -- Configure a DOP at the table level.
    call dbms_stats.set_table_prefs('database_name', 'table_name', 'degree', '4');
    
    -- Enable block sampling at the table level.
    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'block_sample',
      'True');
    -- Configure a sampling ratio based on the order of magnitude of the table. Generally, collecting statistics of tens of millions of rows can fully reflect the data characteristics of a table.
    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'estimate_percent',
      '0.1');
    

Partitioned table

  1. Skip LOBs: Do not collect statistics of TEXT columns. In MySQL mode, statistics of TEXT columns are collected by default. If all TEXT columns store LOBs, the statistics collection can be very slow. In the following example, the fourth parameter specifies the columns whose statistics are to be collected. You need to add all columns except LOB columns.

    -- Skip LOBs.
    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'method_opt',
      'for columns col1,col2,col3,... size auto');
    
  2. Do not collect global statistics: Collect statistics at only one level for partitioned or subpartitioned tables, and make sure that higher-level statistics are deleted. In the following example, the fourth parameter specifies the collection granularity. For a partitioned table, you can collect only partition-level statistics; for a subpartitioned table, only subpartition-level statistics. If you choose this strategy, delete the global statistics of partitioned and subpartitioned tables, as well as the partition-level statistics of subpartitioned tables.

    -- Configure not to collect global statistics.
    -- For partitioned tables
    call dbms_stats.set_table_prefs('database_name', 'table_name', 'granularity', 'PARTITION');
    
    -- For subpartitioned tables
    call dbms_stats.set_table_prefs('database_name', 'table_name', 'granularity', 'SUBPARTITION');
    
  3. Increase the DOP or configure block sampling: Use settings similar to those for non-partitioned tables, and make sure that statistics are collected in off-peak hours. You can increase the DOP so that more threads collect statistics concurrently, speeding up collection at the cost of more resources; if you choose this method, stagger the collection time away from business peaks. Alternatively, you can configure block sampling, which speeds up collection by reducing the amount of data processed and avoids the high disk I/O caused by massive disk reads. Both methods improve collection efficiency: the first spends extra resources to keep statistics fully accurate, whereas the second sacrifices some accuracy to reduce resource consumption. Choose whichever suits your workload.

    -- Configure a DOP at the table level.
    call dbms_stats.set_table_prefs('database_name', 'table_name', 'degree', '4');
    
    -- Enable block sampling at the table level.
    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'block_sample',
      'True');
    -- Configure a sampling ratio based on the order of magnitude of the table. Generally, collecting statistics of tens of millions of rows can fully reflect the data characteristics of a table.
    call dbms_stats.set_table_prefs(
      'database_name',
      'table_name',
      'estimate_percent',
      '0.1');
    

Configure a histogram strategy for statistics collection

The OceanBase Database optimizer supports three types of histograms: frequency histograms, TopK histograms, and hybrid histograms.

In a frequency histogram, each distinct column value corresponds to a single bucket in the histogram. The number of buckets specified must be no less than the number of distinct values (NDV) of the column.

A TopK histogram is a variant of the frequency histogram. Based on the Lossy Counting algorithm, it approximates the overall data distribution from the most frequent values, which must cover a ratio of the collected data no less than 1-(1/bucket_size).

A hybrid histogram is built from a preset amount of collected data and supplements frequency and TopK histograms.

In large table scenarios, the OceanBase Database optimizer introduces hybrid histograms for efficient statistics building. This method collects a preset amount of data to build a histogram. Each bucket of a hybrid histogram can contain multiple distinct values so as to segment collected data based on the number of buckets. All data in each segment is placed in the corresponding bucket to describe the data distribution in a larger range by using fewer buckets.
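The equal-depth segmentation idea behind hybrid histograms can be sketched as follows (a conceptual illustration, not OceanBase's actual construction algorithm):

```python
def equal_depth_buckets(sorted_values, num_buckets):
    """Split a sorted sample into num_buckets contiguous segments of
    roughly equal row count. Each bucket may span many distinct
    values, so a small number of buckets can describe the data
    distribution over a wide value range."""
    n = len(sorted_values)
    buckets = []
    for i in range(num_buckets):
        lo, hi = i * n // num_buckets, (i + 1) * n // num_buckets
        seg = sorted_values[lo:hi]
        if seg:
            buckets.append((seg[0], seg[-1], len(seg)))  # (min, max, rows)
    return buckets

# 1,000 sampled rows summarized by just 4 buckets of 250 rows each.
vals = sorted(v * v % 997 for v in range(1000))
for lo, hi, rows in equal_depth_buckets(vals, 4):
    print(lo, hi, rows)
```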

  • Condition 1: A frequency histogram is selected if the NDV of the specified column is less than or equal to 254.

  • Condition 2: A TopK histogram is preferentially selected if the NDV of the specified column is greater than 254. After column statistics are collected, the buckets are ranked in descending order of frequency and those ranked beyond 254 are discarded, provided that the records in the top 254 buckets account for at least 1-(1/bucket_size) of all records collected, which is 99.6% by default.

  • A hybrid histogram is selected if the preceding conditions are not met.
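The selection rules above can be summarized in a few lines (a hypothetical helper that mirrors the documented conditions; not an OceanBase API):

```python
def choose_histogram(ndv: int, topk_coverage: float,
                     bucket_size: int = 254) -> str:
    """Pick a histogram type for a column.
    ndv: number of distinct values in the column.
    topk_coverage: fraction of collected rows falling in the
    bucket_size most frequent values."""
    if ndv <= bucket_size:
        return 'frequency'          # one bucket per distinct value
    if topk_coverage >= 1 - 1 / bucket_size:  # >= ~99.6% by default
        return 'topk'               # drop the long tail of rare values
    return 'hybrid'                 # equal-depth buckets over a sample

print(choose_histogram(200, 1.0))       # frequency
print(choose_histogram(10_000, 0.999))  # topk
print(choose_histogram(10_000, 0.80))   # hybrid
```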

Use the stored procedure set_column_stats to explicitly set column-level statistics. Here is an example:

-- Call the `SET_COLUMN_STATS` procedure in the `DBMS_STATS` package.
-- Set statistics of the 'col1' column in the 'tbl1' table.
-- 'user1': the user or schema to which the table belongs.
-- 'tbl1': the name of the table for which statistics are to be set.
-- 'col1': the name of the column for which statistics are to be set.
-- distcnt => 254: The NDV of the specified column is 254.
-- nullcnt => 10: The number of NULL values in the specified column is 10.
CALL dbms_stats.set_column_stats('user1', 'tbl1', 'col1', distcnt => 254, nullcnt => 10);

Manually collect statistics

Through manual statistics collection, you can better control the collection time and method to reduce the impact on business. The OceanBase Database optimizer allows you to manually collect statistics by using the DBMS_STATS package or executing the ANALYZE statement. We recommend that you use the DBMS_STATS package because it provides more features.

Example: Use the DBMS_STATS package to collect statistics

-- Collect partition-level statistics on the T_PART1 table under the TEST user, with a DOP of 64. Only histograms of columns with uneven data distribution are collected.
CALL dbms_stats.gather_table_stats('TEST', 'T_PART1', degree => 64, granularity => 'PARTITION', method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');

Example: Use the ANALYZE statement to collect statistics

-- Collect global statistics on the T_PART1 table under the TEST user. Only histograms of columns with uneven data distribution are collected.
ANALYZE TABLE T_PART1 PARTITION ('T_PART1') COMPUTE STATISTICS FOR ALL COLUMNS SIZE skewonly;

For ultra-large tables, you can select the segment-based or manual collection strategy based on the data volume and table schema. Segment-based collection can effectively reduce the load of a single statistics collection task. Manual collection is more flexible and controllable.

Query after a batch execution task

By default, statistics are not immediately updated after you batch import data. If business queries need to be performed immediately after batch import, we recommend that you manually initiate a statistics collection after the import.

Query on tables in which partitions are pre-created by date

For a table in which partitions are pre-created by date, if statistics of some pre-created partitions are not updated in a timely manner after data import, the optimizer may generate a nonoptimal execution plan. To ensure that the optimizer can generate an optimal execution plan, we recommend that you manually collect statistics of related partitions after data import.

References

For more information about statistics, see the following topics:

  • Overview of statistics
  • Overview
