Collection and update strategies of statistics for partitioned tables|V4.3.5|OceanBase Database| docs|Distributed Database

In OceanBase Database, statistics management for partitioned tables distinguishes between partition-level statistics and global statistics. This mechanism applies only to partitioned tables. For non-partitioned tables (regular single tables), only a single set of global statistics exists — no partition-level statistics or derivation logic is involved.

The statistics mechanism for partitioned tables in OceanBase has the following characteristics:

Separate global and partition statistics: Both are stored independently and distinguished by partition_id.
Collected simultaneously by default: Under the APPROX_GLOBAL AND PARTITION granularity, both global and partition statistics are collected at the same time.
Automatic staleness detection: Missing or stale statistics are automatically detected and queued for collection.
Global statistics derived from partition statistics: Global basic statistics are derived from partition-level statistics, improving their timeliness and accuracy.

This topic describes the statistics collection strategies and automatic update rules that OceanBase applies to partitioned tables.

Default collection strategy

By default, OceanBase collects statistics for all partitioned tables at the APPROX_GLOBAL AND PARTITION granularity (implemented via the GATHER_INDEX_STATS procedure in the DBMS_STATS package). Under this strategy, the system generates two types of statistics: partition-level statistics and global statistics.

Partition-level statistics

Independent statistics are collected for each partition:

Basic statistics: such as column minimum (min), maximum (max), and number of distinct values (NDV).
Histograms: used to describe data distribution characteristics.

Global statistics

Basic statistics (min, max, NDV, etc.): derived by aggregating partition-level basic statistics (for example, global min = the smallest min across all partitions).
Global histograms: cannot be derived from partition results and require a full-table sample scan.

Because global basic statistics do not require a full-table scan, collection efficiency is greatly improved for large partitioned tables, especially in TB/PB-scale AP workloads.

Automatic update mechanism

To keep statistics current, OceanBase has a built-in detection and update mechanism that triggers refreshes based on the proportion of data changes.

Conditions that trigger a global statistics update

A global statistics update is triggered in any of the following cases:

Global statistics are missing: No global statistics exist for the table.
Global data is stale: The percentage of modified rows across the entire table is ≥ 10% (controlled by STALE_PERCENT).
Any partition statistics are stale: Even if the global data has not changed significantly, a stale partition triggers a global statistics update to maintain consistency.

Update process:

Global basic statistics: Derived from partition-level statistics (min, max, NDV, etc.).
Global histograms: Collected separately, because histograms cannot currently be derived from partition results.

Conditions that trigger a partition-level statistics update

Statistics for an individual partition are updated in the following cases:

Partition statistics are missing: No statistics exist for that partition.
Partition data is stale: The percentage of modified rows in the partition is ≥ 10% (controlled by STALE_PERCENT).

Update process:

Re-collect statistics for the stale partition.
Re-collect global statistics.

Configuring STALE_PERCENT

You can adjust the staleness threshold flexibly using the DBMS_STATS package:

-- Set the global default (applies to all new tables)
CALL dbms_stats.set_global_prefs('STALE_PERCENT', '50');

-- Set at the schema level
CALL dbms_stats.set_schema_prefs('MY_SCHEMA', 'STALE_PERCENT', '30');

-- Set at the table level
CALL dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '20');

Automatic detection

OceanBase continuously runs the following background detection tasks:

Statistics status check: Scans all existing global and partition statistics.
Stale partition detection: Checks whether each partition's statistics are stale or missing.
Missing statistics detection: Automatically flags missing statistics and adds them to the collection queue.
Independent update triggers: Global and partition statistics can be updated independently without affecting each other.

Key parameters

Parameter	Default	Description
`STALE_PERCENT`	10 (10%)	Parameter for stored procedures (`set_schema_prefs`, `set_table_prefs`). Defines the change threshold for determining whether statistics are stale.
Default collection granularity	`APPROX_GLOBAL AND PARTITION`	Applies to partitioned tables only. Non-partitioned tables use `GLOBAL` (implemented via the `GATHER_INDEX_STATS` procedure in the `DBMS_STATS` package).

Comparison with non-partitioned tables

Feature	Partitioned table	Non-partitioned table
Statistics types	Global + partition	Global only
Source of global basic statistics	Derived from partition statistics	Generated by full-table scan
Supports partition pruning optimization	Yes	No
Default collection granularity	`APPROX_GLOBAL AND PARTITION`	`GLOBAL`

Example: Managing statistics for a partitioned table

This example is based on OceanBase Database in MySQL mode. It demonstrates how to create a partitioned table, observe statistics collection and automatic update behavior, and verify that global statistics are derived from partition-level statistics.

Step 1: Create a test partitioned table

-- Create the database if it does not already exist
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Create a RANGE-partitioned table for sales records
CREATE TABLE sales (
    id BIGINT NOT NULL,
    region VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE,
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE COLUMNS(sale_date) (
    PARTITION p2023_q1 VALUES LESS THAN ('2023-04-01'),
    PARTITION p2023_q2 VALUES LESS THAN ('2023-07-01'),
    PARTITION p2023_q3 VALUES LESS THAN ('2023-10-01'),
    PARTITION p2023_q4 VALUES LESS THAN ('2024-01-01')
);

Step 2: Insert initial data

-- Insert test data into each partition
INSERT INTO sales VALUES
(1, 'North', 1500.00, '2023-01-15'),
(2, 'South', 2300.50, '2023-02-20'),
(3, 'East', 980.75, '2023-04-10'),
(4, 'West', 3200.00, '2023-06-25'),
(5, 'North', 1800.25, '2023-07-05'),
(6, 'South', 2100.00, '2023-09-18');

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

CALL DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'sales_db',
    tabname => 'sales',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    granularity => 'APPROX_GLOBAL AND PARTITION'
);

Step 4: View statistics

View global table-level statistics

SELECT
    TABLE_NAME,
    OBJECT_TYPE,
    NUM_ROWS,
    AVG_ROW_LEN,
    LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY OBJECT_TYPE;

Expected result:

+------------+-------------+----------+-------------+----------------------------+
| TABLE_NAME | OBJECT_TYPE | NUM_ROWS | AVG_ROW_LEN | LAST_ANALYZED              |
+------------+-------------+----------+-------------+----------------------------+
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          72 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        0 |           0 | 2026-01-14 10:45:58.067339 |
| sales      | TABLE       |        6 |          72 | 2026-01-14 10:45:58.079250 |
+------------+-------------+----------+-------------+----------------------------+
5 rows in set

View global column-level statistics (for the `amount` column)

SELECT
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS,
    HISTOGRAM
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND COLUMN_NAME = 'AMOUNT';

Expected result:

+-------------+--------------+-----------+------------+-----------+-----------+
| COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS | HISTOGRAM |
+-------------+--------------+-----------+------------+-----------+-----------+
| amount      |            6 | 980.75    | 3200.00    |         0 | NULL      |
+-------------+--------------+-----------+------------+-----------+-----------+
1 row in set

View partition-level column statistics (for the `amount` column in partition `p2023_q1`)

SELECT
    PARTITION_NAME,
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS
FROM oceanbase.DBA_PART_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND PARTITION_NAME = 'P2023_Q1'
  AND COLUMN_NAME = 'AMOUNT';

Expected result:

+----------------+-------------+--------------+-----------+------------+-----------+
| PARTITION_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS |
+----------------+-------------+--------------+-----------+------------+-----------+
| p2023_q1       | amount      |            2 | 1500.00   | 2300.50    |         0 |
+----------------+-------------+--------------+-----------+------------+-----------+
1 row in set

Step 5: Verify the automatic update mechanism (optional)

-- Insert new data into p2023_q4 to simulate >10% change
INSERT INTO sales
SELECT 1000 + n, 'Central', ROUND(RAND()*5000, 2), '2023-11-01'
FROM (SELECT @n := @n + 1 AS n FROM information_schema.columns a, information_schema.columns b, (SELECT @n := 0) r LIMIT 50) t;

-- Wait for background detection, or manually flush monitoring info
CALL DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
-- If the change ratio >= STALE_PERCENT (default 10%), the system marks the statistics as stale and may trigger collection.

-- Check whether LAST_ANALYZED has been updated
SELECT PARTITION_NAME, LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY PARTITION_NAME;

Expected result:

+----------------+----------------------------+
| PARTITION_NAME | LAST_ANALYZED              |
+----------------+----------------------------+
| NULL           | 2026-01-14 10:45:58.079250 |
| p2023_q1       | 2026-01-14 10:45:58.067339 |
| p2023_q2       | 2026-01-14 10:45:58.067339 |
| p2023_q3       | 2026-01-14 10:45:58.067339 |
| p2023_q4       | 2026-01-14 10:45:58.067339 |
+----------------+----------------------------+
5 rows in set

References

For related PL procedure documentation, see:

OceanBase

Customer Stories

Documentation

Collection and update strategies of statistics for partitioned tables

Default collection strategy

Partition-level statistics

Global statistics

Automatic update mechanism

Conditions that trigger a global statistics update

Conditions that trigger a partition-level statistics update

Configuring STALE_PERCENT

Automatic detection

Key parameters

Comparison with non-partitioned tables

Example: Managing statistics for a partitioned table

Step 1: Create a test partitioned table

Step 2: Insert initial data

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

Step 4: View statistics

View global table-level statistics

View global column-level statistics (for the `amount` column)

View partition-level column statistics (for the `amount` column in partition `p2023_q1`)

Step 5: Verify the automatic update mechanism (optional)

References

OceanBase

Customer Stories

Documentation

Collection and update strategies of statistics for partitioned tables

Default collection strategy

Partition-level statistics

Global statistics

Automatic update mechanism

Conditions that trigger a global statistics update

Conditions that trigger a partition-level statistics update

Configuring STALE_PERCENT

Automatic detection

Key parameters

Comparison with non-partitioned tables

Example: Managing statistics for a partitioned table

Step 1: Create a test partitioned table

Step 2: Insert initial data

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

Step 4: View statistics

View global table-level statistics

View global column-level statistics (for the amount column)

View partition-level column statistics (for the amount column in partition p2023_q1)

Step 5: Verify the automatic update mechanism (optional)

References

View global column-level statistics (for the `amount` column)

View partition-level column statistics (for the `amount` column in partition `p2023_q1`)