Statistics collection and update strategy for partitioned tables|V4.2.1|OceanBase Database| docs|Distributed Database

Statistics collection and update strategy for partitioned tables

Last Updated：2026-06-23 06:23:59 Updated

In OceanBase Database, statistics management for partitioned tables distinguishes between partition-level statistics and global statistics. This mechanism applies only to partitioned tables. For non-partitioned tables (regular tables), only a single set of global statistics exists, with no partition-level statistics or derivation logic involved.

The partitioned table statistics mechanism in OceanBase has the following characteristics:

Separate global and partition statistics: Both are stored independently, distinguished by partition_id.
Collected simultaneously by default: Under the APPROX_GLOBAL AND PARTITION granularity, both global and partition statistics are collected at the same time.
Automatic detection of missing statistics: The system automatically detects missing or stale statistics and triggers collection.
Global statistics derived from partition statistics: Global basic statistics are derived from partition-level basic statistics, improving the timeliness and accuracy of global statistics.

This topic describes the statistics collection strategies and automatic update rules that OceanBase uses for partitioned tables.

Default collection strategy

OceanBase collects statistics for all partitioned tables using the APPROX_GLOBAL AND PARTITION granularity by default (implemented through the GATHER_INDEX_STATS procedure of the DBMS_STATS package). Under this strategy, the system generates two types of statistics simultaneously: partition-level statistics and global statistics.

Partition-level statistics

Independent statistics are collected for each partition:

Basic statistics: Such as column minimum (min), maximum (max), number of distinct values (NDV), and so on.
Histograms: Used to characterize data distribution patterns.

Global statistics

Basic statistics (min, max, NDV, and so on): Derived by combining basic statistics from all partitions (for example, global min = minimum of all partition min values).
Global histograms: Cannot be derived from partition-level results and require sampling across the entire table.

Global basic statistics do not require a full table scan, which significantly improves statistics collection efficiency for large-scale partitioned tables, especially in TB/PB-level AP scenarios.

Automatic update mechanism

To ensure the timeliness of statistics, OceanBase has a built-in automatic detection and update mechanism that triggers refresh based on the proportion of data modifications.

Conditions that trigger global statistics update

A global statistics update is triggered when any of the following conditions is met:

Global statistics missing: No global statistics exist for the table.
Global data stale: The proportion of modified rows across the entire table is ≥ 10% (controlled by STALE_PERCENT).
Any partition statistics stale: Even if the global data has not changed, a global statistics update is triggered whenever any partition's statistics become stale, to ensure consistency.

Update process:

Global basic statistics: Derived from partition-level statistics (min, max, NDV, and so on).
Global histograms: Must be collected separately, because histograms currently cannot be derived from partition-level results.

Conditions that trigger partition-level statistics update

Statistics for a single partition are updated when:

Partition statistics missing: No statistics exist for that partition.
Partition data stale: The proportion of modified rows in that partition is ≥ 10% (controlled by STALE_PERCENT).

Update process:

Re-collect statistics for the stale partition.
Re-collect global statistics.

Configuring STALE_PERCENT

You can flexibly adjust the staleness threshold through the DBMS_STATS package:

-- Set the global default value (affects all new tables)
CALL dbms_stats.set_global_prefs('STALE_PERCENT', '50');

-- Set at the schema level
CALL dbms_stats.set_schema_prefs('MY_SCHEMA', 'STALE_PERCENT', '30');

-- Set at the table level
CALL dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '20');

Automatic detection capabilities

The OceanBase system continuously performs the following detection tasks in the background:

Statistics status check: Scans all existing global and partition statistics.
Stale partition detection: Checks whether each partition is stale or missing statistics.
Missing statistics detection: Automatically marks missing entries and adds them to the collection queue.
Independent update triggering: Supports independent triggering of global and partition statistics updates without mutual interference.

Key parameters

Parameter	Default value	Description
`STALE_PERCENT`	10 (10%)	A parameter for stored procedures (set_schema_prefs, set_table_prefs) that specifies the modification threshold for determining whether statistics are stale
Default collection granularity	`APPROX_GLOBAL AND PARTITION`	Applies only to partitioned tables; non-partitioned tables use `GLOBAL` (implemented through the `GATHER_INDEX_STATS` procedure of the `DBMS_STATS` package)

Comparison with non-partitioned tables

Feature	Partitioned table	Non-partitioned table
Statistics type	Global + Partition	Global only
Source of global basic statistics	Derived from partition statistics	Generated by full table scan
Partition pruning optimization	Yes	No
Default collection granularity	`APPROX_GLOBAL AND PARTITION`	`GLOBAL`

Example: managing statistics for a partitioned table in OceanBase

This example is based on OceanBase Database (MySQL mode). It demonstrates how to create a partitioned table, view its statistics collection and automatic update behavior, and shows how global statistics are derived from partition statistics.

Step 1: Create a test partitioned table

-- Create the database and user (if they do not already exist)
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Create a RANGE-partitioned sales table
CREATE TABLE sales (
    id BIGINT NOT NULL,
    region VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE,
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE COLUMNS(sale_date) (
    PARTITION p2023_q1 VALUES LESS THAN ('2023-04-01'),
    PARTITION p2023_q2 VALUES LESS THAN ('2023-07-01'),
    PARTITION p2023_q3 VALUES LESS THAN ('2023-10-01'),
    PARTITION p2023_q4 VALUES LESS THAN ('2024-01-01')
);

Step 2: Insert initial data

-- Insert test data into each partition
INSERT INTO sales VALUES
(1, 'North', 1500.00, '2023-01-15'),
(2, 'South', 2300.50, '2023-02-20'),
(3, 'East', 980.75, '2023-04-10'),
(4, 'West', 3200.00, '2023-06-25'),
(5, 'North', 1800.25, '2023-07-05'),
(6, 'South', 2100.00, '2023-09-18');

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

CALL DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'sales_db',
    tabname => 'sales',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    granularity => 'APPROX_GLOBAL AND PARTITION'
);

Step 4: View statistics

View global table-level statistics

SELECT
    TABLE_NAME,
    OBJECT_TYPE,
    NUM_ROWS,
    AVG_ROW_LEN,
    LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY OBJECT_TYPE;

Expected output:

+------------+-------------+----------+-------------+----------------------------+
| TABLE_NAME | OBJECT_TYPE | NUM_ROWS | AVG_ROW_LEN | LAST_ANALYZED              |
+------------+-------------+----------+-------------+----------------------------+
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          72 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        0 |           0 | 2026-01-14 10:45:58.067339 |
| sales      | TABLE       |        6 |          72 | 2026-01-14 10:45:58.079250 |
+------------+-------------+----------+-------------+----------------------------+
5 rows in set

View global column-level statistics (such as the amount column)

SELECT
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS,
    HISTOGRAM
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND COLUMN_NAME = 'AMOUNT';

Expected output:

+-------------+--------------+-----------+------------+-----------+-----------+
| COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS | HISTOGRAM |
+-------------+--------------+-----------+------------+-----------+-----------+
| amount      |            6 | 980.75    | 3200.00    |         0 | NULL      |
+-------------+--------------+-----------+------------+-----------+-----------+
1 row in set

View partition-level column statistics (such as the amount column in partition p2023_q1)

SELECT
    PARTITION_NAME,
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS
FROM oceanbase.DBA_PART_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND PARTITION_NAME = 'P2023_Q1'
  AND COLUMN_NAME = 'AMOUNT';

Expected output:

+----------------+-------------+--------------+-----------+------------+-----------+
| PARTITION_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS |
+----------------+-------------+--------------+-----------+------------+-----------+
| p2023_q1       | amount      |            2 | 1500.00   | 2300.50    |         0 |
+----------------+-------------+--------------+-----------+------------+-----------+
1 row in set

Step 5: Verify the automatic update mechanism (optional)

-- Insert new data into p2023_q4 (simulating >10% modification)
INSERT INTO sales
SELECT 1000 + n, 'Central', ROUND(RAND()*5000, 2), '2023-11-01'
FROM (SELECT @n := @n + 1 AS n FROM information_schema.columns a, information_schema.columns b, (SELECT @n := 0) r LIMIT 50) t;

-- Wait for background automatic detection (or trigger manually)
CALL DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
-- At this point, if the modification proportion ≥ STALE_PERCENT (default 10%), the system will automatically mark it as stale and may trigger collection

-- Query again to check whether LAST_ANALYZED has been updated
SELECT PARTITION_NAME, LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY PARTITION_NAME;

Expected output:

+----------------+----------------------------+
| PARTITION_NAME | LAST_ANALYZED              |
+----------------+----------------------------+
| NULL           | 2026-01-14 10:45:58.079250 |
| p2023_q1       | 2026-01-14 10:45:58.067339 |
| p2023_q2       | 2026-01-14 10:45:58.067339 |
| p2023_q3       | 2026-01-14 10:45:58.067339 |
| p2023_q4       | 2026-01-14 10:45:58.067339 |
+----------------+----------------------------+
5 rows in set

For related PL procedure references, see the following topics:

OceanBase

Customer Stories

Documentation

Statistics collection and update strategy for partitioned tables

Default collection strategy

Partition-level statistics

Global statistics

Automatic update mechanism

Conditions that trigger global statistics update

Conditions that trigger partition-level statistics update

Configuring STALE_PERCENT

Automatic detection capabilities

Key parameters

Comparison with non-partitioned tables

Example: managing statistics for a partitioned table in OceanBase

Step 1: Create a test partitioned table

Step 2: Insert initial data

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

Step 4: View statistics

View global table-level statistics

View global column-level statistics (such as the amount column)

View partition-level column statistics (such as the amount column in partition p2023_q1)

Step 5: Verify the automatic update mechanism (optional)

OceanBase

Customer Stories

Documentation

Statistics collection and update strategy for partitioned tables

Default collection strategy

Partition-level statistics

Global statistics

Automatic update mechanism

Conditions that trigger global statistics update

Conditions that trigger partition-level statistics update

Configuring STALE_PERCENT

Automatic detection capabilities

Key parameters

Comparison with non-partitioned tables

Example: managing statistics for a partitioned table in OceanBase

Step 1: Create a test partitioned table

Step 2: Insert initial data

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

Step 4: View statistics

View global table-level statistics

View global column-level statistics (such as the amount column)

View partition-level column statistics (such as the amount column in partition p2023_q1)

Step 5: Verify the automatic update mechanism (optional)

Related topics