In OceanBase Database, statistics management for partitioned tables distinguishes between partition-level statistics and global statistics. This mechanism applies only to partitioned tables. For non-partitioned tables (regular single tables), only a single set of global statistics exists — no partition-level statistics or derivation logic is involved.
The statistics mechanism for partitioned tables in OceanBase has the following characteristics:
- Separate global and partition statistics: Both are stored independently and distinguished by
partition_id. - Collected simultaneously by default: Under the
APPROX_GLOBAL AND PARTITIONgranularity, both global and partition statistics are collected at the same time. - Automatic staleness detection: Missing or stale statistics are automatically detected and queued for collection.
- Global statistics derived from partition statistics: Global basic statistics are derived from partition-level statistics, improving their timeliness and accuracy.
This topic describes the statistics collection strategies and automatic update rules that OceanBase applies to partitioned tables.
Default collection strategy
By default, OceanBase collects statistics for all partitioned tables at the APPROX_GLOBAL AND PARTITION granularity (implemented via the GATHER_INDEX_STATS procedure in the DBMS_STATS package). Under this strategy, the system generates two types of statistics: partition-level statistics and global statistics.
Partition-level statistics
Independent statistics are collected for each partition:
- Basic statistics: such as column minimum (min), maximum (max), and number of distinct values (NDV).
- Histograms: used to describe data distribution characteristics.
Global statistics
- Basic statistics (min, max, NDV, etc.): derived by aggregating partition-level basic statistics (for example, global min = the smallest min across all partitions).
- Global histograms: cannot be derived from partition results and require a full-table sample scan.
Because global basic statistics do not require a full-table scan, collection efficiency is greatly improved for large partitioned tables, especially in TB/PB-scale AP workloads.
Automatic update mechanism
To keep statistics current, OceanBase has a built-in detection and update mechanism that triggers refreshes based on the proportion of data changes.
Conditions that trigger a global statistics update
A global statistics update is triggered in any of the following cases:
- Global statistics are missing: No global statistics exist for the table.
- Global data is stale: The percentage of modified rows across the entire table is ≥ 10% (controlled by
STALE_PERCENT). - Any partition statistics are stale: Even if the global data has not changed significantly, a stale partition triggers a global statistics update to maintain consistency.
Update process:
- Global basic statistics: Derived from partition-level statistics (min, max, NDV, etc.).
- Global histograms: Collected separately, because histograms cannot currently be derived from partition results.
Conditions that trigger a partition-level statistics update
Statistics for an individual partition are updated in the following cases:
- Partition statistics are missing: No statistics exist for that partition.
- Partition data is stale: The percentage of modified rows in the partition is ≥ 10% (controlled by
STALE_PERCENT).
Update process:
- Re-collect statistics for the stale partition.
- Re-collect global statistics.
Configuring STALE_PERCENT
You can adjust the staleness threshold flexibly using the DBMS_STATS package:
-- Set the global default (applies to all new tables)
CALL dbms_stats.set_global_prefs('STALE_PERCENT', '50');
-- Set at the schema level
CALL dbms_stats.set_schema_prefs('MY_SCHEMA', 'STALE_PERCENT', '30');
-- Set at the table level
CALL dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '20');
Automatic detection
OceanBase continuously runs the following background detection tasks:
- Statistics status check: Scans all existing global and partition statistics.
- Stale partition detection: Checks whether each partition's statistics are stale or missing.
- Missing statistics detection: Automatically flags missing statistics and adds them to the collection queue.
- Independent update triggers: Global and partition statistics can be updated independently without affecting each other.
Key parameters
| Parameter | Default | Description |
|---|---|---|
STALE_PERCENT |
10 (10%) | Parameter for stored procedures (set_schema_prefs, set_table_prefs). Defines the change threshold for determining whether statistics are stale. |
| Default collection granularity | APPROX_GLOBAL AND PARTITION |
Applies to partitioned tables only. Non-partitioned tables use GLOBAL (implemented via the GATHER_INDEX_STATS procedure in the DBMS_STATS package). |
Comparison with non-partitioned tables
| Feature | Partitioned table | Non-partitioned table |
|---|---|---|
| Statistics types | Global + partition | Global only |
| Source of global basic statistics | Derived from partition statistics | Generated by full-table scan |
| Supports partition pruning optimization | Yes | No |
| Default collection granularity | APPROX_GLOBAL AND PARTITION |
GLOBAL |
Example: Managing statistics for a partitioned table
This example is based on OceanBase Database in MySQL mode. It demonstrates how to create a partitioned table, observe statistics collection and automatic update behavior, and verify that global statistics are derived from partition-level statistics.
Step 1: Create a test partitioned table
-- Create the database if it does not already exist
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;
-- Create a RANGE-partitioned table for sales records
CREATE TABLE sales (
id BIGINT NOT NULL,
region VARCHAR(50),
amount DECIMAL(10,2),
sale_date DATE,
PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE COLUMNS(sale_date) (
PARTITION p2023_q1 VALUES LESS THAN ('2023-04-01'),
PARTITION p2023_q2 VALUES LESS THAN ('2023-07-01'),
PARTITION p2023_q3 VALUES LESS THAN ('2023-10-01'),
PARTITION p2023_q4 VALUES LESS THAN ('2024-01-01')
);
Step 2: Insert initial data
-- Insert test data into each partition
INSERT INTO sales VALUES
(1, 'North', 1500.00, '2023-01-15'),
(2, 'South', 2300.50, '2023-02-20'),
(3, 'East', 980.75, '2023-04-10'),
(4, 'West', 3200.00, '2023-06-25'),
(5, 'North', 1800.25, '2023-07-05'),
(6, 'South', 2100.00, '2023-09-18');
Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)
CALL DBMS_STATS.GATHER_TABLE_STATS(
ownname => 'sales_db',
tabname => 'sales',
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
granularity => 'APPROX_GLOBAL AND PARTITION'
);
Step 4: View statistics
View global table-level statistics
SELECT
TABLE_NAME,
OBJECT_TYPE,
NUM_ROWS,
AVG_ROW_LEN,
LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY OBJECT_TYPE;
Expected result:
+------------+-------------+----------+-------------+----------------------------+
| TABLE_NAME | OBJECT_TYPE | NUM_ROWS | AVG_ROW_LEN | LAST_ANALYZED |
+------------+-------------+----------+-------------+----------------------------+
| sales | PARTITION | 2 | 73 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 2 | 72 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 2 | 73 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 0 | 0 | 2026-01-14 10:45:58.067339 |
| sales | TABLE | 6 | 72 | 2026-01-14 10:45:58.079250 |
+------------+-------------+----------+-------------+----------------------------+
5 rows in set
View global column-level statistics (for the amount column)
SELECT
COLUMN_NAME,
NUM_DISTINCT,
LOW_VALUE,
HIGH_VALUE,
NUM_NULLS,
HISTOGRAM
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
AND TABLE_NAME = 'SALES'
AND COLUMN_NAME = 'AMOUNT';
Expected result:
+-------------+--------------+-----------+------------+-----------+-----------+
| COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS | HISTOGRAM |
+-------------+--------------+-----------+------------+-----------+-----------+
| amount | 6 | 980.75 | 3200.00 | 0 | NULL |
+-------------+--------------+-----------+------------+-----------+-----------+
1 row in set
View partition-level column statistics (for the amount column in partition p2023_q1)
SELECT
PARTITION_NAME,
COLUMN_NAME,
NUM_DISTINCT,
LOW_VALUE,
HIGH_VALUE,
NUM_NULLS
FROM oceanbase.DBA_PART_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
AND TABLE_NAME = 'SALES'
AND PARTITION_NAME = 'P2023_Q1'
AND COLUMN_NAME = 'AMOUNT';
Expected result:
+----------------+-------------+--------------+-----------+------------+-----------+
| PARTITION_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS |
+----------------+-------------+--------------+-----------+------------+-----------+
| p2023_q1 | amount | 2 | 1500.00 | 2300.50 | 0 |
+----------------+-------------+--------------+-----------+------------+-----------+
1 row in set
Step 5: Verify the automatic update mechanism (optional)
-- Insert new data into p2023_q4 to simulate >10% change
INSERT INTO sales
SELECT 1000 + n, 'Central', ROUND(RAND()*5000, 2), '2023-11-01'
FROM (SELECT @n := @n + 1 AS n FROM information_schema.columns a, information_schema.columns b, (SELECT @n := 0) r LIMIT 50) t;
-- Wait for background detection, or manually flush monitoring info
CALL DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
-- If the change ratio >= STALE_PERCENT (default 10%), the system marks the statistics as stale and may trigger collection.
-- Check whether LAST_ANALYZED has been updated
SELECT PARTITION_NAME, LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY PARTITION_NAME;
Expected result:
+----------------+----------------------------+
| PARTITION_NAME | LAST_ANALYZED |
+----------------+----------------------------+
| NULL | 2026-01-14 10:45:58.079250 |
| p2023_q1 | 2026-01-14 10:45:58.067339 |
| p2023_q2 | 2026-01-14 10:45:58.067339 |
| p2023_q3 | 2026-01-14 10:45:58.067339 |
| p2023_q4 | 2026-01-14 10:45:58.067339 |
+----------------+----------------------------+
5 rows in set
References
For related PL procedure documentation, see:
