In OceanBase Database, statistics management for partitioned tables distinguishes between partition-level statistics and global statistics. This mechanism applies only to partitioned tables. For non-partitioned tables (regular tables), only a single set of global statistics exists, with no partition-level statistics or derivation logic involved.
The partitioned table statistics mechanism in OceanBase has the following characteristics:
- Separate global and partition statistics: Both are stored independently, distinguished by
partition_id. - Collected simultaneously by default: Under the
APPROX_GLOBAL AND PARTITIONgranularity, both global and partition statistics are collected at the same time. - Automatic detection of missing statistics: The system automatically detects missing or stale statistics and triggers collection.
- Global statistics derived from partition statistics: Global basic statistics are derived from partition-level basic statistics, improving the timeliness and accuracy of global statistics.
This topic describes the statistics collection strategies and automatic update rules that OceanBase uses for partitioned tables.
Default collection strategy
OceanBase collects statistics for all partitioned tables using the APPROX_GLOBAL AND PARTITION granularity by default (implemented through the GATHER_INDEX_STATS procedure of the DBMS_STATS package). Under this strategy, the system generates two types of statistics simultaneously: partition-level statistics and global statistics.
Partition-level statistics
Independent statistics are collected for each partition:
- Basic statistics: Such as column minimum (min), maximum (max), number of distinct values (NDV), and so on.
- Histograms: Used to characterize data distribution patterns.
Global statistics
- Basic statistics (min, max, NDV, and so on): Derived by combining basic statistics from all partitions (for example, global min = minimum of all partition min values).
- Global histograms: Cannot be derived from partition-level results and require sampling across the entire table.
Global basic statistics do not require a full table scan, which significantly improves statistics collection efficiency for large-scale partitioned tables, especially in TB/PB-level AP scenarios.
Automatic update mechanism
To ensure the timeliness of statistics, OceanBase has a built-in automatic detection and update mechanism that triggers refresh based on the proportion of data modifications.
Conditions that trigger global statistics update
A global statistics update is triggered when any of the following conditions is met:
- Global statistics missing: No global statistics exist for the table.
- Global data stale: The proportion of modified rows across the entire table is ≥ 10% (controlled by
STALE_PERCENT). - Any partition statistics stale: Even if the global data has not changed, a global statistics update is triggered whenever any partition's statistics become stale, to ensure consistency.
Update process:
- Global basic statistics: Derived from partition-level statistics (min, max, NDV, and so on).
- Global histograms: Must be collected separately, because histograms currently cannot be derived from partition-level results.
Conditions that trigger partition-level statistics update
Statistics for a single partition are updated when:
- Partition statistics missing: No statistics exist for that partition.
- Partition data stale: The proportion of modified rows in that partition is ≥ 10% (controlled by
STALE_PERCENT).
Update process:
- Re-collect statistics for the stale partition.
- Re-collect global statistics.
Configuring STALE_PERCENT
You can flexibly adjust the staleness threshold through the DBMS_STATS package:
-- Set the global default value (affects all new tables)
CALL dbms_stats.set_global_prefs('STALE_PERCENT', '50');
-- Set at the schema level
CALL dbms_stats.set_schema_prefs('MY_SCHEMA', 'STALE_PERCENT', '30');
-- Set at the table level
CALL dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '20');
Automatic detection capabilities
The OceanBase system continuously performs the following detection tasks in the background:
- Statistics status check: Scans all existing global and partition statistics.
- Stale partition detection: Checks whether each partition is stale or missing statistics.
- Missing statistics detection: Automatically marks missing entries and adds them to the collection queue.
- Independent update triggering: Supports independent triggering of global and partition statistics updates without mutual interference.
Key parameters
Parameter |
Default value |
Description |
|---|---|---|
STALE_PERCENT |
10 (10%) | A parameter for stored procedures (set_schema_prefs, set_table_prefs) that specifies the modification threshold for determining whether statistics are stale |
| Default collection granularity | APPROX_GLOBAL AND PARTITION |
Applies only to partitioned tables; non-partitioned tables use GLOBAL (implemented through the GATHER_INDEX_STATS procedure of the DBMS_STATS package) |
Comparison with non-partitioned tables
Feature |
Partitioned table |
Non-partitioned table |
|---|---|---|
| Statistics type | Global + Partition | Global only |
| Source of global basic statistics | Derived from partition statistics | Generated by full table scan |
| Partition pruning optimization | Yes | No |
| Default collection granularity | APPROX_GLOBAL AND PARTITION |
GLOBAL |
Example: managing statistics for a partitioned table in OceanBase
This example is based on OceanBase Database (MySQL mode). It demonstrates how to create a partitioned table, view its statistics collection and automatic update behavior, and shows how global statistics are derived from partition statistics.
Step 1: Create a test partitioned table
-- Create the database and user (if they do not already exist)
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;
-- Create a RANGE-partitioned sales table
CREATE TABLE sales (
id BIGINT NOT NULL,
region VARCHAR(50),
amount DECIMAL(10,2),
sale_date DATE,
PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE COLUMNS(sale_date) (
PARTITION p2023_q1 VALUES LESS THAN ('2023-04-01'),
PARTITION p2023_q2 VALUES LESS THAN ('2023-07-01'),
PARTITION p2023_q3 VALUES LESS THAN ('2023-10-01'),
PARTITION p2023_q4 VALUES LESS THAN ('2024-01-01')
);
Step 2: Insert initial data
-- Insert test data into each partition
INSERT INTO sales VALUES
(1, 'North', 1500.00, '2023-01-15'),
(2, 'South', 2300.50, '2023-02-20'),
(3, 'East', 980.75, '2023-04-10'),
(4, 'West', 3200.00, '2023-06-25'),
(5, 'North', 1800.25, '2023-07-05'),
(6, 'South', 2100.00, '2023-09-18');
Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)
CALL DBMS_STATS.GATHER_TABLE_STATS(
ownname => 'sales_db',
tabname => 'sales',
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
granularity => 'APPROX_GLOBAL AND PARTITION'
);
Step 4: View statistics
View global table-level statistics
SELECT
TABLE_NAME,
OBJECT_TYPE,
NUM_ROWS,
AVG_ROW_LEN,
LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY OBJECT_TYPE;
Expected output:
+------------+-------------+----------+-------------+----------------------------+
| TABLE_NAME | OBJECT_TYPE | NUM_ROWS | AVG_ROW_LEN | LAST_ANALYZED |
+------------+-------------+----------+-------------+----------------------------+
| sales | PARTITION | 2 | 73 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 2 | 72 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 2 | 73 | 2026-01-14 10:45:58.067339 |
| sales | PARTITION | 0 | 0 | 2026-01-14 10:45:58.067339 |
| sales | TABLE | 6 | 72 | 2026-01-14 10:45:58.079250 |
+------------+-------------+----------+-------------+----------------------------+
5 rows in set
View global column-level statistics (such as the amount column)
SELECT
COLUMN_NAME,
NUM_DISTINCT,
LOW_VALUE,
HIGH_VALUE,
NUM_NULLS,
HISTOGRAM
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
AND TABLE_NAME = 'SALES'
AND COLUMN_NAME = 'AMOUNT';
Expected output:
+-------------+--------------+-----------+------------+-----------+-----------+
| COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS | HISTOGRAM |
+-------------+--------------+-----------+------------+-----------+-----------+
| amount | 6 | 980.75 | 3200.00 | 0 | NULL |
+-------------+--------------+-----------+------------+-----------+-----------+
1 row in set
View partition-level column statistics (such as the amount column in partition p2023_q1)
SELECT
PARTITION_NAME,
COLUMN_NAME,
NUM_DISTINCT,
LOW_VALUE,
HIGH_VALUE,
NUM_NULLS
FROM oceanbase.DBA_PART_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
AND TABLE_NAME = 'SALES'
AND PARTITION_NAME = 'P2023_Q1'
AND COLUMN_NAME = 'AMOUNT';
Expected output:
+----------------+-------------+--------------+-----------+------------+-----------+
| PARTITION_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS |
+----------------+-------------+--------------+-----------+------------+-----------+
| p2023_q1 | amount | 2 | 1500.00 | 2300.50 | 0 |
+----------------+-------------+--------------+-----------+------------+-----------+
1 row in set
Step 5: Verify the automatic update mechanism (optional)
-- Insert new data into p2023_q4 (simulating >10% modification)
INSERT INTO sales
SELECT 1000 + n, 'Central', ROUND(RAND()*5000, 2), '2023-11-01'
FROM (SELECT @n := @n + 1 AS n FROM information_schema.columns a, information_schema.columns b, (SELECT @n := 0) r LIMIT 50) t;
-- Wait for background automatic detection (or trigger manually)
CALL DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
-- At this point, if the modification proportion ≥ STALE_PERCENT (default 10%), the system will automatically mark it as stale and may trigger collection
-- Query again to check whether LAST_ANALYZED has been updated
SELECT PARTITION_NAME, LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY PARTITION_NAME;
Expected output:
+----------------+----------------------------+
| PARTITION_NAME | LAST_ANALYZED |
+----------------+----------------------------+
| NULL | 2026-01-14 10:45:58.079250 |
| p2023_q1 | 2026-01-14 10:45:58.067339 |
| p2023_q2 | 2026-01-14 10:45:58.067339 |
| p2023_q3 | 2026-01-14 10:45:58.067339 |
| p2023_q4 | 2026-01-14 10:45:58.067339 |
+----------------+----------------------------+
5 rows in set
Related topics
For related PL procedure references, see the following topics:
