Collection and update strategy for partitioned table statistics

2026-03-06 07:02:42  Updated

In OceanBase Database, statistics management for partitioned tables is divided into partition-level statistics and global statistics. This mechanism applies only to partitioned tables. For non-partitioned tables (regular single tables), only global statistics are maintained, and there is no concept of partition statistics or any derivation logic.

The statistics mechanism for partitioned tables in OceanBase Database has the following characteristics:

  1. Separation of global and partition statistics: These statistics are stored independently and identified by partition_id.
  2. Default simultaneous collection: With the APPROX_GLOBAL AND PARTITION granularity, both global and partition statistics are collected at the same time.
  3. Automatic detection of missing statistics: The system automatically detects when statistics are missing or outdated and triggers their collection.
  4. Derivation of global statistics from partition statistics: Global basic statistics are derived from the basic statistics of individual partitions, improving the timeliness and accuracy of global statistics.

This topic details the statistics collection strategy and automatic update rules for partitioned tables in OceanBase Database.

Default collection strategy

By default, OceanBase Database collects statistics for all partitioned tables in the APPROX_GLOBAL AND PARTITION granularity (by using the GATHER_INDEX_STATS procedure in the DBMS_STATS package). In this strategy, the system generates two types of statistics: partition-level statistics and global statistics.

Partition-level statistics

The system collects independent statistics for each partition:

  • Basic statistics: such as the minimum (min), maximum (max), and number of distinct values (NDV) of a column.
  • Histograms: used to describe the distribution characteristics of data.

Global statistics

  • Basic statistics (min, max, and number of distinct values): derived from the basic statistics of each partition (for example, the global min is the minimum of the mins of all partitions).
  • Global histograms: cannot be derived from the results of partitions and must be collected by sampling the entire table.

The global basic statistics do not require a full table scan, significantly improving the efficiency of statistics collection for large partitioned tables, especially for AP scenarios at the TB or PB level.

Automatic update mechanism

To ensure the timeliness of statistics, OceanBase Database has an automatic detection and update mechanism. The mechanism triggers a refresh based on the data change ratio.

Triggers for global statistics updates

The following conditions will trigger an update of global statistics:

  • Missing global statistics: the table does not have global statistics.
  • Stale global data: the number of rows modified in the entire table is ≥ 10% (controlled by STALE_PERCENT).
  • Stale partition statistics: even if the global data remains unchanged, if the statistics of any partition are stale, the global statistics will be updated to ensure consistency.

Update process:

  • Global basic statistics: derived from partition-level statistics (min, max, and number of distinct values).
  • Global histograms: collected separately because histograms cannot be derived from partition results.

Triggers for partition-level statistics updates

The statistics of a single partition are updated under the following conditions:

  • Missing partition statistics: the partition does not have statistics.
  • Stale partition data: the number of rows modified in the partition is ≥ 10% (controlled by STALE_PERCENT).

Update process:

  • Recollect the statistics of the expired partition.
  • Recollect the global statistics.

Configuring STALE_PERCENT

Users can flexibly adjust the threshold for staleness using the DBMS_STATS package:

-- Set the global default value (affects all new tables)
CALL dbms_stats.set_global_prefs('STALE_PERCENT', '50');

-- Set at the schema level
CALL dbms_stats.set_schema_prefs('MY_SCHEMA', 'STALE_PERCENT', '30');

-- Set at the table level
CALL dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '20');

Automatic detection capabilities

The OceanBase Database background continuously executes the following detection tasks:

  • Statistics status check: scans all existing global and partition-level statistics.
  • Stale partition detection: checks whether each partition is stale or missing.
  • Missing statistics detection: automatically marks missing items and adds them to the collection queue.
  • Independent trigger updates: supports independent triggers for global and partition-level statistics updates, which do not affect each other.

Key parameters

Parameter Default value Description
STALE_PERCENT 10 (10%) A parameter of stored procedures (set_schema_prefs, set_table_prefs) that specifies the threshold for determining whether statistics are stale
Default collection granularity APPROX_GLOBAL AND PARTITION Applies only to partitioned tables. For non-partitioned tables, it is GLOBAL (implemented by using the GATHER_INDEX_STATS procedure in the DBMS_STATS package)

Comparison with non-partitioned tables

Feature Partitioned table Non-partitioned table (Single table)
Statistics type Global + Partition Only global
Source of global basic statistics Derived from partition statistics Generated by full table scan
Supports partition pruning optimization Yes No
Default collection granularity APPROX_GLOBAL AND PARTITION GLOBAL

Example: Manage statistics of a partitioned table in OceanBase Database

This example is based on OceanBase Database (MySQL-compatible mode) and demonstrates how to create a partitioned table, view the collection and automatic update behavior of its statistics, and show how global statistics can be derived from partition statistics.

Step 1: Create a test partitioned table

-- Create a database and a user (if they do not exist)
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Create a sales table partitioned by RANGE
CREATE TABLE sales (
    id BIGINT NOT NULL,
    region VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE,
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE COLUMNS(sale_date) (
    PARTITION p2023_q1 VALUES LESS THAN ('2023-04-01'),
    PARTITION p2023_q2 VALUES LESS THAN ('2023-07-01'),
    PARTITION p2023_q3 VALUES LESS THAN ('2023-10-01'),
    PARTITION p2023_q4 VALUES LESS THAN ('2024-01-01')
);

Step 2: Insert initial data

-- Insert test data into each partition
INSERT INTO sales VALUES
(1, 'North', 1500.00, '2023-01-15'),
(2, 'South', 2300.50, '2023-02-20'),
(3, 'East', 980.75, '2023-04-10'),
(4, 'West', 3200.00, '2023-06-25'),
(5, 'North', 1800.25, '2023-07-05'),
(6, 'South', 2100.00, '2023-09-18');

Step 3: Collect statistics (default granularity: APPROX_GLOBAL AND PARTITION)

CALL DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'sales_db',
    tabname => 'sales',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    granularity => 'APPROX_GLOBAL AND PARTITION'
);

Step 4: View statistics

View global table-level statistics

SELECT
    TABLE_NAME,
    OBJECT_TYPE,
    NUM_ROWS,
    AVG_ROW_LEN,
    LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY OBJECT_TYPE;

Expected result:

+------------+-------------+----------+-------------+----------------------------+
| TABLE_NAME | OBJECT_TYPE | NUM_ROWS | AVG_ROW_LEN | LAST_ANALYZED              |
+------------+-------------+----------+-------------+----------------------------+
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          72 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        2 |          73 | 2026-01-14 10:45:58.067339 |
| sales      | PARTITION   |        0 |           0 | 2026-01-14 10:45:58.067339 |
| sales      | TABLE       |        6 |          72 | 2026-01-14 10:45:58.079250 |
+------------+-------------+----------+-------------+----------------------------+
5 rows in set

View global column-level statistics (for example, the amount column)

SELECT
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS,
    HISTOGRAM
FROM oceanbase.DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND COLUMN_NAME = 'AMOUNT';

Expected result:

+-------------+--------------+-----------+------------+-----------+-----------+
| COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS | HISTOGRAM |
+-------------+--------------+-----------+------------+-----------+-----------+
| amount      |            6 | 980.75    | 3200.00    |         0 | NULL      |
+-------------+--------------+-----------+------------+-----------+-----------+
1 row in set

View partition-level column statistics (for example, the amount column in the p2023_q1 partition)

SELECT
    PARTITION_NAME,
    COLUMN_NAME,
    NUM_DISTINCT,
    LOW_VALUE,
    HIGH_VALUE,
    NUM_NULLS
FROM oceanbase.DBA_PART_COL_STATISTICS
WHERE OWNER = 'SALES_DB'
  AND TABLE_NAME = 'SALES'
  AND PARTITION_NAME = 'P2023_Q1'
  AND COLUMN_NAME = 'AMOUNT';

Expected result:

+----------------+-------------+--------------+-----------+------------+-----------+
| PARTITION_NAME | COLUMN_NAME | NUM_DISTINCT | LOW_VALUE | HIGH_VALUE | NUM_NULLS |
+----------------+-------------+--------------+-----------+------------+-----------+
| p2023_q1       | amount      |            2 | 1500.00   | 2300.50    |         0 |
+----------------+-------------+--------------+-----------+------------+-----------+
1 row in set

Step 5: Verify the automatic update mechanism (optional)

-- Insert new data into the p2023_q4 partition (simulate a change of >10%)
INSERT INTO sales
SELECT 1000 + n, 'Central', ROUND(RAND()*5000, 2), '2023-11-01'
FROM (SELECT @n := @n + 1 AS n FROM information_schema.columns a, information_schema.columns b, (SELECT @n := 0) r LIMIT 50) t;

-- Wait for the automatic detection in the background (or manually trigger it)
CALL DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
-- If the change ratio is ≥ STALE_PERCENT (default 10%), the system will automatically mark the statistics as stale and may trigger a collection

-- Query the LAST_ANALYZED time to see if it has been updated
SELECT PARTITION_NAME, LAST_ANALYZED
FROM oceanbase.DBA_TAB_STATISTICS
WHERE OWNER = 'SALES_DB' AND TABLE_NAME = 'SALES'
ORDER BY PARTITION_NAME;

Expected result:

+----------------+----------------------------+
| PARTITION_NAME | LAST_ANALYZED              |
+----------------+----------------------------+
| NULL           | 2026-01-14 10:45:58.079250 |
| p2023_q1       | 2026-01-14 10:45:58.067339 |
| p2023_q2       | 2026-01-14 10:45:58.067339 |
| p2023_q3       | 2026-01-14 10:45:58.067339 |
| p2023_q4       | 2026-01-14 10:45:58.067339 |
+----------------+----------------------------+
5 rows in set

References

For more information about the PL/SQL procedures referenced in this example, see:

Contact Us