Overview of statistics

2024-04-19 08:42:50  Updated

In OceanBase Database, "statistics" refers to optimizer statistics: a collection of data that describes the tables and columns in the database.

The optimizer's cost model selects an execution plan based on statistics for the objects involved in a query, such as tables, columns, and predicates. Statistics are therefore the key to selecting the optimal execution plan.
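To illustrate how statistics can influence plan selection, here is a minimal sketch of a textbook-style estimate. It is not OceanBase Database's actual cost model. The function names, the uniform-distribution assumption, and the per-row cost constants are all hypothetical and chosen only for illustration.

```python
def estimate_equality_rows(row_count, null_count, ndv):
    """Estimate rows matching `col = constant`, assuming values are
    uniformly distributed over the distinct non-NULL values (an assumption)."""
    if row_count <= 0 or ndv <= 0:
        return 0.0
    non_null_rows = max(row_count - null_count, 0)
    return non_null_rows / ndv


def pick_access_path(row_count, est_rows,
                     index_cost_per_row=4.0, scan_cost_per_row=1.0):
    """Compare two hypothetical plan costs and return the cheaper plan name.
    The cost constants are illustrative, not measured values."""
    full_scan_cost = row_count * scan_cost_per_row
    index_cost = est_rows * index_cost_per_row
    return "index_lookup" if index_cost < full_scan_cost else "full_scan"
```

With a row count of 1000, 100 NULL values, and an NDV of 90, the sketch estimates 10 matching rows and prefers the index lookup. If the statistics are stale and the estimate is far off, the same comparison can pick the more expensive plan.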

OceanBase Database stores statistics as ordinary data in internal tables. The optimizer maintains a local cache of statistics to speed up access to them.
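The caching idea can be sketched as a simple read-through cache. This is only an illustration of the general pattern, not the optimizer's real implementation. The class name `StatCache`, the loader callback, and the key layout are assumptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (tenant_id, table_id, partition_id)
StatKey = Tuple[int, int, int]


class StatCache:
    """Read-through cache: look in memory first, fall back to the loader."""

    def __init__(self, load_from_internal_table: Callable[[StatKey], dict]):
        # The loader stands in for a query against the internal tables.
        self._load = load_from_internal_table
        self._entries: Dict[StatKey, dict] = {}
        self.misses = 0

    def get(self, key: StatKey) -> dict:
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def invalidate(self, key: StatKey) -> None:
        # Drop a cached entry, for example after statistics are re-collected.
        self._entries.pop(key, None)
```

Repeated lookups for the same key hit the in-memory entry, and only the first lookup (or the first one after invalidation) calls the loader.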

OceanBase Database supports table-level statistics and column-level statistics.

Table-level statistics

The statistics on a table include the following information:

  • Basic table information, including tenant_id, table_id, and partition_id

  • Table statistics type, including GLOBAL, PARTITION, and SUBPARTITION

  • Number of table rows

  • Number of macroblocks occupied by the table

  • Number of microblocks occupied by the table

  • Average length of rows in the table

  • Timestamp when the statistics were collected

  • Lock status of table-level statistics
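The items above can be pictured as a single record per table or partition. The sketch below models that record in Python. The field names other than tenant_id, table_id, and partition_id are hypothetical and do not necessarily match the columns of the internal tables.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class StatType(Enum):
    GLOBAL = "GLOBAL"
    PARTITION = "PARTITION"
    SUBPARTITION = "SUBPARTITION"


@dataclass(frozen=True)
class TableStat:
    tenant_id: int
    table_id: int
    partition_id: int
    stat_type: StatType
    row_count: int            # number of table rows
    macro_block_count: int    # macroblocks occupied by the table
    micro_block_count: int    # microblocks occupied by the table
    avg_row_len: float        # average row length (bytes assumed here)
    last_analyzed: datetime   # when the statistics were collected
    locked: bool              # lock status of the table-level statistics

    def estimated_data_bytes(self) -> float:
        """Rough data size derived from row count and average row length."""
        return self.row_count * self.avg_row_len
```

A derived value such as `estimated_data_bytes()` shows the kind of quantity a cost model can compute from these fields without reading the table itself.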

Column-level statistics

The statistics on a column include the following information:

  • Basic information of the column, including tenant_id, table_id, partition_id, and column_id

  • Column statistics type, including GLOBAL, PARTITION, and SUBPARTITION

  • Number of distinct values (NDV) in the column

  • Number of NULL values in the column

  • Maximum and minimum values in the column

  • Size of the sampled data in the column

  • Histogram density of the column

  • Number of histogram buckets for the column

  • Histogram type: frequency, TopK, or hybrid
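As a rough illustration of where several of these values come from, the following sketch derives them from a sample of column values. It is a simplified example, not the collection algorithm used by OceanBase Database. The function name and the result keys are hypothetical, NULL is represented by Python's `None`, and the histogram is a plain frequency histogram with one bucket per distinct value.

```python
from collections import Counter


def summarize_column(sample):
    """Compute simple column statistics from a list of sampled values."""
    non_null = [v for v in sample if v is not None]
    histogram = Counter(non_null)  # value -> occurrence count in the sample
    return {
        "sample_size": len(sample),
        "null_count": len(sample) - len(non_null),
        "ndv": len(histogram),  # distinct values seen in the sample only
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "bucket_count": len(histogram),
        "histogram": dict(histogram),
    }
```

For the sample `[3, 1, 3, None, 2]`, the sketch reports a sample size of 5, one NULL value, an NDV of 3, a minimum of 1, and a maximum of 3. Note that an NDV computed from a sample can underestimate the NDV of the full column.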
