As we all know, statistics are collected before a plan is generated. Therefore, if a large number of ADD, DELETE, and MODIFY operations are performed after statistics collection, and statistics are not collected again before a plan is generated, the optimizer may fail to accurately estimate the number of rows due to the use of expired statistics. As a result, the optimal execution plan is not selected. To resolve this issue, the OceanBase Database optimizer implements a real-time row estimation mechanism based on the storage layer, which obtains real-time and accurate statistics.
When a plan is generated, multiple base table scan paths, which are referred to as "base table paths", are generated based on table indexes. At the storage layer of OceanBase Database, indexes are stored in a tree structure. Therefore, when a plan is generated, multiple query ranges are generated for each base table path based on the predicates of the table, so as to quickly scan and obtain the qualifying data, avoiding useless data scanning. Based on this characteristic, the OceanBase Database optimizer sends the query ranges of base table paths to the storage layer in advance when the plan is generated, and the storage layer quickly estimates the total number of rows of the query ranges and returns it to the optimizer. This way, the optimizer obtains a real-time and accurate number of rows.
However, not all query ranges or partitions are used to estimate the number rows based on the storage layer. This is because row estimation based on the storage layer is time-consuming. To control the overall time consumed for plan generation, OceanBase Database restricts such estimation in the following aspects:
- The number of query ranges for each base table path cannot exceed the value of the system variable
range_index_dive_limit. Otherwise, only the specified number of query ranges are randomly selected for row estimation based on the storage layer. - The number of partitions after partitioned-table pruning cannot exceed the value of the system variable
partition_index_dive_limit. Otherwise, only the specified number of partitions are randomly selected for row estimation based on the storage layer.
Note that row estimation based on the storage layer applies only to base table paths and does not apply to other operators, such as JOIN and GROUP BY.