As we all know, statistical information is collected before plan generation. Therefore, if a significant number of insert, update, or delete operations occur after the statistical information is collected and it is not promptly updated, the plan generation may rely on outdated statistical information, resulting in inaccurate row estimation and the inability to select the optimal execution plan. To address this issue, OceanBase Database's optimizer implements a real-time row estimation mechanism based on the storage layer to obtain real-time and accurate statistical information.
When generating a plan, OceanBase Database creates multiple base table scan paths, known as "base table paths", based on table indexes. As the storage layer of OceanBase Database uses a tree structure to organize indexes, during plan generation, multiple query ranges are generated for each base table path based on the table's predicates. This process allows for the quick scanning and retrieval of the specified data area, thereby avoiding unnecessary data scans. Leveraging this feature, OceanBase Database's optimizer sends the relevant query ranges from the base table path to the storage layer in advance during plan generation. Subsequently, the storage layer rapidly estimates the total number of rows in the query ranges and returns this information to the optimizer. This facilitates the optimizer in obtaining real-time and accurate row estimates, a process known as storage layer-based row estimation.
However, not all base table paths can be used to estimate the number of rows based on the storage layer. This is because row estimation based on the storage layer can be time-consuming. To manage overall time consumption during plan generation, the following two restrictions are placed on storage layer-based row estimation:
The number of query ranges on the base table path must not exceed 10. Otherwise, storage layer-based row estimation is not performed.
For partitioned tables, only one partition can be retained on a base table path for scanning after pruning. Otherwise, storage layer-based row estimation cannot be performed. Non-partitioned tables do not have this restriction.
Note that row estimation based on the storage layer applies only to base table paths and does not apply to other operators, such as JOIN and GROUP BY.