OceanBase Database provides an SQL query engine similar to that of conventional relational databases. When an SQL query is sent to the OBServer, the SQL engine performs a series of operations such as syntax parsing, semantic analysis, query rewriting, and query optimization before the executor executes the query. The difference is that in distributed databases, the query optimizer generates distributed execution plans based on the data distribution information.
If the data queried is distributed in multiple servers, a distributed execution plan is needed. This is a typical scenario of online analytical processing (OLAP) analytical databases. OLAP scenarios have high requirements on the capability of the query optimizer. Therefore, OceanBase Database has conducted various optimizations, such as operator pushdown, intelligent connection, and partition pruning, on its query optimizer. These queries also involve a large amount of data, so the query execution engine of OceanBase Database is also optimized. Many technologies are applied, including parallel processing, task splitting, dynamic partitioning, pipeline scheduling, task pruning, subtask result merging, and concurrency limiting. Based on these optimizations, OceanBase Database supports complex query processing in OLAP scenarios.
The distributed execution process can be divided into three stages:
- Job tree generation: In this stage, the distributed plan is divided into multiple sub-plans, each of which is encapsulated by a job. The parent-child relationship between multiple jobs is inherited from that between the corresponding operators. These relationships make up a job tree.
- Job scheduling: This stage determines the execution time of each job.
- Job execution: This stage manages the task splitting of each job and the execution order of multiple tasks. Task sending, execution result receiving, and execution error processing are also performed in this stage.