What is query acceleration
In analytical processing (AP) scenarios, a single query often needs to scan a large amount of data, perform complex aggregations and joins, and the query response time (RT) and resource utilization directly determine the analysis efficiency and business experience. Query acceleration refers to the process of optimizing the optimizer, parallel execution, and targeted configuration to make analytical SQL run faster, more stably, and with less resource consumption. This allows for shorter response times and more predictable resource usage without significantly rewriting business SQL.
How to accelerate queries
OceanBase Database accelerates queries in AP scenarios through two main approaches:
- On the optimizer side: The optimizer generates optimal execution plans based on accurate and timely statistics (including rowstore/columnstore path selection, distributed plans, and SkipIndex utilization), avoiding high-cost plans such as full table scans on large tables and incorrect join orders.
- On the execution side: Parallel queries (multi-threaded or multi-node collaboration) fully utilize CPU and I/O resources, reducing single-query RT. Combined with appropriate parallelism (manual DOP or auto DOP), it accelerates queries while controlling resource contention.
What are the methods for query acceleration
This section provides actionable configuration and tuning methods based on the two approaches mentioned above, along with table structure and index design considerations. The relevant documentation is as follows:
| Method | Documentation | Description |
|---|---|---|
| Table structure and index design | Data table design and query optimization in AP scenarios | Explains how to design data tables to lay a solid foundation for query acceleration in AP scenarios, from storage format selection, primary key and partition design, to index usage in HTAP scenarios. This provides a better physical basis for subsequent statistics and parallel configurations. |
| Statistics | Statistics | Introduces the role of optimizer statistics in AP scenarios, as well as table-level and column-level statistics, histograms, automatic and manual collection, and online collection mechanisms. It emphasizes AP-specific optimization strategies: window maintenance configuration, statistics collection strategies for large tables (parallelism, histograms, partition granularity, sampling, and locking), ensuring the optimizer has real-time and accurate statistics to generate high-quality execution plans. |
| Parallelism (DOP) | Query parallelism selection | Introduces the concept of parallel queries (DOP), differences in parallelism requirements between OLTP and OLAP, and distributed plans and operator parallelism. It explains how to accelerate analytical queries using manual DOP or auto DOP, and provides configuration recommendations for auto DOP, such as parallel_min_scan_time_threshold and parallel_degree_limit, to balance RT improvement and resource control. |
These documents can be used in conjunction with OceanBase Database's general SQL optimization, distributed execution plans, and parallel execution sections for a deeper understanding of plan generation, cost models, and execution details.
When to accelerate queries
In the following scenarios, it is recommended to actively configure and tune query acceleration:
- Analyzing large tables or partitioned tables: When table/partition data volume is very large, full table or large range scans take a long time. Accurate statistics are needed to select rowstore/columnstore paths and scan ranges, and parallelism can be used to improve throughput and reduce RT.
- After batch imports or large-scale updates: Significant data changes may result in outdated or inaccurate default statistics, leading to suboptimal execution plans. Online statistics collection, window maintenance, or manual collection can be used, and parallelism can be enabled or adjusted as needed.
- When analysis query RT does not meet requirements: If the bottleneck is identified as plans or parallel resources, check if statistics are up-to-date and accurate, and consider enabling auto DOP or increasing manual DOP and related parameters.
- When resource contention occurs due to concurrent queries: High system load after enabling parallelism may require limiting parallelism thresholds and upper limits using parameters like
parallel_degree_limitandparallel_min_scan_time_threshold, or using resource groups/tenant isolation for overall control.
For the above scenarios, you can refer to the Data table design and query optimization in AP scenarios, Statistics, and Query parallelism selection documents mentioned in the previous section. Follow the order of "table structure, then statistics and parallelism" to configure and tune step by step. This will help you achieve better execution plans, shorter response times, and more controlled resource usage in AP scenarios.
References
For more information about execution plans, cost models, and parallel execution principles, see the SQL optimization, Distributed execution plans, and Parallel execution sections of OceanBase Database.
