What is query acceleration
In an analytical processing (AP) scenario, a single query often needs to scan a large amount of data and perform complex aggregations and joins. The query response time (RT) and resource utilization directly determine the analysis efficiency and business experience. Query acceleration refers to the process of optimizing the optimizer, parallel execution, and targeted configurations to make analytical SQL run faster, more stably, and with less resource consumption. This allows for shorter response times and more predictable resource usage without significantly rewriting business SQL.
How does query acceleration work
In an AP scenario, OceanBase Database achieves query acceleration through the following two approaches:
- On the optimizer side: The optimizer generates optimal execution plans based on accurate and timely statistics (including path selection for rowstore and columnstore, distributed plans, and SkipIndex utilization), avoiding high-cost plans such as full table scans on large tables or incorrect join orders.
- On the execution side: Parallel queries (multi-threaded or multi-machine collaboration) fully utilize CPU and I/O resources, reducing single-query RT. Combined with appropriate parallelism (manual DOP or auto DOP), query acceleration is achieved while controlling resource contention.
What are the methods for query acceleration
This section provides actionable configuration and tuning methods based on the two approaches mentioned above, along with table structure and index design. The relevant documents are as follows:
Method |
Document |
Description |
|---|---|---|
| Table structure and index design | Data table design and query optimization in AP scenarios | From the perspectives of storage format selection, primary key and partition design, and index usage in HTAP scenarios, this document explains how to lay a solid foundation for AP query acceleration through reasonable data table design, providing a better physical basis for subsequent statistics and parallel configurations. |
| Statistics | Statistics | This document introduces the role of optimizer statistics in AP scenarios, including table-level and column-level statistics, histograms, automatic and manual collection, and online collection. It focuses on AP optimization strategies: window maintenance configuration, statistics collection strategies for large tables (parallelism, histograms, partition granularity, sampling, and locking), ensuring that the optimizer has real-time and accurate statistics to generate high-quality execution plans. |
| Parallelism (DOP) | Query parallelism | This document introduces the concept of parallel queries (DOP), the differences in parallelism requirements between OLTP and OLAP, and distributed plans and operator parallelism. It explains how to accelerate analytical queries through manual DOP settings or Auto DOP, and provides guidance on enabling Auto DOP and configuring key parameters (such as parallel_min_scan_time_threshold and parallel_degree_limit) to balance RT improvement and resource control. |
These documents can be used in conjunction with OceanBase Database's general SQL optimization, distributed execution plans, and parallel execution sections to further understand plan generation, cost models, and execution details.
When to perform query acceleration
In the following scenarios, it is recommended to actively configure and tune for query acceleration:
- Analyzing large tables or partitioned tables: When the table or partition contains a large amount of data, full table or large-scale scans take a long time. Accurate statistics are needed to select between rowstore and columnstore paths and scan ranges, and parallelism can be used to improve throughput and reduce RT.
- After bulk import or large-scale updates: Significant data changes may result in outdated or inaccurate default statistics, leading to suboptimal execution plans. Online statistics collection, window maintenance, or manual collection can be used, and parallelism can be enabled or adjusted as needed.
- When analytical query RT does not meet requirements: If the bottleneck is identified as plan or parallel resources, check if the statistics are up-to-date and accurate. Then, evaluate whether to enable Auto DOP or increase manual DOP and related parameters.
- When resource contention occurs due to concurrent queries: High system load after enabling parallelism requires limiting parallelism thresholds and upper limits through parameters like
parallel_degree_limitandparallel_min_scan_time_threshold, or using resource groups/tenant isolation for overall control.
For these scenarios, you can refer to the Data table design and query optimization in AP scenarios, Statistics, and Query parallelism documents mentioned earlier, following the order of "table structure, then statistics and parallelism" to configure and tune, achieving better execution plans, shorter response times, and more controlled resource usage in AP scenarios.
References
For more information about execution plans, cost models, and parallel execution principles, see the SQL optimization, Distributed execution plans, and Parallel execution sections of OceanBase Database.
