What is query acceleration
In analytical processing (AP) scenarios, a single query often needs to scan massive data and perform complex aggregations and joins. The query response time (RT) and resource utilization directly determine the analysis efficiency and business experience. Query acceleration refers to optimizing the optimizer, parallel execution, and targeted configurations to make analytical SQL run faster, more stably, and with fewer resources. This allows for shorter response times and more predictable resource usage without significantly rewriting business SQL.
How to accelerate queries
OceanBase Database accelerates queries in AP scenarios through the following two approaches:
- On the optimizer side: The optimizer generates better execution plans based on accurate and timely statistics (including rowstore/columnstore path selection, distributed plans, and SkipIndex utilization), avoiding high-cost plans such as full table scans on large tables and incorrect join orders.
- On the execution side: Parallel queries (multi-threaded or multi-machine collaboration) fully utilize CPU and I/O resources, shortening single-query RT. Combined with appropriate parallelism (manual DOP or auto DOP), query acceleration is achieved while controlling resource contention.
What are the methods for query acceleration
This section provides operational configuration and optimization methods based on the two approaches mentioned above, combined with table structure and index design. The relevant documents are as follows:
| Method | Document | Description |
|---|---|---|
| Table structure and index design | Data table design and query optimization in AP scenarios | From the perspectives of storage format selection, primary key and partition design, and index usage in HTAP scenarios, this document explains how to lay a solid foundation for AP query acceleration through reasonable data table design, providing a better physical basis for subsequent statistics and parallel configurations. |
| Statistics | Statistics | This document introduces the role of optimizer statistics in AP scenarios, as well as table-level and column-level statistics, histograms, automatic and manual collection, and online collection mechanisms. It focuses on AP optimization strategies: window configuration, statistics collection strategies for ultra-large tables (parallelism, histograms, partition granularity, sampling, and locking), ensuring the optimizer has real-time and accurate statistics to generate high-quality execution plans. |
| Parallelism (DOP) | Query parallelism selection | This document introduces the concept of parallel queries (DOP), the differences in parallelism requirements between OLTP and OLAP, and distributed plans and operator parallelism. It explains how to accelerate analytical queries through manual DOP settings or Auto DOP, and provides guidance on enabling Auto DOP and configuring key parameters (such as parallel_min_scan_time_threshold and parallel_degree_limit) to balance improved RT with resource control. |
These documents can be used in conjunction with OceanBase Database's general SQL optimization, distributed execution plans, and parallel execution sections to further understand plan generation, cost models, and execution details.
When to accelerate queries
In the following scenarios, it is recommended to actively configure and optimize query acceleration:
- Analyzing large tables or partitioned tables: When the table or partition data volume is very large, full table or large-scale scans take a long time. Accurate statistics are needed to select rowstore/columnstore paths and scan ranges, and parallelism can be used to improve throughput and shorten RT.
- After bulk import or large-scale updates: Significant data changes may result in outdated or inaccurate default statistics, leading to suboptimal execution plans. Online statistics collection, window maintenance, or manual collection can be used, and parallelism can be enabled or adjusted as needed.
- When analysis query RT does not meet requirements: If the bottleneck is identified as plan or parallel resources, check if statistics are up-to-date and accurate, and evaluate whether to enable Auto DOP or increase manual DOP and related parameters.
- When resource contention occurs due to concurrent queries: High system load after enabling parallelism requires limiting parallelism thresholds and upper limits through parameters such as
parallel_degree_limitandparallel_min_scan_time_threshold, or using resource groups/tenant isolation for overall control.
For these scenarios, you can refer to the Data table design and query optimization in AP scenarios, Statistics, and Query parallelism selection documents mentioned above, and configure and optimize in the order of "table structure, then statistics and parallelism" to achieve better execution plans, shorter response times, and more controllable resource usage in AP scenarios.
References
For more information about execution plans, cost models, and parallel execution principles, see the SQL optimization, Distributed execution plans, and Parallel execution sections of OceanBase Database.
