Overview
As data volumes grow rapidly, optimizing query performance and resource utilization has become a key objective in database management. degree of parallelism (DOP) is a critical metric for evaluating parallel execution capabilities, and can significantly enhance query response times and system throughput.
Parallel queries improve SQL query efficiency by executing tasks simultaneously across multiple processing units. The core principle is to optimize query plans and fully utilize the system's computing resources and I/O capabilities, thereby accelerating the execution of individual queries.
Characteristics of parallel queries
Differences between OLTP and OLAP
Different types of SQL queries exhibit distinct characteristics in parallel processing:
- Online transaction processing (OLTP): Involves a large number of short transactions, each processing a small amount of data. There is limited need for parallelism, with greater emphasis on fast transaction response.
- Online analytical processing (OLAP): Tends to handle large-scale data analysis tasks, processing massive datasets. OLAP relies heavily on parallel queries to boost throughput and efficiency.
Query decomposition and operator parallelism
In parallel queries, the query is decomposed into multiple operators, with parallel processing implemented at different stages. These stages include:
- Data reading: Scanning data in parallel.
- Computation execution: Multiple processing units simultaneously participate in computations, accelerating complex operations such as aggregation and sorting.
- Result output: Optimizing the merging and returning of data.
Optimize parallel queries
Generate a distributed query plan
Distributed execution is a crucial component of parallel queries. In OceanBase Database, the generation of a distributed query plan determines the efficiency of query execution. By analyzing data distribution, operator dependencies, and resource usage, the system can create an efficient execution plan.
For more information, see: Generate a distributed plan.
Adjust the DOP
Appropriately adjusting the DOP based on the query complexity and data volume is crucial for optimizing query performance:
- Low DOP: Inadequate utilization of system resources, leading to prolonged query execution time.
- High DOP: Risk of resource contention, potentially reducing system throughput.
By monitoring query execution plans and analyzing execution time and resource usage, the DOP can be precisely adjusted to achieve the optimal balance.
How do you adjust the degree of parallelism?
To accelerate queries and meet business requirements, parallel execution is often used to optimize query performance. But how much parallelism is appropriate? In the optimizer, the amount of parallel resources is described using the degree of parallelism (DOP). In practical scenarios, whether to enable parallelism and how to set the DOP are typically determined based on query execution characteristics and business needs, guided by experience.
OceanBase Database provides both manual and automatic methods for setting the degree of parallelism.
Manually specify the optimizer's degree of parallelism
You can manually set the degree of parallelism. For more information, see Set the degree of parallelism.
Automatic optimizer-driven DOP selection
Auto DOP
When you manually specify the degree of parallelism (DOP), parallel execution can be enabled for all queries in the current session by using a system variable, and the DOP can be specified. However, this approach can introduce unnecessary parallel execution overhead for queries that do not require parallel acceleration or a high DOP, potentially leading to performance degradation. Alternatively, you can specify the DOP for individual queries using hints, but this requires evaluating each query separately, which becomes impractical in environments with a large number of queries.
To overcome the inconvenience and limitations of manually specifying the DOP, the query optimizer provides the auto DOP feature. This feature evaluates the estimated execution time of a query during query plan generation and automatically determines whether to enable parallelism and the appropriate DOP. By doing so, it avoids the performance issues that can arise from manually setting the DOP.
Notice
- After enabling auto DOP, you need to adjust the settings of two related parameters,
parallel_min_scan_time_thresholdandparallel_degree_limit. Otherwise, auto DOP may have adverse effects on the AP system.
- auto DOP is based solely on the estimated query execution cost to reduce response time (RT). If there are slow queries in the workload that users do not prioritize, resource isolation measures should be implemented. Otherwise, enabling auto DOP and applying the parallelism determined by auto DOP for all queries may negatively impact the system.
Enable auto DOP
The optimizer's DOP selection strategy currently has two modes: AUTO and MANUAL. In AUTO mode, the optimizer can automatically determine whether to enable parallelism and the appropriate DOP based on the actual query conditions. In MANUAL mode, parallelism can only be enabled through manual intervention.
The optimizer's DOP selection strategy can be configured at the global or session level using the system variable parallel_degree_policy, or at the query level using a higher-priority hint.
Notice
After you modify a system variable at the global level, the modification takes effect only on new connections, and existing connections must be closed.
-- Enable auto DOP at the global level.
set global parallel_degree_policy = AUTO;
-- Enable auto DOP at the session level.
set session parallel_degree_policy = AUTO;
set parallel_degree_policy = AUTO;
-- Disable auto DOP at the global level.
set global parallel_degree_policy = MANUAL;
-- Disable auto DOP at the session level.
set session parallel_degree_policy = MANUAL;
set parallel_degree_policy = MANUAL;
Hints take precedence over system variables. You can use hints to enable or disable auto DOP at the query level.
-- Enable auto DOP at the query level.
select /*+parallel(auto)*/ * from t1;
-- Disable auto DOP at the query level.
select /*+parallel(manual)*/ * from t1;
select /*+parallel(8)*/ * from t1;
Parameter settings
The DOP determined by auto DOP for a query is influenced by the following two system variables. You can also modify these variables based on different deployment resources and business workload.
Notice
After you modify a system variable at the global level, the modification takes effect only on new connections, and existing connections must be closed.
parallel_min_scan_time_threshold: specifies the threshold for estimated scan cost in milliseconds (ms) for enabling parallelism on base tables. This variable affects whether to enable parallelism for a query and the DOP used. The default value is 1000, and the minimum value is 10.
Scenarios:
You can modify this variable to control the number of parallel queries and the DOP in the AP system. After you enable auto DOP, you can decrease the value of parallel_min_scan_time_threshold to enable parallelism for more queries and increase the DOP for existing parallel queries. In the opposite case, you can increase the value of this variable.
The default value of this variable is relatively conservative. If you enable auto DOP, you must evaluate the system workload and set this variable to an appropriate value smaller than the default value, to avoid excessive parallelism for a large number of queries, which can consume excessive system resources.
Examples:
-- Set the estimated scan cost threshold for enabling parallelism on base tables in the current session to 10 ms.
set parallel_min_scan_time_threshold = 10;
-- Set the estimated scan cost threshold for enabling parallelism on base tables globally to 10 ms.
set global parallel_min_scan_time_threshold = 10;
explain select /*+parallel(auto)*/ * from t1;
-- Set the variable to 20. The final DOP is 4. The estimated scan cost for base tables is 12 ms after parallelism is enabled.
set parallel_min_scan_time_threshold = 20;
=========================================================
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
---------------------------------------------------------
|0 |PX COORDINATOR | |512000 |619089 |
|1 |└─EXCHANGE OUT DISTR |:EX10000|512000 |230501 |
|2 | └─PX BLOCK ITERATOR| |512000 |12189 |
|3 | └─TABLE FULL SCAN|t1 |512000 |12189 |
=========================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3, t1.c4)]), filter(nil), rowset=256
1 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3, t1.c4)]), filter(nil), rowset=256
dop=4
-- Set the variable to 10. The final DOP is 6. The estimated scan cost for base tables is 8 ms after parallelism is enabled.
set parallel_min_scan_time_threshold = 10;
=========================================================
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
---------------------------------------------------------
|0 |PX COORDINATOR | |512000 |542256 |
|1 |└─EXCHANGE OUT DISTR |:EX10000|512000 |153667 |
|2 | └─PX BLOCK ITERATOR| |512000 |8126 |
|3 | └─TABLE FULL SCAN|t1 |512000 |8126 |
=========================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3, t1.c4)]), filter(nil), rowset=256
1 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3, t1.c4)]), filter(nil), rowset=256
dop=6
parallel_degree_limit: specifies the maximum DOP allowed after auto DOP is enabled. The default value is 0. In this case, the optimizer limits the maximum DOP based on the number of CPU cores of the tenant and the value of the parallel_servers_target variable.
Scenarios:
After you enable auto DOP in an AP system, the workload can be high if a large number of queries require high DOPs. In this case, you can set the parallel_degree_limit variable to a value smaller than the number of CPU cores to limit the maximum DOP allowed.
If only one AP query is being executed in the system, retain the default value of this variable so that the maximum DOP allowed for the query is equal to the number of CPU cores. If you want to increase the DOP when auto DOP is enabled, you can set this variable to a value larger than the number of CPU cores to remove the restriction on the DOP by the number of CPU cores. This modification is supported only for ultimate performance optimization for a single AP query. It is not allowed in a production environment.
Examples:
-- Set the maximum DOP allowed for parallelism determined by auto DOP in the current session to 8.
set parallel_degree_limit = 8;
-- Set the maximum DOP allowed for parallelism determined by auto DOP globally to 8.
set global parallel_degree_limit = 8;
For more information about auto DOP, see Auto DOP.
Parallel execution
Parallel execution refers to the process of dividing a large task into multiple smaller tasks and using multiple threads or processes to concurrently process these smaller tasks. This approach allows for the efficient utilization of more CPU and I/O resources to reduce operation response time.
Parallel execution can be further divided into parallel query, parallel DDL, and parallel DML.
You can start parallel execution in the following ways:
- Specify the degree of parallelism (DOP) or enable auto DOP.
- For partitioned tables with more than one query partition, parallel query is automatically started.
For more information, see Execution of parallel queries.