The performance of parallel queries (PX) in OceanBase Database is controlled by two types of parameters (variables or configuration parameters):
- Parallelism variables: These determine the number of worker threads used during query execution.
- EXCHANGE (Shuffle) configuration parameters: These control the parameters for data transmission between Data Flow Operators (DFOs), which is also known as data shuffling.
Core logic of parameter configuration:
- Parallelism parameter tuning: The
parallel_servers_targetparameter controls the size of the PX resource pool in the cluster. If PX queries are frequently queued, you can increase the parameter value. - Shuffle parameter tuning: The
dtl_buffer_sizeparameter is generally suitable for most scenarios. You can only adjust this parameter for special requirements, such as low-frequency, large-row transfers. However, we do not recommend that you modify this parameter.
Parallelism parameters
Core parameter: parallel_servers_target
This parameter specifies the maximum number of available parallel workers in the cluster. If the requested degree of parallelism (DOP) of a query plus the number of currently allocated workers exceeds this value, the query is queued.
Parameter |
Description |
Value range |
Default value |
Configuration recommendation |
|---|---|---|---|---|
parallel_servers_target |
Specifies whether a parallel query needs to be queued. If the number of workers required by the query plus the total number of currently allocated workers is less than or equal to the parameter value, the query is directly executed. Otherwise, the query is queued. | [0, 9223372036854775807] | 0 (The value is calculated based on the number of CPU cores, and the actual value may vary.) | This parameter mainly controls whether a parallel query is queued when there are not enough workers to handle it. |
Example: View the value of the parallelism parameter
obclient> SHOW VARIABLES LIKE '%paral%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| parallel_servers_target | 4 |
+-------------------------+-------+
1 rows in set
Example description:
- In the example,
parallel_servers_target=4indicates that the cluster allows a maximum of 4 parallel workers to run simultaneously. - If a query request has a DOP of 3 and currently uses 2 workers, the query is directly executed. If the query request has a DOP of 3 and currently uses 3 workers, the query is queued.
EXCHANGE (Shuffle) parameters
EXCHANGE (Shuffle) parameters are mainly used to control the Data Transfer Layer (DTL) parameters for data shuffling between Data Flow Operators (DFOs), such as the buffer size during data transmission. OceanBase Database implements cross-node data transmission through the DTL module.
Core parameter: dtl_buffer_size
This parameter specifies the size of the buffer for data transmission between EXCHANGE operators, such as Transmit and Receive. When the data reaches this threshold, data transmission is triggered to reduce the overhead of sending data row by row.
Parameter |
Description |
Value range |
Default value |
Configuration recommendation |
|---|---|---|---|---|
dtl_buffer_size |
Specifies the size of the buffer for data transmission between EXCHANGE operators, such as Transmit and Receive. That is, data is sent only when it reaches the buffer size threshold, reducing the overhead of sending data row by row. |
[4K, 2M] | 64K | In PX scenarios, data transmission between EXCHANGE operators depends on the value of this parameter. Generally, you do not need to modify this parameter, because the default value is optimized. If you want to reduce the number of data transmissions, you can try modifying this parameter. However, we do not recommend that you modify this parameter. |
Example: View the value of the Shuffle parameter
obclient> SHOW PARAMETERS LIKE '%dtl%'\G
*************************** 1. row ***************************
zone: zone1
svr_type: observer
svr_ip: 172.xx.xxx.xxx
svr_port: 2882
name: dtl_buffer_size
data_type: CAPACITY
value: 64K
info: to be removed
section: OBSERVER
scope: CLUSTER
source: DEFAULT
edit_level: DYNAMIC_EFFECTIVE
1 row in set
Example description:
dtl_buffer_size=64Kindicates that data is sent when it reaches 64 KB.- Adjusting this parameter requires a trade-off: increasing the value reduces the number of data transmissions but increases memory usage, while decreasing the value increases the frequency of data transmissions but reduces memory pressure.
Related concepts
Data Flow Operator (DFO):
- A data flow operator that coordinates interactions between PX operators.
EXCHANGEoperators perform data shuffling between DFOs.
Data Transfer Layer (DTL):
- The data transfer layer that handles cross-node data transmission.
dtl_buffer_sizeaffects the transmission efficiency and memory usage of DTL.
