The parameters (variables or configuration items) of OceanBase Database's parallel query (PX) determine the speed of parallel queries. These parameters mainly include those related to parallelism and EXCHANGE.
Parallelism variables
The parallelism-related variables mainly determine the number of Workers when a query is executed concurrently. The following table describes these variables in detail.
Variable |
Description |
Value range |
Default value |
Configuration suggestion |
|---|---|---|---|---|
| parallel_servers_target | Controls whether the query needs to wait in the queue before it is executed. If the sum of the required parallelism and the number of Workers exceeds the value of this variable, the query needs to wait in the queue. Otherwise, the query is executed. | [0, 1800] | 10 (The value is calculated based on the number of CPU cores.) | This variable mainly controls whether to continue or wait in the queue when the number of Workers is insufficient to process the query in the PX scenario. |
| _force_parallel_query_dop | Specifies the default parallelism of a query SQL statement in the current session. If no PARALLEL hint is specified, the parallelism of the query SQL statement is controlled by this variable. | [1, +∞] | 1 | This variable is recommended for scenarios where a batch of parallel query SQL statements needs to be executed in the same session without manually adding hints to each SQL statement. |
| _force_parallel_dml_dop | Specifies the default parallelism of a DML SQL statement in the current session. If no PARALLEL hint is specified, the parallelism of the DML SQL statement is controlled by this variable. | [1,+∞] | 1 | This variable is recommended for scenarios where a batch of parallel DML SQL statements needs to be executed in the same session without manually adding hints to each SQL statement. |
You can run the SHOW VARIABLES statement to view the values of these variables.
EXCHANGE (Shuffle) configuration items
The EXCHANGE (Shuffle) configuration items mainly control memory management and data redistribution strategies during data transfer between distributed function operators (DFOs). In other words, they control memory usage during data shuffling. OceanBase Database encapsulates data transfer into a module called the Data Transfer Layer (DTL).
Configuration item |
Description |
Value range |
Default value |
Configuration suggestion |
|---|---|---|---|---|
| dtl_buffer_size | Specifies the size of the buffer used to send data between EXCHANGE operators (Transmit and Receive). Data is sent only when it reaches the upper limit of this buffer, reducing the cost of transmitting each row. |
[4k, 2M] | 64K | In the PX scenario, the size of this configuration item affects the data sent between EXCHANGE operators. Generally, this configuration item does not need to be modified. If you want to reduce the number of data sends, you can try modifying this configuration item. However, it is generally not recommended to modify this value. |
You can run the SHOW PARAMETERS statement to view the values of these configuration items. Here is an example:
obclient>SHOW PARAMETERS LIKE '%dtl%';
+-------+----------+----------------+----------+-----------------+-----------+-------+---------------+----------+---------+---------+-------------------+
| zone | svr_type | svr_ip | svr_port | name | data_type | value | info | section | scope | source | edit_level |
+-------+----------+----------------+----------+-----------------+-----------+-------+---------------+----------+---------+---------+-------------------+
| zone1 | observer | 100.XXX.XXX.XXX | 36500 | dtl_buffer_size | NULL | 64K | to be removed | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+----------------+----------+-----------------+-----------+-------+---------------+----------+---------+---------+-------------------+
1 row in set
Other parallel-related parameters
Parameter |
Description |
Value range |
Default value |
Configuration suggestion |
|---|---|---|---|---|
| _enable_px_batch_rescan | Specifies whether to use BATCH RESCAN when generating a distributed PX RESCAN plan for a nested loop join (NLJ). This can improve performance. | True or False | True | Enabling this parameter improves performance but consumes more memory. |
| _bloom_filter_enabled | Specifies whether to enable the Bloom filter in a hash join. | True or False | True | This parameter is enabled by default when the parallelism is greater than 1. If the filter conditions of the hash join are not effective, enabling the Bloom filter will add extra overhead. In this scenario, you can consider disabling this feature. |
