The OceanBase Database optimizer supports two strategies for obtaining the degree of parallelism (DOP): AUTO and MANUAL. In MANUAL mode, you need to manually specify the DOP. You can use system variables to enable parallel execution for all queries in the current session and specify the DOP. However, this may cause additional parallel execution overheads for queries that do not require parallel execution or do not require a high DOP in the session, resulting in performance degradation. You can also use hints to specify the DOP for specific queries. However, in this case, you need to look into each business query separately, which is not feasible for scenarios with a large number of business queries.
In AUTO mode, the optimizer automatically enables parallel execution and determines the DOP based on the actual situation of the query. The Auto DOP feature of the OceanBase Database optimizer evaluates the execution time required for the query when a query plan is generated, and then automatically determines whether to enable parallel execution and specifies an appropriate DOP for the current query. This eliminates the necessity of manual DOP configuration and consequent performance degradation.
Enable Auto DOP
The OceanBase Database optimizer provides the parallel_degree_policy system variable to allow you to enable and disable Auto DOP at the global and session levels. You can also use hints with higher priorities to enable and disable Auto DOP at the query level. The syntaxes are as follows:
/* Enable Auto DOP at the global level. */
SET global parallel_degree_policy = AUTO;
/* Enable Auto DOP at the session level. */
SET session parallel_degree_policy = AUTO;
SET parallel_degree_policy = AUTO;
/* Disable Auto DOP at the global level. */
SET global parallel_degree_policy = MANUAL;
/* Disable Auto DOP at the session level. */
SET session parallel_degree_policy = MANUAL;
SET parallel_degree_policy = MANUAL;
/* Use a hint to enable Auto DOP at the query level. */
SELECT /*+PARALLEL(AUTO)*/ * FROM ...;
/* Use a hint to disable Auto DOP at the query level. */
SELECT /*+PARALLEL(MANUAL)*/ * FROM ...;
SELECT /*+PARALLEL(8)*/ * FROM ...;
Configure the Auto DOP strategy
In the Auto DOP strategy, each DOP obtained is affected by two system variables: parallel_degree_limit and parallel_min_scan_time_threshold.
The system variable parallel_degree_limit specifies the maximum DOP allowed when the Auto DOP strategy is used. The default value is 0, which specifies not to limit the value of an automatically obtained DOP.
The system variable parallel_min_scan_time_threshold is used to calculate a DOP in the Auto DOP strategy. It specifies the reference execution time for a base table scan when parallel execution is enabled. When the estimated execution time for a base table scan exceeds the value of this variable, parallel execution is enabled for the base table scan and an appropriate DOP is selected based on calculation upon the value of this variable. The default value is 1000 in milliseconds. By reducing the value of parallel_min_scan_time_threshold, you can lower the threshold, so that the optimizer can enable parallel execution for base table scans with less estimated execution time, and scans tables with a fixed data volume at a higher DOP if parallel execution has already been enabled for the tables.
Here are some configuration examples:
/* Set the maximum DOP for the Auto DOP strategy at the global level. */
SET GLOBAL parallel_degree_limit = 64;
/* Set the maximum DOP for the Auto DOP strategy at the session level. */
SET SESSION parallel_degree_limit = 64;
SET parallel_degree_limit = 64;
/* Set the threshold for enabling parallel execution for the Auto DOP strategy at the global level. */
SET GLOBAL parallel_min_scan_time_threshold = 100;
/* Set the threshold for enabling parallel execution for the Auto DOP strategy at the session level. */
SET SESSION parallel_min_scan_time_threshold = 100;
SET parallel_min_scan_time_threshold = 100;
Obtain a DOP by using the Auto DOP strategy
When the Auto DOP strategy is used, the optimizer uses a certain algorithm (Table Auto DOP calculation method) to determine the DOP used by a base table scan operator. During plan generation, other operators inherit the DOP of their child operators.
In the following example, certain amount of data is inserted into the tp1 and tp2 tables, the Auto DOP strategy is used, and the join order and distributed join method of the two tables are specified. For table a, the DOP automatically obtained by using the Auto DOP strategy is 4. For table b, the DOP is 8. After two tables are joined by using the HASH HASH distributed join method, the DOP of the HASH JOIN operator is obtained by inheritance. Among the DOP values of the left and right child operators, the greater one 8 is inherited.
obclient> CREATE TABLE tp1(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 4;
Query OK, 0 rows affected
obclient> CREATE TABLE tp2(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 8;
Query OK, 0 rows affected
obclient> EXPLAIN BASIC SELECT /*+LEADING(a b) USE_HASH(a b) PQ_DISTRIBUTE(b HASH HASH) */
* FROM tp1 a, tp2 b WHERE a.c1 = b.c1;
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME | |
| ------------------------------------------- |
| |0 |PX COORDINATOR | | |
| |1 | EXCHANGE OUT DISTR |:EX10002| |
| |2 | HASH JOIN | | |
| |3 | EXCHANGE IN DISTR | | |
| |4 | EXCHANGE OUT DISTR (HASH)|:EX10000| |
| |5 | PX BLOCK ITERATOR | | |
| |6 | TABLE SCAN |a | |
| |7 | EXCHANGE IN DISTR | | |
| |8 | EXCHANGE OUT DISTR (HASH)|:EX10001| |
| |9 | PX BLOCK ITERATOR | | |
| |10| TABLE SCAN |b | |
| =========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(a.c1, a.c2, a.c3, b.c1, b.c2, b.c3)]), filter(nil), rowset=256 |
| 1 - output([INTERNAL_FUNCTION(a.c1, a.c2, a.c3, b.c1, b.c2, b.c3)]), filter(nil), rowset=256 |
| dop=8 |
| 2 - output([a.c1], [b.c1], [a.c2], [a.c3], [b.c2], [b.c3]), filter(nil), rowset=256 |
| equal_conds([a.c1 = b.c1]), other_conds(nil) |
| 3 - output([a.c1], [a.c2], [a.c3]), filter(nil), rowset=256 |
| 4 - output([a.c1], [a.c2], [a.c3]), filter(nil), rowset=256 |
| (#keys=1, [a.c1]), dop=4 |
| 5 - output([a.c1], [a.c2], [a.c3]), filter(nil), rowset=256 |
| 6 - output([a.c1], [a.c2], [a.c3]), filter(nil), rowset=256 |
| access([a.c1], [a.c2], [a.c3]), partitions(p[0-3]) |
| is_index_back=false, is_global_index=false, |
| range_key([a.__pk_increment]), range(MIN ; MAX)always true |
| 7 - output([b.c1], [b.c2], [b.c3]), filter(nil), rowset=256 |
| 8 - output([b.c1], [b.c2], [b.c3]), filter(nil), rowset=256 |
| (#keys=1, [b.c1]), dop=8 |
| 9 - output([b.c1], [b.c2], [b.c3]), filter(nil), rowset=256 |
| 10 - output([b.c1], [b.c2], [b.c3]), filter(nil), rowset=256 |
| access([b.c1], [b.c2], [b.c3]), partitions(p[0-7]) |
| is_index_back=false, is_global_index=false, |
| range_key([B.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------------------+