Hints related to parallel execution |V2.2.77|OceanBase Database| docs|Distributed Database

Hints related to parallel execution

Last Updated：2023-08-18 09:26:34 Updated

PARALLEL hint

The PARALLEL hint is a statement-level hint that indicates the optimizer to specify the number of parallel threads that can be used for a parallel operation. It overrides the value of PARALLEL_DEGREE_ POLICY. You can use this hint in the SELECT, INSERT, MERGE, UPDATE, and DELETE portions of a statement, as well as the table scan portion. The PARALLEL hint is ignored if any parallel limit is violated.

Syntax for the PARALLEL hint:

/*+ PARALLEL(integer) */

Notice

If the statement includes SORT BY or GROUP BY clauses, the number of available threads can be twice the value specified in the PARALLEL hint.

The value of the integer parameter in the PARALLE hint specifies the degree of parallelism (DOP).

In the following example, the specified DOP is 5:

SELECT /*+ PARALLEL(5) */ last_name
FROM employees;

USE_PX Hint

The USE_PX hint forces the server to execute SQL statements in parallel execution (PX) mode, which allows multithreading in statement execution. USE_PX and PARALLEL hints are often used in pair.

Syntax for the USE_PX hint:

/*+ USE_PX  */

Example:

SELECT /*+ USE_PX PARALLEL(4)*/ e.department_id, sum(e.salary)
FROM employees e
WHERE e.department_id = 1001;
GROUP BY e.department_id;

NO_USE_PX Hint

The NO_USE_PX hint forces the server not to execute SQL statements in PX mode.

Syntax for the NO_USE_PX hint:

/*+ NO_USE_PX  */

Example:

SELECT /*+ NO_USE_PX*/ e.department_id, sum(e.salary)
FROM employees e
WHERE e.department_id = 1001;
GROUP BY e.department_id;

PQ_DISTRIBUTE Hint

The PQ_DISTRIBUTE hint instructs the optimizer how to distribute rows between the producer server and the consumer server. The producer server is also known as the query server, and the consumer server is also known as the load server. This hint allows you to control the row distribution in a join or load operation. Syntax for the PQ_DISTRIBUTE hint:

/*+ PQ_DISTRIBUTE
  ( [ @ queryblock ] tablespec 
    { distribution | outer_distribution inner_distribution }
  ) */

Control row distribution for the load server

You can control the row distribution for the parallel INSERT ... SELECT statements and the parallel CREATE TABLE ... AS SELECT statements, to determine how to distribute rows between the producer server and the consumer server. Use the distribution syntax to specify a distribution method. The following table describes the distribution methods and their semantics.

Distribution method	Description
NONE	No distribution is performed. The query and load operations are combined into each query server. Each server loads all partitions. The lack of distribution avoids the overheads of row distribution when no skew occurs. Skew may occur when empty segments exist or a predicate in the statement filters out all the rows evaluated by the query. If skew occurs due to this method, use `RANDOM` or `RANDOM_ LOCAL` distribution instead. Notice Excercise caution when you use this distribution method. Each process needs to load at least 512 KB of Program Global Area (PGA) memory. If compression is applied, each server consumes about 1.5 MB of PGA memory.
PARTITION	This method uses the partitioning information of `tablespec` to distribute the rows from the query server to the consumer server. We recommend that you use this distribution method when the combination of the query and load opeations is impossible or unwanted and when the number of partitions being loaded is greater than or equal to the number of load servers and the input data will be evenly distributed across the partitions being loaded, which means row skew will not occur.
RANDOM	You can use this method to cyclically distribute rows from the producers to the consumers. We recommend that you use this distribution method when the input data is highly skewed.
RANDOM_LOCAL	You can use this method to distribute rows from the producers to a group of servers that maintain a given group of partitions. A partition can be loaded by two or more servers, but no server will load all partitions. We recommend that you use this distribution method when the input data is skewed and combining query and load operations is impossible due to insufficient memory.

In the following example, the query and load portions of a direct-path insert operation are combined into each query server:

INSERT /*+ APPEND PARALLEL(target_table, 16) PQ_DISTRIBUTE(target_table, NONE) */
INTO target_table
SELECT * FROM source_table;

In the following example, a table is created and the optimizer distribute the rows by using the partitioning of target_table :

CREATE /*+ PQ_DISTRIBUTE(target_table, PARTITION) */ TABLE target_table
NOLOGGING PARALLEL 16
PARTITION BY HASH (l_orderkey) PARTITIONS 512
AS SELECT * FROM source_table;

Control row distribution for joins

You can specify two distribution methods, one for the outer table and the other for the inner table.

outside_distribution specifies the distribution method for the outer table.
inner_distribution specifies the distribution method for the inner table.

The distribution methods include HASH, BROADCAST, PARTITION, and NONE. Only the six combinations of distribution methods in the following table are valid.

Distribution method	Description
HASH, HASH	A hash function on the join keys is used to map rows of each table to query servers. After the mapping is complete, each query server performs the join operation between a pair of resulting partitions. We recommend that you use this distribution method when the tables are comparable in size and the join operation is implemented by using HASH JOIN or SORT MERGE JOIN.
BROADCAST, NONE	All rows of the outer table are broadcast to each query server. Rows of the inner table are randomly partitioned. We recommend that you use this distribution method when the size of the outer table is far smaller than that of the inner table, or when the size of the inner table multiplied by the number of query servers is greater than the size of the outer table.
NONE, BROADCAST	All rows of the inner table are broadcast to each consumer server. Rows of the outer table are randomly partitioned. We recommend that you use this distribution method when the size of the inner table is far smaller than that of the outer table, or when the size of the inner table multiplied by the number of query servers is smaller than that of the outer table.
PARTITION, NONE	The rows of the outer table are mapped by using the partitioning of the inner table. The inner table must be partitioned on the join keys. We recommend that you use this distribution method when the number of partitions of the outer table equals or almost equals a multiple of the number of query servers, for example, 14 partitions and 15 query servers. Notice If the inner table is not partitioned or not equally joined on the partitioning key, the optimizer ignores this hint.
NONE, PARTITION	The rows of the inner table are mapped by using the partitioning of the outer table. The outer table must be partitioned on the join keys. We recommend that you use this distribution method when the number of partitions of the outer table equals or almost equals a multiple of the number of query servers, for example, 14 partitions and 15 query servers. Notice If the outer table is not partitioned or not equally joined on the partitioning key, the optimizer ignores this hint.
NONE, NONE	Each query server performs the join operation between a pair of matching partitions, one from each table. Both tables must be equally partitioned on the join keys.

In the following example, Tables r and s are joined by using a hash join, and the query contains a hint to use hash distribution:

SELECT /*+ORDERED PQ_DISTRIBUTE(s HASH, HASH) USE_HASH (s)*/ column_list
FROM r,s
WHERE r.c=s.c;

To broadcast the outer table r , use the following query statement:

SELECT /*+ORDERED PQ_DISTRIBUTE(s BROADCAST, NONE) USE_HASH (s) */ column_list
FROM r,s
WHERE r.c=s.c;