Run and disable a parallel query|V4.2.0| docs|Distributed Database

Run and disable a parallel query

Last Updated：2023-10-31 11:17:11 Updated

This topic uses examples to describe how to run and disable a parallel query on a partitioned table, on a non-partitioned table, or across multiple tables. It also describes system views about parallel execution.

Run a parallel query on a partitioned table

For a partitioned table that has more than one partition to be queried, the system automatically runs a parallel query. By default, the degree of parallelism (DOP) for the query is 1.

In the following example, a partitioned table ptable is created, and a full table scan is performed on the entire table. You can run the EXPLAIN command to view the execution plan generated by the optimizer. The execution plan shows that the default dop for a parallel query is 1 for a partitioned table. Assume that your OceanBase cluster has three OBServer nodes, and the table ptable has 16 partitions scattered in these three OBServer nodes. Each OBServer will start a worker thread to scan data in these partitions, which means a total of three worker threads are started.

obclient> CREATE TABLE PTABLE(c1 INT , c2 INT) PARTITION BY HASH(c1) PARTITIONS 16;
Query OK, 0 rows affected

obclient> EXPLAIN SELECT * FROM ptable\G
Query Plan: ======================================================
|ID|OPERATOR               |NAME    |EST. ROWS|COST  |
------------------------------------------------------
|0 |PX COORDINATOR         |        |1600000  |737992|
|1 | EXCHANGE OUT DISTR    |:EX10000|1600000  |618888|
|2 |  PX PARTITION ITERATOR|        |1600000  |618888|
|3 |   TABLE SCAN          |ptable  |1600000  |618888|
======================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=1
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

To query a partitioned table, you can add the PARALLEL hint to the query to set the dop to a number greater than or equal to 2 to manually run a parallel query. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient> EXPLAIN SELECT /*+ PARALLEL(8) */ * FROM ptable\G
Query Plan: ==================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST  |
--------------------------------------------------
|0 |PX COORDINATOR     |        |1600000  |737992|
|1 | EXCHANGE OUT DISTR|:EX10000|1600000  |618888|
|2 |  PX BLOCK ITERATOR|        |1600000  |618888|
|3 |   TABLE SCAN      |ptable  |1600000  |618888|
==================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

In this execution plan, the dop value for the parallel query is 8. If the number of OBServer nodes where the queried partition is located is less than or equal to the dop value, the worker threads are assigned to the queried OBServer nodes based on the specified rule. In this case, the total number of worker threads is equal to the dop. If the number of queried OBServer nodes is greater than the dop value, each OBServer starts at least one worker thread. In this case, the total number of worker threads is greater than the dop value.

For example, if dop = 8 and 16 partitions are evenly distributed on four OBServer nodes, each OBServer starts two worker threads to scan the corresponding partitions. In this case, a total of eight worker threads are started. If 16 partitions are evenly distributed on 16 OBServer nodes, one partition for an OBServer, each OBServer starts one worker thread to scan the corresponding partition. In this case, a total of 16 worker threads are started.

To query a partitioned table that has zero or one partition to be queried, the system does not run a parallel query. The following example adds a filter condition c1 = 1 to the query on ptable.

obclient> EXPLAIN SELECT * FROM ptable WHERE c1 = 1\G
*************************** 1. row ***************************
Query Plan:
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|ptable|990      |85222|
======================================

Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter([ptable.c1 = 1]),
      access([ptable.c1], [ptable.c2]), partitions(p1)

The plan indicates that only one partition needs to be queried. Therefore, the system did not run a parallel query. To run a parallel query on a single partition, you can only add the PARALLEL hint to the query. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient> EXPLAIN SELECT /*+ PARALLEL(8) */ * FROM ptable WHERE c1 = 1\G
Query Plan: ================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST|
------------------------------------------------
|0 |PX COORDINATOR     |        |990      |457 |
|1 | EXCHANGE OUT DISTR|:EX10000|990      |383 |
|2 |  PX BLOCK ITERATOR|        |990      |383 |
|3 |   TABLE SCAN      |ptable  |990      |383 |
================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p1)

Note

To run a query in parallel on a single partition by specifying the PARALLEL hint in the query, you must set the DOP of the query to a value greater than or equal to two. If the DOP value of a query is null or less than two, the system does run a parallel query.

Run a parallel query on a non-partitioned table

A non-partitioned table is essentially a partitioned table with one partition. Therefore, you can run a parallel query on a non-partitioned table only by adding the PARALLEL hint to the query.

In the following example, a non-partitioned table stable is created, and a full table scan is performed on the stable table. You can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient> CREATE TABLE stable(c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN SELECT * FROM stable\G
*************************** 1. row ***************************
Query Plan:
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|stable|100000   |68478|
======================================

Outputs & filters:
-------------------------------------
  0 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

The execution plan shows that the system does not run a parallel query on a non-partitioned table if no hint is added to the query.

You can add the PARALLEL hint to a query and set its dop value to a value greater than or equal to 2 to run a parallel query on the non-partitioned table. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient> EXPLAIN SELECT /*+ PARALLEL(4)*/  * FROM stable\G
Query Plan: =================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST |
-------------------------------------------------
|0 |PX COORDINATOR     |        |100000   |46125|
|1 | EXCHANGE OUT DISTR|:EX10000|100000   |38681|
|2 |  PX BLOCK ITERATOR|        |100000   |38681|
|3 |   TABLE SCAN      |stable  |100000   |38681|
=================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(stable.c1, stable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(stable.c1, stable.c2)]), filter(nil), dop=4
  2 - output([stable.c1], [stable.c2]), filter(nil)
  3 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

Run a parallel query on multiple tables

Multi-table JOIN queries are most frequently used. If the number of partitions in each table exceeds 1, a parallel query is run on each table.

In the following example, partitioned tables p1table and p2table are created.

obclient> CREATE TABLE p1table(c1 INT ,c2 INT) PARTITION BY HASH(c1) PARTITIONS 2;
Query OK, 0 rows affected

obclient> CREATE TABLE p2table(c1 INT ,c2 INT) PARTITION BY HASH(c1) PARTITIONS 4;
Query OK, 0 rows affected

Use the join condition p1table.c1=p2table.c2 to join the result sets for a query on p1table and p2table. The following example shows the execution plan for the query:

*************************** 1. row ***************************
Query Plan: ===============================================================
|ID|OPERATOR                     |NAME    |EST. ROWS|COST     |
---------------------------------------------------------------
|0 |PX COORDINATOR               |        |784080000|295962589|
|1 | EXCHANGE OUT DISTR          |:EX10001|784080000|179228805|
|2 |  HASH JOIN                  |        |784080000|179228805|
|3 |   PX PARTITION ITERATOR     |        |200000   |77361    |
|4 |    TABLE SCAN               |p1      |200000   |77361    |
|5 |   EXCHANGE IN DISTR         |        |400000   |184498   |
|6 |    EXCHANGE OUT DISTR (PKEY)|:EX10000|400000   |154722   |
|7 |     PX PARTITION ITERATOR   |        |400000   |154722   |
|8 |      TABLE SCAN             |p2      |400000   |154722   |
===============================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p1.c1], [p1.c2]), filter(nil)
  4 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])
  5 - output([p2.c1], [p2.c2]), filter(nil)
  6 - (#keys=1, [p2.c2]), output([p2.c1], [p2.c2]), filter(nil), dop=1
  7 - output([p2.c1], [p2.c2]), filter(nil)
  8 - output([p2.c1], [p2.c2]), filter(nil),
      access([p2.c1], [p2.c2]), partitions(p[0-3])

By default, the system runs the query in parallel on both p1table and p2table because they both have more than one partition to be queried. The default value of dop is 1. Likewise, you can change the DOP of a query by adding the PARALLEL hint to it.

In the following example, the JOIN condition is changed to p1table.c1=p2table.c2 and p2table.c1=1 so that only a single partition is selected for p2table. The following example shows the execution plan for the query:

obclient> EXPLAIN SELECT  * FROM p1table p1 JOIN p2table p2 ON p1.c1=p2.c2 AND p2.c1=1\G
*************************** 1. row ***************************
Query Plan: ==============================================================
|ID|OPERATOR                      |NAME    |EST. ROWS|COST   |
--------------------------------------------------------------
|0 |PX COORDINATOR                |        |1940598  |1394266|
|1 | EXCHANGE OUT DISTR           |:EX10001|1940598  |1105349|
|2 |  MERGE JOIN                  |        |1940598  |1105349|
|3 |   SORT                       |        |200000   |657776 |
|4 |    PX PARTITION ITERATOR     |        |200000   |77361  |
|5 |     TABLE SCAN               |p1      |200000   |77361  |
|6 |   SORT                       |        |990      |2092   |
|7 |    EXCHANGE IN DISTR         |        |990      |457    |
|8 |     EXCHANGE OUT DISTR (PKEY)|:EX10000|990      |383    |
|9 |      TABLE SCAN              |p2      |990      |383    |
==============================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p1.c1], [p1.c2]), filter(nil), sort_keys([p1.c1, ASC]), local merge sort
  4 - output([p1.c1], [p1.c2]), filter(nil)
  5 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])
  6 - output([p2.c1], [p2.c2]), filter(nil), sort_keys([p2.c2, ASC])
  7 - output([p2.c1], [p2.c2]), filter(nil)
  8 - (#keys=1, [p2.c2]), output([p2.c1], [p2.c2]), filter(nil), is_single, dop=1
  9 - output([p2.c1], [p2.c2]), filter(nil),
      access([p2.c1], [p2.c2]), partitions(p1)

1 row in set

In this plan, p2table has only one partition be scanned, and p1table has two. By default, a parallel query is run on p1table and not on p2table. Likewise, you can change the DOP of a query by adding the PARALLEL hint to it. Then, you can run a query in parallel on a single partition of p2table.

System views for parallel execution

OceanBase Database provides the system views GV$OB_SQL_AUDIT and V$OB_SQL_AUDIT for you to view the status of parallel execution and the statistics.

GV$OB_SQL_AUDIT V$OB_SQL_AUDIT each contain many fields. The qc_id, dfo_id, sqc_id, and worker_id fields are related to parallel execution.

For more information, see (G)V$OB_SQL_AUDIT.