Run and disable a parallel query |V3.2.4|OceanBase Database| docs|Distributed Database

Run and disable a parallel query

Last Updated：2023-10-24 09:23:04 Updated

This topic uses examples to describe how to run and disable a parallel query on a partitioned table, on a non-partitioned table, or across multiple tables. It also describes system views about parallel execution.

Run a parallel query on a partitioned table

For a partitioned table that has more than one partition to be queried, the system automatically runs a parallel query. By default, the degree of parallelism (DOP) for the query is 1.

In the following example, a partitioned table named ptable is created and a full table scan is performed on the entire table. You can run the EXPLAIN command to view the execution plan generated by the optimizer. The execution plan shows that the default DOP for a parallel query is 1 for a partitioned table. Assume that your OceanBase cluster has 3 OBServers, and the table ptable has 16 partitions scattered in these 3 OBServers. Each OBServer will start a worker thread to scan data in these partitions, which means a total of 3 worker threads are started.

obclient>CREATE TABLE PTABLE(c1 INT , c2 INT) PARTITION BY HASH(c1) PARTITIONS 16;
Query OK, 0 rows affected 

obclient>EXPLAIN SELECT * FROM ptable\G;
Query Plan: ======================================================
|ID|OPERATOR               |NAME    |EST. ROWS|COST  |
------------------------------------------------------
|0 |PX COORDINATOR         |        |1600000  |737992|
|1 | EXCHANGE OUT DISTR    |:EX10000|1600000  |618888|
|2 |  PX PARTITION ITERATOR|        |1600000  |618888|
|3 |   TABLE SCAN          |ptable  |1600000  |618888|
======================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=1
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

To query a partitioned table, you can add the PARALLEL hint to the query to set the DOP to a number greater than or equal to 2 to manually run a parallel query. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient>EXPLAIN SELECT /*+ PARALLEL(8) */ * FROM ptable\G;
Query Plan: ==================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST  |
--------------------------------------------------
|0 |PX COORDINATOR     |        |1600000  |737992|
|1 | EXCHANGE OUT DISTR|:EX10000|1600000  |618888|
|2 |  PX BLOCK ITERATOR|        |1600000  |618888|
|3 |   TABLE SCAN      |ptable  |1600000  |618888|
==================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

In this execution plan, the DOP for parallel query is 8. If the number of OBServers where the queried partition is located is less than or equal to the DOP, the worker threads are assigned to the queried OBServers based on the specified rule. In this case, the total number of the worker threads is equal to the DOP. If the number of queried OBServers is greater than the DOP, each OBServer starts at least one worker thread. In this case, the total number of worker threads is greater than the DOP.

For example, if DOP = 8 and 16 partitions are evenly distributed on 4 OBServers, each OBServer starts 2 worker threads to scan the corresponding partitions. In this case, a total of 8 worker threads are started. If 16 partitions are evenly distributed on 16 OBServers, one partition for an OBServer, each OBServer starts one worker thread to scan the corresponding partition. In this case, a total of 16 worker threads are started.

To query a partitioned table that has zero or one partition to be queried, the system does not run a parallel query. The following example adds a filter condition c1 = 1 to the query on ptable.

obclient>EXPLAIN SELECT * FROM ptable WHERE c1 = 1\G;
*************************** 1. row ***************************
Query Plan:
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|ptable|990      |85222|
======================================

Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter([ptable.c1 = 1]),
      access([ptable.c1], [ptable.c2]), partitions(p1)

The plan indicates that only one partition needs to be queried. Therefore, the system did not run a parallel query. To run a parallel query on a single partition, you can only add the PARALLEL hint to the query. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient>EXPLAIN SELECT /*+ PARALLEL(8) */ * FROM ptable WHERE c1 = 1\G;
Query Plan: ================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST|
------------------------------------------------
|0 |PX COORDINATOR     |        |990      |457 |
|1 | EXCHANGE OUT DISTR|:EX10000|990      |383 |
|2 |  PX BLOCK ITERATOR|        |990      |383 |
|3 |   TABLE SCAN      |ptable  |990      |383 |
================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p1)

Note

To run a query in parallel on a single partition by specifying the PARALLEL hint in the query, you must set the DOP of the query to a value greater than or equal to two. If the DOP value of a query is null or less than two, the system does run a parallel query.

Run a parallel query on a non-partitioned table

A non-partitioned table is essentially a partitioned table with one partition. Therefore, you can run a parallel query on a non-partitioned table only by adding the PARALLEL hint to the query.

In the following example, a non-partitioned table stable is created and a full table scan is performed on the stable table. You can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient>CREATE TABLE stable(c1 INT, c2 INT);
Query OK, 0 rows affected 

obclient>EXPLAIN SELECT * FROM stable\G;
*************************** 1. row ***************************
Query Plan:
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|stable|100000   |68478|
======================================

Outputs & filters:
-------------------------------------
  0 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

The execution plan shows that the system does not run a parallel query on a non-partitioned table if no hint is added to the query.

You can add the PARALLEL hint to a query and set its DOP to a value greater than or equal to two to run a parallel query on a non-partitioned table. Then, you can run the EXPLAIN command to view the execution plan generated by the optimizer.

obclient>EXPLAIN SELECT /*+ PARALLEL(4)*/  * FROM stable\G;
Query Plan: =================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST |
-------------------------------------------------
|0 |PX COORDINATOR     |        |100000   |46125|
|1 | EXCHANGE OUT DISTR|:EX10000|100000   |38681|
|2 |  PX BLOCK ITERATOR|        |100000   |38681|
|3 |   TABLE SCAN      |stable  |100000   |38681|
=================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(stable.c1, stable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(stable.c1, stable.c2)]), filter(nil), dop=4
  2 - output([stable.c1], [stable.c2]), filter(nil)
  3 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

Run a parallel query on multiple tables

Multi-table JOIN queries are most frequently used. If the number of partitions in each table exceeds 1, a parallel query is run on each table.

In the following example, partitioned tables p1table and p2table are created.

obclient>CREATE TABLE p1table(c1 INT ,c2 INT) PARTITION BY HASH(c1) PARTITIONS 2;
Query OK, 0 rows affected 

obclient>CREATE TABLE p2table(c1 INT ,c2 INT) PARTITION BY HASH(c1) PARTITIONS 4;
Query OK, 0 rows affected

Use the join condition p1table.c1=p2table.c2 to join the result sets for a query on p1table and p2table. The following example shows the execution plan for the query:

*************************** 1. row ***************************
Query Plan: ===============================================================
|ID|OPERATOR                     |NAME    |EST. ROWS|COST     |
---------------------------------------------------------------
|0 |PX COORDINATOR               |        |784080000|295962589|
|1 | EXCHANGE OUT DISTR          |:EX10001|784080000|179228805|
|2 |  HASH JOIN                  |        |784080000|179228805|
|3 |   PX PARTITION ITERATOR     |        |200000   |77361    |
|4 |    TABLE SCAN               |p1      |200000   |77361    |
|5 |   EXCHANGE IN DISTR         |        |400000   |184498   |
|6 |    EXCHANGE OUT DISTR (PKEY)|:EX10000|400000   |154722   |
|7 |     PX PARTITION ITERATOR   |        |400000   |154722   |
|8 |      TABLE SCAN             |p2      |400000   |154722   |
===============================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p1.c1], [p1.c2]), filter(nil)
  4 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])
  5 - output([p2.c1], [p2.c2]), filter(nil)
  6 - (#keys=1, [p2.c2]), output([p2.c1], [p2.c2]), filter(nil), dop=1
  7 - output([p2.c1], [p2.c2]), filter(nil)
  8 - output([p2.c1], [p2.c2]), filter(nil),
      access([p2.c1], [p2.c2]), partitions(p[0-3])

By default, the system runs the query in parallel on both p1table and p2table because they both have more than one partition to be queried. The default value of DOP is 1. Likewise, you can change the DOP of a query by adding the PARALLEL hint to it.

In the following example, the JOIN condition is changed to p1table.c1=p2table.c2 and p2table.c1=1, so that only a single partition is selected for p2table. The following example shows the execution plan for the query:

obclient>EXPLAIN SELECT  * FROM p1table p1 JOIN p2table p2 ON p1.c1=p2.c2 AND p2.c1=1\G;
*************************** 1. row ***************************
Query Plan: ==============================================================
|ID|OPERATOR                      |NAME    |EST. ROWS|COST   |
--------------------------------------------------------------
|0 |PX COORDINATOR                |        |1940598  |1394266|
|1 | EXCHANGE OUT DISTR           |:EX10001|1940598  |1105349|
|2 |  MERGE JOIN                  |        |1940598  |1105349|
|3 |   SORT                       |        |200000   |657776 |
|4 |    PX PARTITION ITERATOR     |        |200000   |77361  |
|5 |     TABLE SCAN               |p1      |200000   |77361  |
|6 |   SORT                       |        |990      |2092   |
|7 |    EXCHANGE IN DISTR         |        |990      |457    |
|8 |     EXCHANGE OUT DISTR (PKEY)|:EX10000|990      |383    |
|9 |      TABLE SCAN              |p2      |990      |383    |
==============================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(p1.c1, p1.c2, p2.c1, p2.c2)]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p1.c1], [p1.c2]), filter(nil), sort_keys([p1.c1, ASC]), local merge sort
  4 - output([p1.c1], [p1.c2]), filter(nil)
  5 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])
  6 - output([p2.c1], [p2.c2]), filter(nil), sort_keys([p2.c2, ASC])
  7 - output([p2.c1], [p2.c2]), filter(nil)
  8 - (#keys=1, [p2.c2]), output([p2.c1], [p2.c2]), filter(nil), is_single, dop=1
  9 - output([p2.c1], [p2.c2]), filter(nil),
      access([p2.c1], [p2.c2]), partitions(p1)

1 row in set

In this plan, p2table only has one partition be scanned and p1table has two. By default, a parallel query is run on p1table and not on p2table. Likewise, you can change the DOP of a query by adding the PARALLEL hint to it. Then, you can run a query in parallel on a single partition of p2table.

Disable a parallel query

The system automatically runs a parallel query on a partitioned table that has more than one partition to be queried. If you do not want to run the parallel query, add the hint /*+ NO_USE_PX */ to the query to disable the parallel query.

For example, you can add the hint /*+ NO_USE_PX */ to a parallel query on ptable to disable the parallel query. The execution plan generated by the optimizer shows that no parallel query is run in the ptable scan.

obclient>EXPLAIN SELECT /*+ NO_USE_PX */ * FROM ptable\G;
*************************** 1. row ***************************
Query Plan: ================================================
|ID|OPERATOR           |NAME  |EST. ROWS|COST  |
------------------------------------------------
|0 |EXCHANGE IN DISTR  |      |1600000  |737992|
|1 | EXCHANGE OUT DISTR|      |1600000  |618888|
|2 |  TABLE SCAN       |ptable|1600000  |618888|
================================================

Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  1 - output([INTERNAL_FUNCTION(ptable.c1, ptable.c2)]), filter(nil)
  2 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

1 row in set

System views for parallel execution

OceanBase Database provides system views gv$sql_audit and v$sql_audit to demonstrate the status and some statistics for parallel execution.

The gv$sql_audit and v$sql_audit views have many fields, among which qc_id, dfo_id, sqc_id, and worker_id are related to parallel execution.

For more information, see SQL Tuning Guide.