JOIN|V4.3.1| docs|Distributed Database

JOIN

Last Updated：2024-06-28 05:30:31 Updated

JOIN operators join data in two tables based on specified conditions.

Joins are classified into three types: inner join, outer join, and semi/anti join.

OceanBase Database supports the following JOIN operators: NESTED-LOOP JOIN, MERGE JOIN, and HASH JOIN.

NESTED-LOOP JOIN

In the following example, queries Q1 and Q2 use NESTED-LOOP JOIN based on the hint. Operator 0 is a NESTED-LOOP JOIN operator and has two subnodes: operators 1 and 2. The execution logic of this operator is as follows:

Run Operator 1 to read a row.
Run Operator 2 to read all rows.
Join the result sets of the operators 1 and 2, and apply the filter condition to export the results.
Repeat Step 1 until Operator 1 stops iteration.

obclient> CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2 (d1 INT, d2 INT, PRIMARY KEY (d1));
Query OK, 0 rows affected

Q1:
obclient> EXPLAIN SELECT /*+USE_NL(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 WHERE c2 = d2;

Query Plan:
============================================
|ID|OPERATOR         |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN |    |9782     |411238|
|1 | TABLE FULL SCAN |T1  |999      |647   |
|2 | MATERIAL        |    |999      |1519  |
|3 |  TABLE FULL SCAN|T2  |999      |647   |
============================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds([T1.C2 = T2.D2]), nl_params_(nil)
  1 - output([T1.C2]), filter(nil),
      access([T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil)
  3 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

The MATERIAL operator materializes the data output of subsequent operators. For more information, see MATERIAL.

Q2:
obclient> EXPLAIN SELECT /*+USE_NL(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 WHERE c1 = d1;

Query Plan:
| ==========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST |
------------------------------------------
|0 |NESTED-LOOP JOIN|    |990      |37346|
|1 | TABLE FULL SCAN|T1  |999      |669  |
|2 | TABLE GET      |T2  |1        |36   |
==========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds(nil), nl_params_([T1.C1])
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section shows in detail the output information of the NESTED-LOOP JOIN operator.

Field	Description
output	The output expressions of the operator.
filter	The filter conditions of the operator. In this example, `filter` is set to `nil` because no filter condition is configured for the `NESTED-LOOP JOIN` operator.
conds	The join conditions. For example, the join condition in query Q1 is `t1.c2 = t2.d2`.
nl_params_	The pushdown parameters generated based on the data of the table on the left of the `NESTED-LOOP JOIN` operator. For example, the pushdown parameter in query Q2 is `t1.c1`. For the iteration of each row of the table on the left, `NESTED-LOOP JOIN` creates a parameter based on `nl_params` and, based on this parameter and the original join condition `c1 = d1`, creates a filter condition applicable to the table on the right: `d1 = ?`. The filter condition is pushed down to the table on the right, extracting the query range of the index. The query range is the range of data to be scanned. In query Q2, Operator 2 is a `TABLE GET` operator because of the pushdown condition `d1=?`.

In the following example, no join condition is specified in query Q3. Operator 0 is displayed as NESTED-LOOP JOIN CARTESIAN, which is logically a NESTED-LOOP JOIN operator with no join conditions.

Q3:
obclient> EXPLAIN SELECT t1.c2 + t2.d2 FROM t1, t2;
*************************** 1. row ***************************
Query Plan:
| =====================================================
|ID|OPERATOR                  |NAME|EST. ROWS|COST  |
-----------------------------------------------------
|0 |NESTED-LOOP JOIN CARTESIAN|    |998001   |747480|
|1 | TABLE FULL SCAN          |T1  |999      |647   |
|2 | MATERIAL                 |    |999      |1519  |
|3 | TABLE FULL SCAN          |T2  |999      |647   |
=====================================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds(nil), nl_params_(nil)
  1 - output([T1.C2]), filter(nil),
      access([T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil)
  3 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

MERGE JOIN

In the following example, query Q4 uses MERGE JOIN by adding the hint USE_MERGE. Operator 0 is a MERGE JOIN operator and has two subnodes: operators 1 and 3. The operator merges data of the left and right subnodes, and therefore requires that the data of the two subnodes be sorted in relation to the join column.

Take query Q4 as an example. The join condition is t1.c2 = t2.d2, which means sorting data of t1 by c2, and data of t2 by d2. In query Q4, the output of Operator 2 is not sorted, and the output of Operator 4 is sorted by d1. Both of them do not meet the sorting requirements of a merge join. Therefore, operators 1 and 3 are assigned for sorting.

Q4:
obclient> EXPLAIN SELECT /*+USE_MERGE(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 WHERE c2 = d2 AND c1 + d1 > 10;

Query Plan:
| ======================================
|ID|OPERATOR    |NAME|EST. ROWS|COST |
--------------------------------------
|0 |MERGE JOIN      |    |3261     |14199|
|1 |SORT            |    |999      |4505 |
|2 |TABLE FULL SCAN |T1  |999      |669  |
|3 |SORT            |    |999      |4483 |
|4 |TABLE FULL  SCAN|T2  |999      |647  |
==========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C2 = T2.D2]), other_conds([T1.C1 + T2.D1 > 10])
  1 - output([T1.C2], [T1.C1]), filter(nil), sort_keys([T1.C2, ASC])
  2 - output([T1.C2], [T1.C1]), filter(nil),
      access([T1.C2], [T1.C1]), partitions(p0)
  3 - output([T2.D2], [T2.D1]), filter(nil), sort_keys([T2.D2, ASC])
  4 - output([T2.D2], [T2.D1]), filter(nil),
      access([T2.D2], [T2.D1]), partitions(p0)

In the following example, the join condition in query Q5 is t1.c1 = t2.d1, which means sorting data of t1 by c1, and data of t2 by d1. In this execution plan, the primary table scan is performed for t2, and the results are sorted by d1. Therefore, it is unnecessary to assign an additional SORT operator. Ideally, proper indexes are selected for tables on the left and right of the join, and the data order specified by the indexes can meet the requirements of a merge join. In this case, no SORT operator is needed.

Q5:
obclient> EXPLAIN SELECT /*+USE_MERGE(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 WHERE c1 = d1;

Query Plan:
| =====================================
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |MERGE JOIN       |    |990      |6096|
|1 | SORT            |    |999      |4505|
|2 |  TABLE FULL SCAN|T1  |999      |669 |
|3 | TABLE FULL SCAN |T2  |999      |647 |
==========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C1 = T2.D1]), other_conds(nil)
  1 - output([T1.C2], [T1.C1]), filter(nil), sort_keys([T1.C1, ASC])
  2 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T2.D1], [T2.D2]), filter(nil),
      access([T2.D1], [T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section shows in detail the output information of the MERGE JOIN operator.

Field	Description
output	The output expressions of the operator.
filter	The filter conditions of the operator. In this example, `filter` is set to `nil` because no filter condition is configured for the `MERGE JOIN` operator.
equal_conds	The equality join conditions for `MERGE JOIN`. The result sets of the subnodes on the left and right must be sorted in relation to the join column.
other_conds	Other join conditions. For example, query Q4 has an additional condition `t1.c1 + t2.d1 > 10`.

HASH JOIN

In the following example, query Q6 uses HASH JOIN based on the USE_HASH hint. Operator 0 is a HASH JOIN operator and has two subnodes: operators 1 and 2. The execution logic of this operator is as follows:

Read data from the subnode on the left to generate a hash value based on the join column, such as t1.c1, and then create a hash table.
Read data from the subnode on the right to generate a hash value based on the join column, such as t2.d1, and try to join with the data of t1 in the corresponding hash table.

Q6:
obclient> EXPLAIN SELECT /*+USE_HASH(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 WHERE c1 = d1 AND c2 + d2 > 1;

Query Plan:
| ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN       |    |330      |4850|
|1 | TABLE FULL SCAN|T1  |999      |669 |
|2 | TABLE FULL SCAN|T2  |999      |647 |
=========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C1 = T2.D1]), other_conds([T1.C2 + T2.D2 > 1])
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.D1], [T2.D2]), filter(nil),
      access([T2.D1], [T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section shows in detail the output information of the HASH JOIN operator.

Field	Description
output	The output expressions of the operator.
filter	The filter conditions of the operator. In this example, `filter` is set to `nil` because no filter condition is configured for the `HASH JOIN` operator.
equal_conds	The equality join conditions. The join columns on the left and right sides are used to calculate the hash value.
other_conds	Other join conditions. For example, query Q6 has an additional join condition `t1.c2 + t2.d2 > 1`.