Join algorithms|V4.1.0| docs|Distributed Database

Join algorithms

Last Updated：2023-08-01 06:02:28 Updated

OceanBase Database supports three join algorithms: NESTED LOOP JOIN, HASH JOIN, and MERGE JOIN.

HASH JOIN and MERGE JOIN apply only to equivalent JOIN conditions. NESTED LOOP JOIN applies to all JOIN conditions.

NESTED LOOP JOIN

NESTED LOOP JOIN scans a table (the left-side table). Each time when a record in the table is read, it scans another table (the right-side table) to find data that meets the conditions.

The scan can either be a fast index scan or a full table scan. Generally, the performance of a full table scan is low, so the optimizer does not use NESTED LOOP JOIN if the column specified in the JOIN condition does not have an index. In OceanBase Database, the execution plan shows whether it is possible to use a fast index scan.

In the following example, the first plan executes a full table scan for the right-side table because the JOIN condition is t1.c = t2.c, but table t2 does not have an index on the c column. For the second plan, the index is used to quickly locate the matching rows in the right-side table. This is because that the JOIN condition is t1.b = t2.b, and index k1 created on column b is selected for table t2 as the access path. This means that for every value in column b in each row of table t1, the matching row in table t2 can be quickly located.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected

obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected

obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2
      WHERE t1.c = t2.c;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |1980     |623742|
|1 | TABLE SCAN     |t1  |1000     |455   |
|2 | TABLE SCAN     |t2  |2        |622   |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      conds(nil), nl_params_([t1.c])
  1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0),
      is_index_back=false,
      range_key([t1.a]), range(MIN ; MAX)always true
  2 - output([t2.c], [t2.a], [t2.b]), filter([? = t2.c]),
      access([t2.c], [t2.a], [t2.b]), partitions(p0),
      is_index_back=false, filter_before_indexback[false],
      range_key([t2.a]), range(MIN ; MAX)

obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2
      WHERE t1.b = t2.b;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ============================================
|ID|OPERATOR        |NAME  |EST. ROWS|COST |
--------------------------------------------
|0 |NESTED-LOOP JOIN|      |1980     |94876|
|1 | TABLE SCAN     |t1    |1000     |455  |
|2 | TABLE SCAN     |t2(k1)|2        |94   |
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      conds(nil), nl_params_([t1.b])
  1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
      access([t1.b], [t1.a], [t1.c]), partitions(p0),
      is_index_back=false,
      range_key([t1.a]), range(MIN ; MAX)always true
  2 - output([t2.b], [t2.a], [t2.c]), filter(nil),
      access([t2.b], [t2.a], [t2.c]), partitions(p0),
      is_index_back=true,
      range_key([t2.b], [t2.a]), range(MIN ; MAX),
      range_cond([? = t2.b])

The NESTED LOOP JOIN algorithm is used to perform multiple full table scans on the right-side table. Each scan requires an iteration in the storage layer, so the scan cost is relatively high. Therefore, OceanBase Database allows the materialization of the results from right-side table scanning in the memory, so that data in the memory is directly scanned the next time. However, materialization in memory is not costless. Therefore, the OceanBase Database optimizer determines whether to perform right-side table materialization based on the costs.

As an optimized variant of the NESTED LOOP JOIN algorithm, the BLOCKED NESTED LOOP JOIN algorithm is used to read a block-sized row from the left-side table at a time and scan the right-side table to find the matching data. This reduces the number of reads from the right-side table.

The NESTED LOOP JOIN algorithm is typically used when the left-side table has a limited number of rows and the right-side table has an index on the column specified in the JOIN condition. In this case, the matching data for each row in the right-side table can be quickly located by using the index.

OceanBase Database also allows you to use the NESTED LOOP JOIN algorithm to join multiple tables by adding the hint /*+ USE_NL(table_name_list) */ to the query. In the following example, the system uses the HASH JOIN algorithm. The user can add the preceding hint to the query to change the algorithm to the NESTED LOOP JOIN algorithm.

obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1,t2 WHERE t1.c1 = t2.c1;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ========================================
|ID|OPERATOR   |NAME|EST. ROWS|COST    |
----------------------------------------
|0 |HASH JOIN  |    |98010000 |66774608|
|1 | TABLE SCAN|T1  |100000   |68478   |
|2 | TABLE SCAN|T2  |100000   |68478   |
========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_NL(t1, c2)*/* FROM  t1, t2 WHERE t1.c1 = t2.c1;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ===============================================
|ID|OPERATOR        |NAME|EST. ROWS|COST      |
-----------------------------------------------
|0 |NESTED-LOOP JOIN|    |98010000 |4595346207|
|1 | TABLE SCAN     |T1  |100000   |68478     |
|2 | MATERIAL       |    |100000   |243044    |
|3 |  TABLE SCAN    |T2  |100000   |68478     |
===============================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      conds([T1.C1 = T2.C1]), nl_params_(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil)
  3 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

NESTED LOOP JOIN has two variants:

BLOCKED NESTED LOOP JOIN

The BLOCKED NESTED LOOP JOIN algorithm is implemented in OceanBase Database in the form of the BATCH NESTED LOOP JOIN algorithm, which reads data rows from the left-side table in batches (1,000 rows by default) before scanning the right-side table to find the matching data. This algorithm matches data in the left-side table and that in the right-side table in batches. This approach reduces the number of reads from the right-side table and the number of inner loops.

In the following example, the batch_join=true field indicates that the BATCH NESTED LOOP JOIN algorithm is used in the query.

obclient>CREATE TABLE t1(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient>CREATE TABLE t2(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient>EXPLAIN EXTENDED_NOADDR SELECT /*+USE_NL(t1,t2)*/*  FROM t1,t2
          WHERE t1.c1=t2.c1\G
*************************** 1. row ***************************
Query Plan:
============================================
|ID|OPERATOR        |NAME|EST. ROWS|COST   |
--------------------------------------------
|0 |NESTED-LOOP JOIN|    |100001   |3728786|
|1 | TABLE SCAN     |t1  |100000   |59654  |
|2 | TABLE GET      |t2  |1        |36     |
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c1]), filter(nil),
      conds(nil), nl_params_([t1.c1]), inner_get=false, self_join=false, batch_join=true
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true
  2 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      is_index_back=false,
      range_key([t2.c1]), range(MIN ; MAX),
      range_cond([? = t2.c1])

INDEX NESTED LOOP JOIN

The INDEX NESTED LOOP JOIN algorithm directly matches the index of the right-side table based on matching conditions of the left-side table to join tables. It avoids comparison with every record of the right-side table and reduces the matching times in the right-side table.

As shown in the following example, the JOIN condition is t1.c1 = t2.c1. The INDEX NESTED LOOP JOIN is used when column c1 of table t2 or column c1 of table t1 is indexed.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT ,c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT /*+ORDERED USE_NL(t2,t1)*/ * FROM t2,
          (SELECT /*+NO_MERGE*/ * FROM t1)t1  WHERE t1.c1 = t2.c1 AND t2.c2 = 1\G
*************************** 1. row ***************************
Query Plan:
===========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
-------------------------------------------
|0 |NESTED-LOOP JOIN| |981 |117272|
|1 | TABLE SCAN |t2 |990 |80811 |
|2 | SUBPLAN SCAN |t1 |1 |37 |
|3 | TABLE GET |t1 |1 |36 |
===========================================
Outputs & filters:
-------------------------------------
0 - output([t2.c1], [t2.c2], [t1.c1]), filter(nil), conds(nil), nl_params_([t2.c1])
1 - output([t2.c1], [t2.c2]), filter([t2.c2 = 1]), access([t2.c1], [t2.c2]), partitions(p0)
2 - output([t1.c1]), filter(nil), access([t1.c1])
3 - output([t1.c1]), filter(nil), access([t1.c1]), partitions(p0)

In the outputs & filters section, parameter [t2.c1] of nl_param indicates the execution of conditional pushdown optimization. For more information, see JOIN.

In most cases of query optimization, the OceanBase Database optimizer prioritizes the INDEX NESTED LOOP JOIN algorithm before it uses the BATCH NESTED LOOP JOIN algorithm. These two algorithms can be used together. The NESTED LOOP JOIN algorithm is used as the final option.

DAS GROUP RESCAN

In OceanBase Database V4.1.0, DAS GROUP RESCAN applies to the single-level execution of the SUBPLAN FILTER (SPF) operator and two-level execution of the NESTED LOOP JOIN (NLJ) operator. DAS Group Rescan performs batching on the left branches of the SPF and NLJ operators to reduce the number of scans on their right branches. By default, DAS Group Rescan is enabled. You can disable it by using the _NLJ_BATCHING_ENABLED system variable. This variable must be enclosed in double quotation marks (") in Oracle mode. Syntax:

/* MySQL mode */
SET _NLJ_BATCHING_ENABLED=false;
/* Oracle mode */
SET "_NLJ_BATCHING_ENABLED"=false;

In the plan execution results of the NLJ operator, batch_join=true indicates that DAS GROUP RESCAN is performed. Sample code:

obclient> CREATE TABLE t1 (a INT, b INT, c INT, PRIMARY KEY(a, b));
obclient> CREATE TABLE t2 (a INT, b INT, c INT, PRIMARY KEY(a, b));
obclient> CREATE TABLE t3 (a INT, b INT , c INT, PRIMARY KEY(b, c));
obclient> EXPLAIN EXTENDED_NOADDR select /*+no_rewrite leading(t1 v) use_nl(t1 v)*/ count(*), sum(t1.a+t1.b+t1.c+v.a+v.b+v.c) FROM t1, (select /*+no_rewrite leading(t2 t3) use_nl(t2 t3)*/ t2.a, t2.b, t3.c from t2, t3 where t2.c = t3.b) v WHERE t1.a = v.c AND t1.c = v.a;
==========================================================
|ID|OPERATOR                  |NAME|EST.ROWS|EST.TIME(us)|
----------------------------------------------------------
|0 |SCALAR GROUP BY           |    |1       |183393      |
|1 | NESTED-LOOP JOIN         |    |1401    |183346      |
|2 |  TABLE SCAN              |t1  |1010    |83          |
|3 |  SUBPLAN SCAN            |v   |11      |196         |
|4 |   NESTED-LOOP JOIN       |    |11      |196         |
|5 |    DISTRIBUTED TABLE SCAN|t2  |10      |19          |
|6 |    DISTRIBUTED TABLE GET |t3  |1       |16          |
==========================================================
Outputs & filters:
-------------------------------------
  0 - output([T_FUN_COUNT(*)], [T_FUN_SUM(t1.a + t1.b + t1.c + v.a + v.b + v.c)]), filter(nil), rowset=256
      group(nil), agg_func([T_FUN_COUNT(*)], [T_FUN_SUM(t1.a + t1.b + t1.c + v.a + v.b + v.c)])
  1 - output([t1.a], [t1.c], [t1.b], [v.c], [v.a], [v.b]), filter(nil), rowset=256
      conds(nil), nl_params_([t1.a], [t1.c]), batch_join=true
  2 - output([t1.a], [t1.b], [t1.c]), filter(nil), rowset=256
      access([t1.a], [t1.b], [t1.c]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t1.a], [t1.b]), range(MIN,MIN ; MAX,MAX)always true
  3 - output([v.c], [v.a], [v.b]), filter(nil), rowset=256
      access([v.c], [v.a], [v.b])
  4 - output([t2.a], [t2.b], [t3.c]), filter(nil), rowset=256
      conds(nil), nl_params_([t2.c]), batch_join=true
  5 - output([t2.a], [t2.b], [t2.c]), filter(nil), rowset=256
      access([GROUP_ID], [t2.a], [t2.b], [t2.c]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t2.a], [t2.b]), range(MIN,MIN ; MAX,MAX)always true,
      range_cond([:2 = t2.a])
  6 - output([t3.c]), filter(nil), rowset=256
      access([GROUP_ID], [t3.c]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t3.b], [t3.c]), range(MIN ; MAX),
      range_cond([:1 = t3.c], [:3 = t3.b])

In the plan execution results of the SPF operator, batch_das=true indicates that DAS GROUP RESCAN is performed. Sample code:

obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 DECIMAL, c3 INT, c4 VARCHAR(20)) PARTITION BY HASH(c1) PARTITIONS 2;
obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 DECIMAL, c3 INT, c4 VARCHAR(20));
obclient> EXPLAIN EXTENDED_NOADDR SELECT c1,c1 in (SELECT c1 FROM t2 WHERE t2.c1 <= t1.c1) AND c1 IN (SELECT c2 FROM t2 WHERE t2.c1 <= t1.c1) or c1 + (SELECT c1 FROM t2 WHERE t2.c1 = t1.c1) AS x FROM t1;
=============================================================
|ID|OPERATOR                 |NAME    |EST.ROWS|EST.TIME(us)|
-------------------------------------------------------------
|0 |PX COORDINATOR           |        |1       |59          |
|1 | EXCHANGE OUT DISTR      |:EX10000|1       |59          |
|2 |  SUBPLAN FILTER         |        |1       |58          |
|3 |   PX PARTITION ITERATOR |        |1       |4           |
|4 |    TABLE SCAN           |t1      |1       |4           |
|5 |   DISTRIBUTED TABLE SCAN|t2      |8       |19          |
|6 |   DISTRIBUTED TABLE SCAN|t2      |8       |19          |
|7 |   DISTRIBUTED TABLE GET |t2      |1       |18          |
=============================================================
Outputs & filters:
-------------------------------------
  0 - output([INTERNAL_FUNCTION(t1.c1, t1.c1 = ANY(subquery(1)) AND cast(t1.c1, DECIMAL(11, 0)) = ANY(subquery(2)) OR BOOL(t1.c1 + subquery(3)))]), filter(nil),
rowset=256
  1 - output([INTERNAL_FUNCTION(t1.c1, t1.c1 = ANY(subquery(1)) AND cast(t1.c1, DECIMAL(11, 0)) = ANY(subquery(2)) OR BOOL(t1.c1 + subquery(3)))]), filter(nil),
rowset=256
      dop=1
  2 - output([t1.c1], [t1.c1 = ANY(subquery(1))], [cast(t1.c1, DECIMAL(11, 0)) = ANY(subquery(2))], [subquery(3)]), filter(nil), rowset=256
      exec_params_([t1.c1], [t1.c1], [t1.c1]), onetime_exprs_(nil), init_plan_idxs_(nil), batch_das=true
  3 - output([t1.c1]), filter(nil), rowset=256
      force partition granule
  4 - output([t1.c1]), filter(nil), rowset=256
      access([t1.c1]), partitions(p[0-1])
      is_index_back=false, is_global_index=false,
      range_key([t1.c1]), range(MIN ; MAX)always true
  5 - output([t2.c1]), filter(nil), rowset=256
      access([GROUP_ID], [t2.c1]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t2.c1]), range(MIN ; MAX)always true,
      range_cond([t2.c1 <= :0])
  6 - output([t2.c2]), filter(nil), rowset=256
      access([GROUP_ID], [t2.c2]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t2.c1]), range(MIN ; MAX)always true,
      range_cond([t2.c1 <= :1])
  7 - output([t2.c1]), filter(nil), rowset=256
      access([GROUP_ID], [t2.c1]), partitions(p0)
      is_index_back=false, is_global_index=false,
      range_key([t2.c1]), range(MIN ; MAX)always true,
      range_cond([t2.c1 = :2])

MERGE JOIN

To join two tables, the MERGE JOIN algorithm sorts them by JOIN fields before scanning them for joining. External sorting is required if the memory is insufficient.

MERGE JOIN starts by taking a record from each table for matching. If the record meets the JOIN condition, it is put into the result set. Otherwise, the optimizer discards the record that has a smaller JOIN field value and continues by taking the next record from the same table until the matching record is found.

Temporary space is required for merging two tables with multiple JOIN fields. For example, when MERGE JOIN is applied for A JOIN B, if A and B both have JOIN fields of multiple records, such as A1, A2... An for A and B1, B2... Bn for B. Then, every record of A must be matched with its counterpart in B. In this case, the pointer needs to move from B1 to Bn many times and read a record each time. So, it is faster if the records from B1 to Bn are fetched to an in-memory temporary table in advance than reading data from its original storage page or disk. In some scenarios, sorting is skipped if an index is applied to JOIN fields in a consistent order.

Generally, MERGE JOIN is suitable for ordered tables. Otherwise, HASH JOIN would be better. The following example displays two plans that use the MERGE JOIN algorithm. Sorting is required for the first one but not for the second one. This is because the two k1 indexes are used as the access paths of the two tables, and the k1 indexes are sorted by column b.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT/*+USE_MERGE(t1, t2)*/ * FROM t1, t2 WHERE t1.c = t2.c;
*************************** 1. row ***************************
Query Plan:
| =====================================
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |MERGE JOIN  |    |1980     |6011|
|1 | SORT       |    |1000     |2198|
|2 |  TABLE SCAN|t1  |1000     |455 |
|3 | SORT       |    |1000     |2198|
|4 |  TABLE SCAN|t2  |1000     |455 |
=====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.c = t2.c]), other_conds(nil)
  1 - output([t1.a], [t1.b], [t1.c]), filter(nil), sort_keys([t1.c, ASC])
  2 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)
  3 - output([t2.a], [t2.b], [t2.c]), filter(nil), sort_keys([t2.c, ASC])
  4 - output([t2.c], [t2.a], [t2.b]), filter(nil),
      access([t2.c], [t2.a], [t2.b]), partitions(p0)


obclient> EXPLAIN SELECT/*+USE_MERGE(t1, t2),INDEX(t1 k1),INDEX(t2 k1)*/ *
        FROM t1, t2 WHERE t1.b = t2.b;
*************************** 1. row ***************************
Query Plan:
| =======================================
|ID|OPERATOR   |NAME  |EST. ROWS|COST |
---------------------------------------
|0 |MERGE JOIN |      |1980     |12748|
|1 | TABLE SCAN|t1(k1)|1000     |5566 |
|2 | TABLE SCAN|t2(k1)|1000     |5566 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.b = t2.b]), other_conds(nil)
  1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
      access([t1.b], [t1.a], [t1.c]), partitions(p0)
  2 - output([t2.b], [t2.a], [t2.c]), filter(nil),
      access([t2.b], [t2.a], [t2.c]), partitions(p0)

OceanBase Database also allows you to use the MERGE JOIN algorithm to join multiple tables by adding the hint /*+ USE_MERGE(table_name_list) */ to the query. In the following example, the system uses the HASH JOIN algorithm. The user can add the preceding hint to the query to change the algorithm to the MERGE JOIN algorithm.

obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1,t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
| ========================================
|ID|OPERATOR   |NAME|EST. ROWS|COST    |
----------------------------------------
|0 |HASH JOIN  |    |98010000 |66774608|
|1 | TABLE SCAN|T1  |100000   |68478   |
|2 | TABLE SCAN|T2  |100000   |68478   |
========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_MERGE(t1,t2)*/* FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
 | =========================================
|ID|OPERATOR    |NAME|EST. ROWS|COST    |
-----------------------------------------
|0 |MERGE JOIN  |    |98010000 |67488837|
|1 | SORT       |    |100000   |563680  |
|2 |  TABLE SCAN|T1  |100000   |68478   |
|3 | SORT       |    |100000   |563680  |
|4 |  TABLE SCAN|T2  |100000   |68478   |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C1, ASC])
  2 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T2.C1], [T2.C2]), filter(nil), sort_keys([T2.C1, ASC])
  4 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

HASH JOIN

To join two tables, HASH JOIN uses the smaller table, or "the build table", to create a hash table based on JOIN conditions. It then scans the larger table, or "the probe table", line by line to find the matching row by probing the hash table. If the build table is so large that the hash table cannot be stored in memory, OceanBase Database splits the build table and the probe table into multiple partitions by JOIN conditions, each including a standalone pair of build table and probe table. In this way, a large HASH JOIN task is divided into several separate subtasks, so that HASH JOIN can be executed in the memory for every partition. In most cases, the HASH JOIN algorithm has higher efficiency than other JOIN algorithms.

The following example shows a plan that uses the HASH JOIN algorithm.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient>CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT/*+USE_HASH(t1, t2)*/ * FROM t1, t2 WHERE t1.c = t2.c;
*************************** 1. row ***************************
Query Plan:
| ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |4093|
|1 | TABLE SCAN|t1  |1000     |455 |
|2 | TABLE SCAN|t2  |1000     |455 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.c = t2.c]), other_conds(nil)
  1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)
  2 - output([t2.c], [t2.a], [t2.b]), filter(nil),
      access([t2.c], [t2.a], [t2.b]), partitions(p0)

OceanBase Database also allows you to use the HASH JOIN algorithm to join multiple tables by adding the hint /*+ USE_HASH(table_name_list) */ to the query. In the following example, the system uses the MERGE JOIN algorithm. The user can add the preceding hint to the query to change the algorithm to the HASH JOIN algorithm.

obclient> CREATE TABLE t1(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
| ======================================
|ID|OPERATOR   |NAME|EST. ROWS|COST  |
--------------------------------------
|0 |MERGE JOIN |    |100001   |219005|
|1 | TABLE SCAN|T1  |100000   |61860 |
|2 | TABLE SCAN|T2  |100000   |61860 |
======================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_HASH(t1, t2)*/ * FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
 | ======================================
|ID|OPERATOR   |NAME|EST. ROWS|COST  |
--------------------------------------
|0 |HASH JOIN  |    |100001   |495180|
|1 | TABLE SCAN|T1  |100000   |61860 |
|2 | TABLE SCAN|T2  |100000   |61860 |
======================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)