Join algorithms|V4.3.1| docs|Distributed Database

Join algorithms

Last Updated：2024-06-28 05:30:31 Updated

OceanBase Database supports three join algorithms: nested loop join, hash join, and merge join.

Hash joins and merge joins apply only to equality join conditions. Nested loop joins apply to all join conditions.

Nested loop join

A nested loop join scans a table (the left-side table). Each time when a record in the table is read, it scans another table (the right-side table) to find data that meets the conditions.

The scan can either be a fast index scan or a full table scan. Generally, the performance of a full table scan is low, so the optimizer does not use a nested loop join if the column specified in the join condition does not have an index. In OceanBase Database, the execution plan shows whether it is possible to use a fast index scan.

In the following example, the first plan executes a full table scan for the right-side table because the join condition is t1.c = t2.c, but table t2 does not have an index on the c column. For the second plan, the index is used to quickly locate the matching rows in the right-side table. This is because that the join condition is t1.b = t2.b, and index k1 created on column b is selected for table t2 as the access path. This means that for every value in column b in each row of table t1, the matching row in table t2 can be quickly located.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected

obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected

obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2
      WHERE t1.c = t2.c;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |1980     |623742|
|1 | TABLE SCAN     |t1  |1000     |455   |
|2 | TABLE SCAN     |t2  |2        |622   |
===========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      conds(nil), nl_params_([t1.c])
  1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0),
      is_index_back=false,
      range_key([t1.a]), range(MIN ; MAX)always true
  2 - output([t2.c], [t2.a], [t2.b]), filter([? = t2.c]),
      access([t2.c], [t2.a], [t2.b]), partitions(p0),
      is_index_back=false, filter_before_indexback[false],
      range_key([t2.a]), range(MIN ; MAX)

obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2
      WHERE t1.b = t2.b;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ============================================
|ID|OPERATOR        |NAME  |EST. ROWS|COST |
--------------------------------------------
|0 |NESTED-LOOP JOIN|      |1980     |94876|
|1 | TABLE SCAN     |t1    |1000     |455  |
|2 | TABLE SCAN     |t2(k1)|2        |94   |
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      conds(nil), nl_params_([t1.b])
  1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
      access([t1.b], [t1.a], [t1.c]), partitions(p0),
      is_index_back=false,
      range_key([t1.a]), range(MIN ; MAX)always true
  2 - output([t2.b], [t2.a], [t2.c]), filter(nil),
      access([t2.b], [t2.a], [t2.c]), partitions(p0),
      is_index_back=true,
      range_key([t2.b], [t2.a]), range(MIN ; MAX),
      range_cond([? = t2.b])

A nested loop join may perform multiple full table scans on the right-side table. Each scan requires an iteration in the storage layer, so the scan cost is relatively high. Therefore, OceanBase Database allows the materialization of the results from right-side table scanning in the memory, so that data in the memory is directly scanned the next time. However, materialization in memory is not costless. Therefore, the OceanBase Database optimizer determines whether to perform right-side table materialization based on the costs.

As an optimized variant of nested loop joins, block nested loop joins are used to read a block-sized row from the left-side table at a time and scan the right-side table to find the matching data. This reduces the number of reads from the right-side table.

A nested loop join is typically used when the left-side table has a limited number of rows and the right-side table has an index on the column specified in the join condition. In this case, the matching data for each row in the right-side table can be quickly located by using the index.

OceanBase Database also allows you to use a nested loop join to join multiple tables by adding the hint /*+ USE_NL(table_name_list) */ to the query. In the following example, the system selects the hash join algorithm. The user can add the preceding hint to the query to use a nested loop join.

obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1,t2 WHERE t1.c1 = t2.c1;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ========================================
|ID|OPERATOR   |NAME|EST. ROWS|COST    |
----------------------------------------
|0 |HASH JOIN  |    |98010000 |66774608|
|1 | TABLE SCAN|T1  |100000   |68478   |
|2 | TABLE SCAN|T2  |100000   |68478   |
========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_NL(t1, t2)*/* FROM  t1, t2 WHERE t1.c1 = t2.c1;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ===============================================
|ID|OPERATOR        |NAME|EST. ROWS|COST      |
-----------------------------------------------
|0 |NESTED-LOOP JOIN|    |98010000 |4595346207|
|1 | TABLE SCAN     |T1  |100000   |68478     |
|2 | MATERIAL       |    |100000   |243044    |
|3 |  TABLE SCAN    |T2  |100000   |68478     |
===============================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      conds([T1.C1 = T2.C1]), nl_params_(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil)
  3 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

Nested loop joins have two variants:

Block nested loop join

A block nested loop join is implemented in OceanBase Database in the form of a batch nested loop join, which reads data rows from the left-side table in batches (1,000 rows by default) before scanning the right-side table to find the matching data. This algorithm matches data in the left-side table and that in the right-side table in batches. This approach reduces the number of reads from the right-side table and the number of inner loops.

In the following example, the batch_join=true field indicates that a batch nested loop join is used in the query.

obclient>CREATE TABLE t1(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient>CREATE TABLE t2(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient>EXPLAIN EXTENDED_NOADDR SELECT /*+USE_NL(t1,t2)*/*  FROM t1,t2
          WHERE t1.c1=t2.c1\G
*************************** 1. row ***************************
Query Plan:
============================================
|ID|OPERATOR        |NAME|EST. ROWS|COST   |
--------------------------------------------
|0 |NESTED-LOOP JOIN|    |100001   |3728786|
|1 | TABLE SCAN     |t1  |100000   |59654  |
|2 | TABLE GET      |t2  |1        |36     |
============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t2.c1]), filter(nil),
      conds(nil), nl_params_([t1.c1]), inner_get=false, self_join=false, batch_join=true
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0),
      is_index_back=false,
      range_key([t1.c1]), range(MIN ; MAX)always true
  2 - output([t2.c1]), filter(nil),
      access([t2.c1]), partitions(p0),
      is_index_back=false,
      range_key([t2.c1]), range(MIN ; MAX),
      range_cond([? = t2.c1])

Index nested loop join

An index nested loop join directly matches the index of the right-side table based on matching conditions of the left-side table to join tables. It avoids comparison with every record of the right-side table and reduces the matching times in the right-side table.

As shown in the following example, the join condition is t1.c1 = t2.c1. An index nested loop join is used when column c1 of table t2 or column c1 of table t1 is indexed.

obclient> CREATE TABLE t1(c1 INT PRIMARY KEY);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT ,c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT /*+ORDERED USE_NL(t2,t1)*/ * FROM t2,
          (SELECT /*+NO_MERGE*/ * FROM t1)t1  WHERE t1.c1 = t2.c1 AND t2.c2 = 1\G
*************************** 1. row ***************************
Query Plan:
===========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
-------------------------------------------
|0 |NESTED-LOOP JOIN| |981 |117272|
|1 | TABLE SCAN |t2 |990 |80811 |
|2 | SUBPLAN SCAN |t1 |1 |37 |
|3 | TABLE GET |t1 |1 |36 |
===========================================
Outputs & filters:
-------------------------------------
0 - output([t2.c1], [t2.c2], [t1.c1]), filter(nil), conds(nil), nl_params_([t2.c1])
1 - output([t2.c1], [t2.c2]), filter([t2.c2 = 1]), access([t2.c1], [t2.c2]), partitions(p0)
2 - output([t1.c1]), filter(nil), access([t1.c1])
3 - output([t1.c1]), filter(nil), access([t1.c1]), partitions(p0)

In the outputs & filters section, parameter [t2.c1] of nl_param indicates the execution of conditional pushdown optimization. For more information, see JOIN.

In most cases of query optimization, the OceanBase Database optimizer prioritizes an index nested loop join before it uses a batch nested loop join. The two optimization methods can be used together. A nested loop join is used as the final option.

DAS group rescan

In OceanBase Database V4.1.0, distribute access service (DAS) group rescan applies to single-level nested loop join (NLJ) execution. This optimization performs batching on the left side of the NLJ to reduce the number of scans on its right side. By default, DAS group rescan is enabled. You can disable it by using the _NLJ_BATCHING_ENABLED system variable. This variable must be enclosed in double quotation marks (") in Oracle mode. The syntax is as follows:

/* MySQL mode */
SET _NLJ_BATCHING_ENABLED=false;
/* Oracle mode */
SET "_NLJ_BATCHING_ENABLED"=false;

In the execution plan results for NLJ, you can see the use_batch=true flag at Operator 4 (4 - output). This means that the DAS group rescan optimization is enabled for the NLJ operator. Here is an example:

obclient> CREATE TABLE t1 (a INT, b INT, c INT, PRIMARY KEY(a, b));
obclient> CREATE TABLE t2 (a INT, b INT, c INT, PRIMARY KEY(a, b));
obclient> CREATE TABLE t3 (a INT, b INT , c INT, PRIMARY KEY(b, c));
obclient> EXPLAIN EXTENDED_NOADDR select /*+no_rewrite leading(t1 v) use_nl(t1 v)*/ count(*), sum(t1.a+t1.b+t1.c+v.a+v.b+v.c) FROM t1, (select /*+no_rewrite leading(t2 t3) use_nl(t2 t3)*/ t2.a, t2.b, t3.c from t2, t3 where t2.c = t3.b) v WHERE t1.a = v.c AND t1.c = v.a;
+-----------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                |
+-----------------------------------------------------------------------------------------------------------+
| ====================================================================                                      |
| |ID|OPERATOR                            |NAME|EST.ROWS|EST.TIME(us)|                                      |
| --------------------------------------------------------------------                                      |
| |0 |SCALAR GROUP BY                     |    |1       |44          |                                      |
| |1 |└─NESTED-LOOP JOIN                  |    |1       |44          |                                      |
| |2 |  ├─TABLE FULL SCAN                 |t1  |1       |4           |                                      |
| |3 |  └─SUBPLAN SCAN                    |v   |1       |39          |                                      |
| |4 |    └─NESTED-LOOP JOIN              |    |1       |39          |                                      |
| |5 |      ├─DISTRIBUTED TABLE RANGE SCAN|t2  |1       |21          |                                      |
| |6 |      └─DISTRIBUTED TABLE GET       |t3  |1       |18          |                                      |
| ====================================================================                                      |
| Outputs & filters:                                                                                        |
| -------------------------------------                                                                     |
|   0 - output([T_FUN_COUNT(*)], [T_FUN_SUM(t1.a + t1.b + t1.c + v.a + v.b + v.c)]), filter(nil), rowset=16 |
|       group(nil), agg_func([T_FUN_COUNT(*)], [T_FUN_SUM(t1.a + t1.b + t1.c + v.a + v.b + v.c)])           |
|   1 - output([t1.a], [t1.c], [t1.b], [v.c], [v.a], [v.b]), filter(nil), rowset=16                         |
|       conds(nil), nl_params_([t1.a(:1)], [t1.c(:2)]), use_batch=false                                     |
|   2 - output([t1.a], [t1.b], [t1.c]), filter(nil), rowset=16                                              |
|       access([t1.a], [t1.b], [t1.c]), partitions(p0)                                                      |
|       is_index_back=false, is_global_index=false,                                                         |
|       range_key([t1.a], [t1.b]), range(MIN,MIN ; MAX,MAX)always true                                      |
|   3 - output([v.c], [v.a], [v.b]), filter(nil), rowset=16                                                 |
|       access([v.c], [v.a], [v.b])                                                                         |
|   4 - output([t2.a], [t2.b], [t3.c]), filter(nil), rowset=16                                              |
|       conds(nil), nl_params_([t2.c(:3)]), use_batch=true                                                  |
|   5 - output([t2.a], [t2.b], [t2.c]), filter(nil), rowset=16                                              |
|       access([t2.a], [t2.b], [t2.c]), partitions(p0)                                                      |
|       is_index_back=false, is_global_index=false,                                                         |
|       range_key([t2.a], [t2.b]), range(MIN,MIN ; MAX,MAX)always true,                                     |
|       range_cond([:2 = t2.a])                                                                             |
|   6 - output([t3.c]), filter(nil), rowset=16                                                              |
|       access([GROUP_ID], [t3.c]), partitions(p0)                                                          |
|       is_index_back=false, is_global_index=false,                                                         |
|       range_key([t3.b], [t3.c]), range(MIN ; MAX),                                                        |
|       range_cond([:1 = t3.c], [:3 = t3.b])                                                                |
| Used Hint:                                                                                                |
| -------------------------------------                                                                     |
|   /*+                                                                                                     |
|                                                                                                           |
|       LEADING(("t1" "v"))                                                                                 |
|       USE_NL("v")                                                                                         |
|       LEADING(("t2" "t3"))                                                                                |
|       USE_NL("t3")                                                                                        |
|       NO_REWRITE                                                                                          |
|       NO_REWRITE                                                                                          |
|   */                                                                                                      |
| Qb name trace:                                                                                            |
| -------------------------------------                                                                     |
|   stmt_id:0, stmt_type:T_EXPLAIN                                                                          |
|   stmt_id:1, SEL$1                                                                                        |
|   stmt_id:2, SEL$2                                                                                        |
| Outline Data:                                                                                             |
| -------------------------------------                                                                     |
|   /*+                                                                                                     |
|       BEGIN_OUTLINE_DATA                                                                                  |
|       LEADING(@"SEL$1" ("test"."t1"@"SEL$1" "v"@"SEL$1"))                                                 |
|       USE_NL(@"SEL$1" "v"@"SEL$1")                                                                        |
|       FULL(@"SEL$1" "test"."t1"@"SEL$1")                                                                  |
|       LEADING(@"SEL$2" ("test"."t2"@"SEL$2" "test"."t3"@"SEL$2"))                                         |
|       USE_NL(@"SEL$2" "test"."t3"@"SEL$2")                                                                |
|       FULL(@"SEL$2" "test"."t2"@"SEL$2")                                                                  |
|       USE_DAS(@"SEL$2" "test"."t2"@"SEL$2")                                                               |
|       FULL(@"SEL$2" "test"."t3"@"SEL$2")                                                                  |
|       USE_DAS(@"SEL$2" "test"."t3"@"SEL$2")                                                               |
|       OPTIMIZER_FEATURES_ENABLE('4.2.1.0')                                                                |
|       END_OUTLINE_DATA                                                                                    |
|   */                                                                                                      |
| Optimization Info:                                                                                        |
| -------------------------------------                                                                     |
|   t1:                                                                                                     |
|       table_rows:1                                                                                        |
|       physical_range_rows:1                                                                               |
|       logical_range_rows:1                                                                                |
|       index_back_rows:0                                                                                   |
|       output_rows:1                                                                                       |
|       table_dop:1                                                                                         |
|       dop_method:Table DOP                                                                                |
|       avaiable_index_name:[t1]                                                                            |
|       stats version:0                                                                                     |
|       dynamic sampling level:1                                                                            |
|   t2:                                                                                                     |
|       table_rows:1                                                                                        |
|       physical_range_rows:1                                                                               |
|       logical_range_rows:1                                                                                |
|       index_back_rows:0                                                                                   |
|       output_rows:1                                                                                       |
|       table_dop:1                                                                                         |
|       dop_method:DAS DOP                                                                                  |
|       avaiable_index_name:[t2]                                                                            |
|       stats version:0                                                                                     |
|       dynamic sampling level:1                                                                            |
|   t3:                                                                                                     |
|       table_rows:1                                                                                        |
|       physical_range_rows:1                                                                               |
|       logical_range_rows:1                                                                                |
|       index_back_rows:0                                                                                   |
|       output_rows:1                                                                                       |
|       table_dop:1                                                                                         |
|       dop_method:DAS DOP                                                                                  |
|       avaiable_index_name:[t3]                                                                            |
|       stats version:0                                                                                     |
|       dynamic sampling level:1                                                                            |
|   Plan Type:                                                                                              |
|       LOCAL                                                                                               |
|   Note:                                                                                                   |
|       Degree of Parallelisim is 1 because of table property                                               |
+-----------------------------------------------------------------------------------------------------------+
106 rows in set

Merge join

To join two tables, a merge join sorts them by join fields before scanning them for joining. External sorting is required if the memory is insufficient.

A merge join starts by taking a record from each table for matching. If the record meets the join condition, it is put into the result set. Otherwise, the optimizer discards the record that has a smaller join field value and continues by taking the next record from the same table until the matching record is found.

Temporary space is required for merging two tables with multiple join fields. For example, when a merge join is applied for A JOIN B, if A and B both have join fields of multiple records, such as A1, A2… An for A and B1, B2… Bn for B. Then, every record of A must be matched with its counterpart in B. In this case, the pointer needs to move from B1 to Bn many times and read a record each time. So, it is faster if the records from B1 to Bn are fetched to an in-memory temporary table in advance than reading data from its original storage page or disk. In some scenarios, sorting is skipped if an index is applied to join fields in a consistent order.

Generally, merge joins are suitable for ordered tables. Otherwise, hash joins would be better. The following example displays two plans that use a merge join. Sorting is required for the first one but not for the second one. This is because the two k1 indexes are used as the access paths of the two tables, and the k1 indexes are sorted by column b.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT/*+USE_MERGE(t1, t2)*/ * FROM t1, t2 WHERE t1.c = t2.c;
*************************** 1. row ***************************
Query Plan:
| =====================================
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |MERGE JOIN  |    |1980     |6011|
|1 | SORT       |    |1000     |2198|
|2 |  TABLE SCAN|t1  |1000     |455 |
|3 | SORT       |    |1000     |2198|
|4 |  TABLE SCAN|t2  |1000     |455 |
=====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.c = t2.c]), other_conds(nil)
  1 - output([t1.a], [t1.b], [t1.c]), filter(nil), sort_keys([t1.c, ASC])
  2 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)
  3 - output([t2.a], [t2.b], [t2.c]), filter(nil), sort_keys([t2.c, ASC])
  4 - output([t2.c], [t2.a], [t2.b]), filter(nil),
      access([t2.c], [t2.a], [t2.b]), partitions(p0)


obclient> EXPLAIN SELECT/*+USE_MERGE(t1, t2),INDEX(t1 k1),INDEX(t2 k1)*/ *
        FROM t1, t2 WHERE t1.b = t2.b;
*************************** 1. row ***************************
Query Plan:
| =======================================
|ID|OPERATOR   |NAME  |EST. ROWS|COST |
---------------------------------------
|0 |MERGE JOIN |      |1980     |12748|
|1 | TABLE SCAN|t1(k1)|1000     |5566 |
|2 | TABLE SCAN|t2(k1)|1000     |5566 |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.b = t2.b]), other_conds(nil)
  1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
      access([t1.b], [t1.a], [t1.c]), partitions(p0)
  2 - output([t2.b], [t2.a], [t2.c]), filter(nil),
      access([t2.b], [t2.a], [t2.c]), partitions(p0)

OceanBase Database also allows you to use a merge join to join multiple tables by adding the hint /*+ USE_MERGE(table_name_list) */ to the query. In the following example, the system selects the hash join algorithm. The user can add the preceding hint to the query to use a merge join.

obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1,t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
| ========================================
|ID|OPERATOR   |NAME|EST. ROWS|COST    |
----------------------------------------
|0 |HASH JOIN  |    |98010000 |66774608|
|1 | TABLE SCAN|T1  |100000   |68478   |
|2 | TABLE SCAN|T2  |100000   |68478   |
========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_MERGE(t1,t2)*/* FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
 | =========================================
|ID|OPERATOR    |NAME|EST. ROWS|COST    |
-----------------------------------------
|0 |MERGE JOIN  |    |98010000 |67488837|
|1 | SORT       |    |100000   |563680  |
|2 |  TABLE SCAN|T1  |100000   |68478   |
|3 | SORT       |    |100000   |563680  |
|4 |  TABLE SCAN|T2  |100000   |68478   |
=========================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C1, ASC])
  2 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T2.C1], [T2.C2]), filter(nil), sort_keys([T2.C1, ASC])
  4 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

Hash join

To join two tables, a hash join uses the smaller table, or “the build table”, to create a hash table based on join conditions. It then scans the larger table, or “the probe table”, line by line to find the matching row by probing the hash table. If the build table is so large that the hash table cannot be stored in memory, OceanBase Database splits the build table and the probe table into multiple partitions by join conditions, each including a standalone pair of build table and probe table. In this way, a large hash join task is divided into several separate subtasks, so that a hash join can be executed in the memory for every partition. In most cases, hash joins have higher efficiency than other joins.

The following example shows a plan that uses a hash join.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient>CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT/*+USE_HASH(t1, t2)*/ * FROM t1, t2 WHERE t1.c = t2.c;
*************************** 1. row ***************************
Query Plan:
| ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |1980     |4093|
|1 | TABLE SCAN|t1  |1000     |455 |
|2 | TABLE SCAN|t2  |1000     |455 |
====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil),
      equal_conds([t1.c = t2.c]), other_conds(nil)
  1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)
  2 - output([t2.c], [t2.a], [t2.b]), filter(nil),
      access([t2.c], [t2.a], [t2.b]), partitions(p0)

OceanBase Database also allows you to use a hash join to join multiple tables by adding the hint /*+ USE_HASH(table_name_list) */ to the query. In the following example, the system selects the merge join algorithm. The user can add the preceding hint to the query to use a hash join.

obclient> CREATE TABLE t1(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected  

obclient> CREATE TABLE t2(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected  

obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
| ======================================
|ID|OPERATOR   |NAME|EST. ROWS|COST  |
--------------------------------------
|0 |MERGE JOIN |    |100001   |219005|
|1 | TABLE SCAN|T1  |100000   |61860 |
|2 | TABLE SCAN|T2  |100000   |61860 |
======================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

obclient> EXPLAIN SELECT /*+USE_HASH(t1, t2)*/ * FROM t1, t2 WHERE t1.c1 = t2.c1;
*************************** 1. row ***************************
Query Plan:
 | ======================================
|ID|OPERATOR   |NAME|EST. ROWS|COST  |
--------------------------------------
|0 |HASH JOIN  |    |100001   |495180|
|1 | TABLE SCAN|T1  |100000   |61860 |
|2 | TABLE SCAN|T2  |100000   |61860 |
======================================

Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil),
      equal_conds([T1.C1 = T2.C1]), other_conds(nil)
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.C1], [T2.C2]), filter(nil),
      access([T2.C1], [T2.C2]), partitions(p0)

OceanBase Database uses runtime filters to optimize hash joins. For more information, see Runtime filters.

References

Runtime filters