The JOIN operator is used to combine data from two tables based on specific conditions.
Note
The main types of joins (JOIN) include inner join, outer join, and semi/anti join.
JOIN operator types
OceanBase Database supports the following JOIN operators: NESTED-LOOP JOIN, MERGE JOIN, and HASH JOIN.
Operator type |
Feature Description |
Core Principles |
Application scenarios |
|---|---|---|---|
| NESTED-LOOP JOIN | The outer loop iterates through the left table, and the inner loop scans the right table for each row of data in the left table. | Scan one table (the left table). For each record read from the table, search the other table (the right table) for rows that meet the join conditions. |
|
| MERGE JOIN | Sorts the sequences before merging them. | First, sort the two input tables by the join fields. Then, scan the sorted tables in sequence simultaneously, like merging two ordered queues, to find and merge matching rows. |
|
| HASH JOIN | Build a hash table and perform detection. | It selects the smaller of the two tables as the build table, computes the hash value for each row's data based on the join conditions, and builds a hash table in memory. Then it scans the larger probe table, computes the hash value for each row's data, and looks up matching rows in the build table's hash table to complete the join. |
|
NESTED-LOOP JOIN
NESTED-LOOP JOIN (abbreviated as NLJ) indicates a nested-loop join.
You can use the USE_NL hint to force the optimizer to use NLJ. When the table specified by the USE_NL hint serves as the left table for the join, the nested-loop join (NL-JOIN) algorithm is used. For detailed information about the USE_NL hint, see the USE_NL Hint section in Join Operation Hint.
NESTED-LOOP JOIN example
Create table
tbl1.obclient> CREATE TABLE tbl1(col1 INT, col2 INT);Create table
tbl2and define columncol1as the primary key.obclient> CREATE TABLE tbl2(col1 INT PRIMARY KEY, col2 INT);Q1: Specify the
USE_NLhint to useNLJ, query rows wheretbl1.col2 = tbl2.col2, calculatetbl1.col2 + tbl2.col2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+USE_NL(tbl1, tbl2)*/ tbl1.col2 + tbl2.col2 FROM tbl1, tbl2 WHERE tbl1.col2 = tbl2.col2;The result is as follows:
+------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN |TBL1|1 |3 | | | |2 |└─MATERIAL | |1 |3 | | | |3 | └─TABLE FULL SCAN|TBL2|1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL1.COL2 + TBL2.COL2]), filter(nil), rowset=16 | | conds([TBL1.COL2 = TBL2.COL2]), nl_params_(nil), use_batch=false | | 1 - output([TBL1.COL2]), filter(nil), rowset=16 | | access([TBL1.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([TBL2.COL2]), filter(nil), rowset=16 | | 3 - output([TBL2.COL2]), filter(nil), rowset=16 | | access([TBL2.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL2.COL1]), range(MIN ; MAX)always true | +------------------------------------------------------------------------+ 21 rows in setQ2: Specify the
USE_NLhint to useNLJ, query rows wheretbl1.col1 = tbl2.col1, calculatetbl1.col2 + tbl2.col2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+USE_NL(tbl1, tbl2)*/ tbl1.col2 + tbl2.col2 FROM tbl1, tbl2 WHERE tbl1.col1 = tbl2.col1;The result is as follows:
+---------------------------------------------------------------------+ | Query Plan | +---------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |19 | | | |1 |├─TABLE FULL SCAN|TBL1|1 |3 | | | |2 |└─TABLE GET |TBL2|1 |16 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL1.COL2 + TBL2.COL2]), filter(nil), rowset=16 | | conds(nil), nl_params_([TBL1.COL1(:0)]), use_batch=true | | 1 - output([TBL1.COL1], [TBL1.COL2]), filter(nil), rowset=16 | | access([TBL1.COL1], [TBL1.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([TBL2.COL2]), filter(nil), rowset=16 | | access([GROUP_ID], [TBL2.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL2.COL1]), range(MIN ; MAX), | | range_cond([:0 = TBL2.COL1]) | +---------------------------------------------------------------------+ 20 rows in setQ3: View the execution plan for the query that calculates
tbl1.col2 + tbl2.col2after performing a Cartesian product join ontbl1andtbl2(where no join conditions exist).obclient> EXPLAIN SELECT tbl1.col2 + tbl2.col2 FROM tbl1, tbl2;The return results are as follows:
+---------------------------------------------------------------------+ | Query Plan | +---------------------------------------------------------------------+ | =========================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ----------------------------------------------------------- | | |0 |NESTED-LOOP JOIN CARTESIAN | |1 |5 | | | |1 |├─TABLE FULL SCAN |TBL1|1 |3 | | | |2 |└─MATERIAL | |1 |3 | | | |3 | └─TABLE FULL SCAN |TBL2|1 |3 | | | =========================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL1.COL2 + TBL2.COL2]), filter(nil), rowset=16 | | conds(nil), nl_params_(nil), use_batch=false | | 1 - output([TBL1.COL2]), filter(nil), rowset=16 | | access([TBL1.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([TBL2.COL2]), filter(nil), rowset=16 | | 3 - output([TBL2.COL2]), filter(nil), rowset=16 | | access([TBL2.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL2.COL1]), range(MIN ; MAX)always true | +---------------------------------------------------------------------+ 21 rows in set
In the preceding Q1 and Q2 queries, the USE_NL hint is used to specify the use of the nested-loop join (NL-JOIN) algorithm. In the Q3 query, no join conditions are specified, so operator 0 is represented as a NESTED-LOOP JOIN CARTESIAN. Logically, it is still an NLJ operator, representing an NLJ operator without any join conditions.
In the execution results of Q1, Q2, and Q3 queries, in addition to the NLJ operator (operator 0), the following operators are also displayed:
TABLE FULL SCANandTABLE GET: Both of these operators belong to theTABLE SCANoperator category and are used to show which index (or primary table) the optimizer chooses to access data from. For more information, see TABLE SCAN.MATERIAL: This operator is used to materialize the data output by the lower-level operator. For more information, see MATERIAL.
In the execution plan displays for Q1, Q2, and Q3, outputs & filters provides detailed output information for the NESTED-LOOP JOIN operator as follows:
Information Name |
Meaning |
Example description |
|---|---|---|
| output | The list of columns or expressions output by the operator. | output([TBL1.COL2 + TBL2.COL2])indicates that the result of this join operation is computed and output.TBL1.COL2andTBL2.COL2The sum of . |
| filter | The filter condition (predicate) required by the operator. | filter(nil)Indicates that no rows need to be filtered after the join operation is complete. |
| rowset | Indicates the vectorization size of the current operator. | rowset=16Indicates that the vectorization size of the current operator is 16. |
| conds | The join condition specifies the criteria used to match rows during a join operation. It determines how rows from the two tables are matched. |
|
| nl_params_ | NESTED-LOOP JOINUnique to operators. This parameter represents the join condition parameters passed from the left table (the driving table) to the right table (the explored table), that is, based onNLJThe data in the left table generates the pushed-down parameters. |
|
| use_batch | Specifies whether to enable batch processing mode for the join operation. |
|
MERGE JOIN
MERGE JOIN (abbreviated as MJ) indicates a sorted merge join.
You can use the USE_MERGE hint to force the use of MJ. When the table specified in this hint serves as the right table for the join, the sorted merge join algorithm is used. For detailed information about the USE_MERGE hint, see the USE_MERGE Hint section in Join Operation Hint.
MERGE JOIN Example
Create table
tbl3.obclient> CREATE TABLE tbl3(col1 INT, col2 INT);Create table
tbl4and define columncol1as the primary key.obclient> CREATE TABLE tbl4(col1 INT PRIMARY KEY, col2 INT);Q4: Specify the
USE_MERGEhint to useMJ, query rows wheretbl3.col2 = tbl4.col2andtbl3.col1 + tbl4.col1 > 10, calculatetbl3.col2 + tbl4.col2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+USE_MERGE(tbl3, tbl4)*/ tbl3.col2 + tbl4.col2 FROM tbl3, tbl4 WHERE tbl3.col2 = tbl4.col2 AND tbl3.col1 + tbl4.col1 > 10;The return result is as follows:
+---------------------------------------------------------------------------------------+ | Query Plan | +---------------------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |MERGE JOIN | |1 |5 | | | |1 |├─SORT | |1 |3 | | | |2 |│ └─TABLE FULL SCAN|TBL3|1 |3 | | | |3 |└─SORT | |1 |3 | | | |4 | └─TABLE FULL SCAN|TBL4|1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL3.COL2 + TBL4.COL2]), filter(nil), rowset=16 | | equal_conds([TBL3.COL2 = TBL4.COL2]), other_conds([TBL3.COL1 + TBL4.COL1 > 10]) | | merge_directions([ASC]) | | 1 - output([TBL3.COL2], [TBL3.COL1]), filter(nil), rowset=16 | | sort_keys([TBL3.COL2, ASC]) | | 2 - output([TBL3.COL2], [TBL3.COL1]), filter(nil), rowset=16 | | access([TBL3.COL2], [TBL3.COL1]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL3.__pk_increment]), range(MIN ; MAX)always true | | 3 - output([TBL4.COL2], [TBL4.COL1]), filter(nil), rowset=16 | | sort_keys([TBL4.COL2, ASC]) | | 4 - output([TBL4.COL1], [TBL4.COL2]), filter(nil), rowset=16 | | access([TBL4.COL1], [TBL4.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL4.COL1]), range(MIN ; MAX)always true | +---------------------------------------------------------------------------------------+ 26 rows in setQ5: Specify the
USE_MERGEhint to useMJ, query rows wheretbl3.col1 = tbl4.col1, calculatetbl3.col2 + tbl4.col2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+USE_MERGE(tbl3, tbl4)*/ tbl3.col2 + tbl4.col2 FROM tbl3, tbl4 WHERE tbl3.col1 = tbl4.col1;The return result is as follows:
+---------------------------------------------------------------------+ | Query Plan | +---------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |MERGE JOIN | |1 |5 | | | |1 |├─SORT | |1 |3 | | | |2 |│ └─TABLE FULL SCAN|TBL3|1 |3 | | | |3 |└─TABLE FULL SCAN |TBL4|1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL3.COL2 + TBL4.COL2]), filter(nil), rowset=16 | | equal_conds([TBL3.COL1 = TBL4.COL1]), other_conds(nil) | | merge_directions([ASC]) | | 1 - output([TBL3.COL1], [TBL3.COL2]), filter(nil), rowset=16 | | sort_keys([TBL3.COL1, ASC]) | | 2 - output([TBL3.COL1], [TBL3.COL2]), filter(nil), rowset=16 | | access([TBL3.COL1], [TBL3.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL3.__pk_increment]), range(MIN ; MAX)always true | | 3 - output([TBL4.COL1], [TBL4.COL2]), filter(nil), rowset=16 | | access([TBL4.COL1], [TBL4.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL4.COL1]), range(MIN ; MAX)always true | +---------------------------------------------------------------------+ 23 rows in set
In the preceding Q4 and Q5 queries, the USE_MERGE hint is used to specify the sorted merge join algorithm. In the return results of Q4 and Q5 queries, in addition to the MJ operator (operator 0), the following operators are also displayed:
SORT: Sorts the input data. For more information, see SORT.TABLE FULL SCAN: This operator belongs to theTABLE SCANoperator family and is used to show which index (or primary table) the optimizer chooses to access data from. For more information, see TABLE SCAN.
In the execution plan display for Q4 and Q5, outputs & filters details the specific output information of the MERGE JOIN operator as follows:
Information Name |
Meaning |
Example description |
|---|---|---|
| output | The list of columns or expressions output by the operator. | output([TBL3.COL2 + TBL4.COL2])Represents the output expression.TBL3.COL2 + TBL4.COL2The result. |
| filter | The filter condition (predicate) required by the operator. | filter(nil)indicates that no rows need to be filtered after the join operation is completed. |
| rowset | Indicates the vectorization size of the current operator. | rowset=16Indicates that the vectorization size of the current operator is 16. |
| equal_conds | In a MERGE JOIN, the equality join conditions must be met, and the result sets of the left and right child nodes must be ordered with respect to the join columns. |
|
| other_conds | Represents other non-equivalence conditions that must be met during the connection process. |
|
| merge_directions | Represents a MERGE JOIN (MERGE JOIN) indicates the sorting (and scanning) direction of the input data on both sides. |
merge_directions([ASC]): Indicates that both sides are sorted and merged in ascending order. |
HASH JOIN
HASH JOIN (abbreviated as HJ) indicates a hash join.
You can use the USE_HASH hint to force the optimizer to use HJ. When the table specified by the USE_HASH hint serves as the right table for the join, the HASH-JOIN algorithm is used. For detailed information about the USE_HASH hint, see the USE_HASH Hint section under Join Operation Hint.
HASH JOIN Example
Create table
tbl5.obclient> CREATE TABLE tbl5(col1 INT, col2 INT);Create table
tbl6and define columncol1as the primary key.obclient> CREATE TABLE tbl6(col1 INT PRIMARY KEY, col2 INT);Q6: Use the
USE_HASHhint to specify theHJmethod, query rows wheretbl5.col1 = tbl6.col1andtbl5.col2 + tbl6.col2 > 1, calculatetbl5.col2 + tbl6.col2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+USE_HASH(tbl5, tbl6)*/ tbl5.col2 + tbl6.col2 FROM tbl5, tbl6 WHERE tbl5.col1 = tbl6.col1 AND tbl5.col2 + tbl6.col2 > 1;The result is as follows:
+--------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |HASH JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|TBL5|1 |3 | | | |2 |└─TABLE FULL SCAN|TBL6|1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([TBL5.COL2 + TBL6.COL2]), filter(nil), rowset=16 | | equal_conds([TBL5.COL1 = TBL6.COL1]), other_conds([TBL5.COL2 + TBL6.COL2 > 1]) | | 1 - output([TBL5.COL1], [TBL5.COL2]), filter(nil), rowset=16 | | access([TBL5.COL1], [TBL5.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL5.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([TBL6.COL1], [TBL6.COL2]), filter(nil), rowset=16 | | access([TBL6.COL1], [TBL6.COL2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([TBL6.COL1]), range(MIN ; MAX)always true | +--------------------------------------------------------------------------------------+ 19 rows in set
The USE_HASH hint was used in the Q6 query to specify the algorithm. In the result, in addition to the HASH JOIN operator (operator number 0), the following execution operators are also displayed:
TABLE FULL SCAN: This operator belongs to theTABLE SCANoperator family and is used to show which index (or primary table) the optimizer chooses to access data from. For more information, see TABLE SCAN.
In the Q6 execution plan display, outputs & filters details the specific output information of the HASH JOIN operator as follows:
Information Name |
Meaning |
Example description |
|---|---|---|
| output | The list of columns or expressions output by the operator. | output([TBL5.COL2 + TBL6.COL2])indicates thisHASH JOINAfter the computation is completed, the expressionTBL5.COL2 + TBL6.COL2The result of the operation. |
| filter | The filter condition (predicate) required by the operator. | filter(nil)indicates that no rows need further filtering after the join operation is completed. |
| rowset | Indicates the vectorization size of the current operator. | rowset=16Indicates that the vectorization size of the current operator is 16. |
| equal_conds | Represents the equality join condition for a hash join. The join columns on both sides are used to compute the hash value. | equal_conds([TBL5.COL1 = TBL6.COL1])indicates that the hash join will be performed usingTBL5.COL1 = TBL6.COL1equates the rows of two tables. |
| other_conds | Represents additional join conditions that filter the result set after the hash join completes matching. | other_conds([TBL5 Colbert + TBL6 Colbert > 1]) indicates additional filtering conditions beyond those specified by equal_conds for an equijoin. It requires that after joining, the sum of the values in the col1 field of the two tables must be greater than 1. |
