OceanBase Database supports three types of join algorithms: nested loop join, hash join, and merge join.
Hash join and merge join are applicable only to equi-join conditions, whereas nested loop join is applicable to any join condition.
Nested Loop Join
The Nested Loop Join algorithm scans one table (outer table) and, for each record in the outer table, scans another table (inner table) to find the matching data.
The "scan" in the algorithm can be either an index-based scan or a full table scan. Typically, a full table scan is inefficient, so the optimizer usually does not choose the Nested Loop Join algorithm if there is no index on the columns used in the join condition. In OceanBase Database, the execution plan indicates whether an index-based scan is used.
For example, in the first plan, the inner table is scanned using a full table scan because the join condition is t1.c = t2.c, and the t2 table does not have an index on the c column. In the second plan, the inner table is scanned using an index-based scan because the join condition is t1.b = t2.b, and the t2 table has an index k1 on the b column. This allows the t2 table to quickly find matching rows for each b value in the t1 table.
Create the
t1table.obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));Create the
t2table.obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));View the execution plan, where the join condition is
t1.c = t2.c.obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2 WHERE t1.c = t2.c;The result is as follows:
+--------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN |t1 |1 |3 | | | |2 |└─MATERIAL | |1 |3 | | | |3 | └─TABLE FULL SCAN|t2 |1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil), rowset=16 | | conds([t1.c = t2.c]), nl_params_(nil), use_batch=false | | 1 - output([t1.a], [t1.c], [t1.b]), filter(nil), rowset=16 | | access([t1.a], [t1.c], [t1.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t1.a]), range(MIN ; MAX)always true | | 2 - output([t2.a], [t2.b], [t2.c]), filter(nil), rowset=16 | | 3 - output([t2.a], [t2.c], [t2.b]), filter(nil), rowset=16 | | access([t2.a], [t2.c], [t2.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t2.a]), range(MIN ; MAX)always true | | Used Hint: | | ------------------------------------- | | /*+ | | | | USE_NL("t2") | | */ | | Qb name trace: | | ------------------------------------- | | stmt_id:0, stmt_type:T_EXPLAIN | | stmt_id:1, SEL$1 | | Outline Data: | | ------------------------------------- | | /*+ | | BEGIN_OUTLINE_DATA | | LEADING(@"SEL$1" ("test_db"."t1"@"SEL$1" "test_db"."t2"@"SEL$1")) | | USE_NL(@"SEL$1" "test_db"."t2"@"SEL$1") | | USE_NL_MATERIALIZATION(@"SEL$1" "test_db"."t2"@"SEL$1") | | FULL(@"SEL$1" "test_db"."t1"@"SEL$1") | | FULL(@"SEL$1" "test_db"."t2"@"SEL$1") | | OPTIMIZER_FEATURES_ENABLE('4.6.0.0') | | END_OUTLINE_DATA | | */ | | Optimization Info: | | ------------------------------------- | | t1: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[k1, t1] | | pruned_index_name:[k1] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING FULL] | | t2: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[k1, t2] | | pruned_index_name:[k1] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING FULL] | | Plan Type: | | LOCAL | | Note: | | Degree of Parallelisim is 1 because of table property | +--------------------------------------------------------------------------------------+ 73 rows in setView the execution plan, where the join condition is
t1.b = t2.b.obclient> EXPLAIN EXTENDED_NOADDR SELECT/*+USE_NL(t1, t2)*/ * FROM t1, t2 WHERE t1.b = t2.b;The result is as follows:
+--------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |21 | | | |1 |├─TABLE FULL SCAN |t1 |1 |3 | | | |2 |└─TABLE RANGE SCAN|t2(k1)|1 |18 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t1.a], [t1.b], [t1.c], [t2.a], [t2.b], [t2.c]), filter(nil), rowset=16 | | conds(nil), nl_params_([t1.b(:0)]), use_batch=true | | 1 - output([t1.a], [t1.b], [t1.c]), filter(nil), rowset=16 | | access([t1.a], [t1.b], [t1.c]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t1.a]), range(MIN ; MAX)always true | | 2 - output([t2.a], [t2.b], [t2.c]), filter(nil), rowset=16 | | access([GROUP_ID], [t2.a], [t2.b], [t2.c]), partitions(p0) | | is_index_back=true, is_global_index=false, | | range_key([t2.b], [t2.a]), range(MIN ; MAX), | | range_cond([:0 = t2.b]), is_fast_range = true | | Used Hint: | | ------------------------------------- | | /*+ | | | | USE_NL("t2") | | */ | | Qb name trace: | | ------------------------------------- | | stmt_id:0, stmt_type:T_EXPLAIN | | stmt_id:1, SEL$1 | | Outline Data: | | ------------------------------------- | | /*+ | | BEGIN_OUTLINE_DATA | | LEADING(@"SEL$1" ("test_db"."t1"@"SEL$1" "test_db"."t2"@"SEL$1")) | | USE_NL(@"SEL$1" "test_db"."t2"@"SEL$1") | | FULL(@"SEL$1" "test_db"."t1"@"SEL$1") | | INDEX(@"SEL$1" "test_db"."t2"@"SEL$1" "k1") | | OPTIMIZER_FEATURES_ENABLE('4.6.0.0') | | END_OUTLINE_DATA | | */ | | Optimization Info: | | ------------------------------------- | | t1: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[k1, t1] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING FULL] | | t2: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | index_back_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[k1, t2] | | unstable_index_name:[t2] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[DYNAMIC SAMPLING BASIC] | | Plan Type: | | LOCAL | | Note: | | Degree of Parallelisim is 1 because of table property | +--------------------------------------------------------------------------------------+ 71 rows in set
The Nested Loop Join algorithm may perform multiple full table scans on the inner table because each scan requires iterating through the storage layer, which is relatively costly. Therefore, OceanBase Database supports scanning the inner table once and materializing the results in memory. This way, the next time a scan is needed, the data can be directly accessed from memory without repeatedly scanning the storage layer. However, materializing in memory has its own cost, so the OceanBase Database optimizer decides whether to materialize the inner table based on the cost.
A variation of the Nested Loop Join algorithm is the Blocked Nested Loop Join. This algorithm reads a block size of rows from the outer table at a time and then scans the inner table to find the matching data. This reduces the number of times the inner table is scanned.
The Nested Loop Join algorithm is typically used when the outer table has a small number of rows and the inner table has an index on the join condition column. This allows each row in the inner table to be quickly located using the index.
OceanBase Database also provides a hint /*+ USE_NL(table_name_list) */ to control the selection of the Nested Loop Join algorithm for multi-table joins. For example, if the join algorithm is Hash Join but you want to use the Nested Loop Join algorithm, you can use this hint.
Create the
t3table.obclient> CREATE TABLE t3(c1 INT, c2 INT);Create the
t4table.obclient> CREATE TABLE t4(c1 INT, c2 INT);View the execution plan, where the join condition is
t3.c1 = t4.c1.obclient> EXPLAIN SELECT * FROM t3, t4 WHERE t3.c1 = t4.c1;The result is as follows:
+--------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |HASH JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|t3 |1 |3 | | | |2 |└─TABLE FULL SCAN|t4 |1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t3.c1], [t3.c2], [t4.c1], [t4.c2]), filter(nil), rowset=16 | | equal_conds([t3.c1 = t4.c1]), other_conds(nil) | | 1 - output([t3.c1], [t3.c2]), filter(nil), rowset=16 | | access([t3.c1], [t3.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t3.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([t4.c1], [t4.c2]), filter(nil), rowset=16 | | access([t4.c1], [t4.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t4.__pk_increment]), range(MIN ; MAX)always true | +--------------------------------------------------------------------------+ 19 rows in setUse
/*+USE_NL(t3, t4)*/to view the execution plan, where the join condition ist3.c1 = t4.c1.obclient> EXPLAIN SELECT /*+USE_NL(t3, t4)*/* FROM t3, t4 WHERE t3.c1 = t4.c1;The result is as follows:
+--------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN |t3 |1 |3 | | | |2 |└─MATERIAL | |1 |3 | | | |3 | └─TABLE FULL SCAN|t4 |1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t3.c1], [t3.c2], [t4.c1], [t4.c2]), filter(nil), rowset=16 | | conds([t3.c1 = t4.c1]), nl_params_(nil), use_batch=false | | 1 - output([t3.c1], [t3.c2]), filter(nil), rowset=16 | | access([t3.c1], [t3.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t3.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([t4.c1], [t4.c2]), filter(nil), rowset=16 | | 3 - output([t4.c1], [t4.c2]), filter(nil), rowset=16 | | access([t4.c1], [t4.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t4.__pk_increment]), range(MIN ; MAX)always true | +--------------------------------------------------------------------------+ 21 rows in set
The Nested Loop Join algorithm has two other implementation algorithms:
Batch Nested Loop Join
In OceanBase Database, the Blocked Nested Loop Join algorithm is implemented as Batch Nested Loop Join. This involves reading a batch of data rows (default is 1000 rows) from the outer table and then scanning the inner table to find the matching data. This reduces the number of times the inner table is scanned and the number of inner loops.
In the following example, the
batch_join=truefield indicates that the Batch Nested Loop Join algorithm is used.Create the
t5table.obclient> CREATE TABLE t5(c1 INT PRIMARY KEY);Create the
t6table.obclient> CREATE TABLE t6(c1 INT PRIMARY KEY);View the execution plan.
obclient> EXPLAIN EXTENDED_NOADDR SELECT /*+USE_NL(t5,t6)*/* FROM t5, t6 WHERE t5.c1=t6.c1;The result is as follows:
+----------------------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |19 | | | |1 |├─TABLE FULL SCAN|t5 |1 |3 | | | |2 |└─TABLE GET |t6 |1 |16 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t5.c1], [t6.c1]), filter(nil), rowset=16 | | conds(nil), nl_params_([t5.c1(:0)]), use_batch=true | | 1 - output([t5.c1]), filter(nil), rowset=16 | | access([t5.c1]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t5.c1]), range(MIN ; MAX)always true | | 2 - output([t6.c1]), filter(nil), rowset=16 | | access([GROUP_ID], [t6.c1]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t6.c1]), range(MIN ; MAX), | | range_cond([:0 = t6.c1]), is_fast_range = true | | Used Hint: | | ------------------------------------- | | /*+ | | | | USE_NL("t6") | | */ | | Qb name trace: | | ------------------------------------- | | stmt_id:0, stmt_type:T_EXPLAIN | | stmt_id:1, SEL$1 | | Outline Data: | | ------------------------------------- | | /*+ | | BEGIN_OUTLINE_DATA | | LEADING(@"SEL$1" ("test_db"."t5"@"SEL$1" "test_db"."t6"@"SEL$1")) | | USE_NL(@"SEL$1" "test_db"."t6"@"SEL$1") | | FULL(@"SEL$1" "test_db"."t5"@"SEL$1") | | INDEX(@"SEL$1" "test_db"."t6"@"SEL$1" "primary") | | OPTIMIZER_FEATURES_ENABLE('4.6.0.0') | | END_OUTLINE_DATA | | */ | | Optimization Info: | | ------------------------------------- | | t5: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[t5] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING FULL] | | t6: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[t6] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[DEFAULT] | | Plan Type: | | LOCAL | | Note: | | Degree of Parallelisim is 1 because of table property | +----------------------------------------------------------------------------------+ 69 rows in set
Index Nested Loop Join
The Index Nested Loop Join algorithm is based on indexes for joining. It directly matches the outer table's matching conditions with the inner table's indexes, avoiding comparisons with every record in the inner table. This reduces the number of matches needed for the inner table.
In the following example, if the join condition is
t7.c1 = t8.c1, the Index Nested Loop Join algorithm is used if there is an index on thec1column in either thet7ort8table.Create the
t7table.obclient> CREATE TABLE t7(c1 INT PRIMARY KEY);Create the
t8table.obclient> CREATE TABLE t8(c1 INT ,c2 INT);View the execution plan.
obclient> EXPLAIN SELECT /*+ORDERED USE_NL(t8,t7)*/ * FROM t8, (SELECT /*+NO_MERGE*/ * FROM t7) t7 WHERE t7.c1 = t8.c1 AND t8.c2 = 1;The result is as follows:
+------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------+ | ========================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------- | | |0 |NESTED-LOOP JOIN | |1 |19 | | | |1 |├─TABLE FULL SCAN |t8 |1 |3 | | | |2 |└─SUBPLAN SCAN |t7 |1 |16 | | | |3 | └─DISTRIBUTED TABLE GET|t7 |1 |16 | | | ========================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t8.c1], [t8.c2], [t7.c1]), filter(nil), rowset=16 | | conds(nil), nl_params_([t8.c1(:0)]), use_batch=true | | 1 - output([t8.c1], [t8.c2]), filter([t8.c2 = 1]), rowset=16 | | access([t8.c1], [t8.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([t8.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([t7.c1]), filter(nil), rowset=16 | | access([t7.c1]) | | 3 - output([t7.c1]), filter(nil), rowset=16 | | access([GROUP_ID], [t7.c1]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t7.c1]), range(MIN ; MAX)always true, | | range_cond([t7.c1 = :0]) | +------------------------------------------------------------------------------------+ 23 rows in set
In the
outputs & filtersoutput, if thenl_paramparameter contains[t2.c1], it indicates that the condition pushdown optimization is performed. For more information, see JOIN.Generally, during query optimization, the OceanBase Database optimizer prioritizes the Index Nested Loop Join algorithm and then checks if the Batch Nested Loop Join algorithm can be used. These two optimization methods can be used together, and the Nested Loop Join algorithm is selected only if necessary.
DAS group rescan
In OceanBase Database V4.1.0, the DAS group rescan optimization applies to single-level nested loop joins (NLJs). This optimization batches the left side of the NLJ to reduce the number of scans on the right side. The optimization is enabled by default and can be disabled using the internal system variable _NLJ_BATCHING_ENABLED (which must be enclosed in double quotes in Oracle mode). The syntax is as follows:
/* MySQL mode */
SET _NLJ_BATCHING_ENABLED=false;
/* Oracle mode */
SET "_NLJ_BATCHING_ENABLED"=false;
In the execution plan of an NLJ, the use_batch=true indicator at the 4th operator (4 - output) indicates that the DAS group rescan optimization is enabled for the NLJ operator. Example:
Create table
t9.obclient> CREATE TABLE t9 (a INT, b INT, c INT, PRIMARY KEY(a, b));Create table
t10.obclient> CREATE TABLE t10 (a INT, b INT, c INT, PRIMARY KEY(a, b));Create table
t11.obclient> CREATE TABLE t11 (a INT, b INT , c INT, PRIMARY KEY(b, c));View the execution plan.
obclient> EXPLAIN EXTENDED_NOADDR SELECT /*+no_rewrite leading(t9 v) use_nl(t9 v)*/ count(*), sum(t9.a+t9.b+t9.c+v.a+v.b+v.c) FROM t9, (SELECT /*+no_rewrite leading(t10 t11) use_nl(t10 t11)*/ t10.a, t10.b, t11.c FROM t10, t11 WHERE t10.c = t11.b) v WHERE t9.a = v.c AND t9.c = v.a;The returned result is as follows:
+-----------------------------------------------------------------------------------------------------------+ | Query Plan | +-----------------------------------------------------------------------------------------------------------+ | ==================================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | -------------------------------------------------------------------- | | |0 |SCALAR GROUP BY | |1 |21 | | | |1 |└─NESTED-LOOP JOIN | |1 |21 | | | |2 | ├─TABLE FULL SCAN |t9 |1 |3 | | | |3 | └─SUBPLAN SCAN |v |1 |19 | | | |4 | └─NESTED-LOOP JOIN | |1 |19 | | | |5 | ├─DISTRIBUTED TABLE RANGE SCAN|t10 |1 |18 | | | |6 | └─DISTRIBUTED TABLE GET |t11 |1 |16 | | | ==================================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([T_FUN_COUNT(*)], [T_FUN_SUM(t9.a + t9.b + t9.c + v.a + v.b + v.c)]), filter(nil), rowset=16 | | group(nil), agg_func([T_FUN_COUNT(*)], [T_FUN_SUM(t9.a + t9.b + t9.c + v.a + v.b + v.c)]) | | 1 - output([t9.a], [t9.c], [t9.b], [v.c], [v.a], [v.b]), filter(nil), rowset=16 | | conds(nil), nl_params_([t9.a(:1)], [t9.c(:2)]), use_batch=false | | 2 - output([t9.a], [t9.b], [t9.c]), filter(nil), rowset=16 | | access([t9.a], [t9.b], [t9.c]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t9.a], [t9.b]), range(MIN,MIN ; MAX,MAX)always true | | 3 - output([v.c], [v.a], [v.b]), filter(nil), rowset=16 | | access([v.c], [v.a], [v.b]) | | 4 - output([t10.a], [t10.b], [t11.c]), filter(nil), rowset=16 | | conds(nil), nl_params_([t10.c(:3)]), use_batch=true | | 5 - output([t10.a], [t10.b], [t10.c]), filter(nil), rowset=16 | | access([t10.a], [t10.b], [t10.c]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t10.a], [t10.b]), range(MIN,MIN ; MAX,MAX)always true, | | range_cond([:2 = t10.a]), is_fast_range = true | | 6 - output([t11.c]), filter(nil), rowset=16 | | access([GROUP_ID], [t11.c]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t11.b], [t11.c]), range(MIN ; MAX), | | range_cond([:1 = t11.c], [:3 = t11.b]), is_fast_range = true | | Used Hint: | | ------------------------------------- | | /*+ | | | | LEADING(("t9" "v")) | | USE_NL("v") | | LEADING(("t10" "t11")) | | USE_NL("t11") | | NO_REWRITE | | NO_REWRITE | | */ | | Qb name trace: | | ------------------------------------- | | stmt_id:0, stmt_type:T_EXPLAIN | | stmt_id:1, SEL$1 | | stmt_id:2, SEL$2 | | Outline Data: | | ------------------------------------- | | /*+ | | BEGIN_OUTLINE_DATA | | LEADING(@"SEL$1" ("test_db"."t9"@"SEL$1" "v"@"SEL$1")) | | USE_NL(@"SEL$1" "v"@"SEL$1") | | FULL(@"SEL$1" "test_db"."t9"@"SEL$1") | | LEADING(@"SEL$2" ("test_db"."t10"@"SEL$2" "test_db"."t11"@"SEL$2")) | | USE_NL(@"SEL$2" "test_db"."t11"@"SEL$2") | | INDEX(@"SEL$2" "test_db"."t10"@"SEL$2" "primary") | | USE_DAS(@"SEL$2" "test_db"."t10"@"SEL$2") | | INDEX(@"SEL$2" "test_db"."t11"@"SEL$2" "primary") | | USE_DAS(@"SEL$2" "test_db"."t11"@"SEL$2") | | OPTIMIZER_FEATURES_ENABLE('4.6.0.0') | | END_OUTLINE_DATA | | */ | | Optimization Info: | | ------------------------------------- | | t9: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:Table DOP | | avaiable_index_name:[t9] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING FULL] | | t10: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:DAS DOP | | avaiable_index_name:[t10] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[STORAGE, DYNAMIC SAMPLING BASIC] | | t11: | | table_rows:1 | | physical_range_rows:1 | | logical_range_rows:1 | | output_rows:1 | | table_dop:1 | | dop_method:DAS DOP | | avaiable_index_name:[t11] | | stats info:[version=1970-01-01 08:00:00.000000, is_locked=0, is_expired=0] | | dynamic sampling level:1 | | estimation method:[DEFAULT] | | Plan Type: | | LOCAL | | Note: | | Degree of Parallelisim is 1 because of table property | +-----------------------------------------------------------------------------------------------------------+ 106 rows in set
Merge Join
The principle of Merge Join is to first sort the two tables based on the join fields (if memory space is insufficient, external sorting is required), and then scan the two tables to perform the join.
The merging process starts by taking one record from each table and matching them. If the records meet the join condition, they are added to the result set. If not, the record with the smaller value in the join field is discarded, and the next record from the corresponding table is fetched for further matching, continuing this process until the entire cycle is completed.
When merging two tables with a many-to-many relationship, temporary space is typically used. For example, when A Join B uses Merge Join, if a group of values in the join field exists as multiple records A1, A2, ..., An in table A and multiple records B1, B2, ..., Bn in table B, each record A1, A2, ..., An in table A must be matched with all corresponding equal records B1, B2, ..., Bn in table B. This means the pointer needs to move from B1 to Bn multiple times, reading the corresponding records B1 to Bn each time. Pre-reading and storing the records B1 to Bn in a temporary memory table is faster than reading them from the original data pages or disk. In some scenarios, if there are available indexes on the join fields and the sorting is consistent, the sorting operation can be skipped.
Generally, Merge Join is more suitable when the two input tables are already sorted. Otherwise, Hash Join might be more appropriate. The following example demonstrates two Merge Join plans: the first one requires sorting, while the second one does not (because both tables use the k1 index access path, which is already sorted by the b column).
Create table
t12.obclient> CREATE TABLE t12(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));Create table
t13.obclient> CREATE TABLE t13(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));Use
/*+USE_MERGE(t12, t13)*/to view the execution plan.obclient> EXPLAIN SELECT /*+USE_MERGE(t12, t13)*/ * FROM t12, t13 WHERE t12.c = t13.c;The returned result is as follows:
+--------------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |MERGE JOIN | |1 |5 | | | |1 |├─SORT | |1 |3 | | | |2 |│ └─TABLE FULL SCAN|t12 |1 |3 | | | |3 |└─SORT | |1 |3 | | | |4 | └─TABLE FULL SCAN|t13 |1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t12.a], [t12.b], [t12.c], [t13.a], [t13.b], [t13.c]), filter(nil), rowset=16 | | equal_conds([t12.c = t13.c]), other_conds(nil) | | merge_directions([ASC]) | | 1 - output([t12.a], [t12.b], [t12.c]), filter(nil), rowset=16 | | sort_keys([t12.c, ASC]) | | 2 - output([t12.a], [t12.c], [t12.b]), filter(nil), rowset=16 | | access([t12.a], [t12.c], [t12.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t12.a]), range(MIN ; MAX)always true | | 3 - output([t13.a], [t13.b], [t13.c]), filter(nil), rowset=16 | | sort_keys([t13.c, ASC]) | | 4 - output([t13.a], [t13.c], [t13.b]), filter(nil), rowset=16 | | access([t13.a], [t13.c], [t13.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t13.a]), range(MIN ; MAX)always true | +--------------------------------------------------------------------------------------------+ 26 rows in setUse
/*+USE_MERGE(t12, t13),INDEX(t12 k1),INDEX(t13 k1)*/to view the execution plan.obclient> EXPLAIN SELECT /*+USE_MERGE(t12, t13),INDEX(t12 k1),INDEX(t13 k1)*/ * FROM t12, t13 WHERE t12.b = t13.b;The returned result is as follows:
+--------------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |MERGE JOIN | |1 |14 | | | |1 |├─TABLE FULL SCAN|t12(k1)|1 |7 | | | |2 |└─TABLE FULL SCAN|t13(k1)|1 |7 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t12.a], [t12.b], [t12.c], [t13.a], [t13.b], [t13.c]), filter(nil), rowset=16 | | equal_conds([t12.b = t13.b]), other_conds(nil) | | merge_directions([ASC]) | | 1 - output([t12.a], [t12.b], [t12.c]), filter(nil), rowset=16 | | access([t12.a], [t12.b], [t12.c]), partitions(p0) | | is_index_back=true, is_global_index=false, | | range_key([t12.b], [t12.a]), range(MIN,MIN ; MAX,MAX)always true | | 2 - output([t13.a], [t13.b], [t13.c]), filter(nil), rowset=16 | | access([t13.a], [t13.b], [t13.c]), partitions(p0) | | is_index_back=true, is_global_index=false, | | range_key([t13.b], [t13.a]), range(MIN,MIN ; MAX,MAX)always true | +--------------------------------------------------------------------------------------------+ 20 rows in set
OceanBase Database also provides the /*+ USE_MERGE(table_name_list) */ hint to control the selection of the Merge Join algorithm when joining multiple tables. For example, if the join algorithm is Hash Join but the user prefers Merge Join, the hint can be used to specify this.
Create table
t14.obclient> CREATE TABLE t14(c1 INT, c2 INT);Create table
t15.obclient> CREATE TABLE t15(c1 INT, c2 INT);View the execution plan with the join condition
t14.c1 = t15.c1.obclient> EXPLAIN SELECT * FROM t14, t15 WHERE t14.c1 = t15.c1;The returned result is as follows:
+------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |HASH JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|t14 |1 |3 | | | |2 |└─TABLE FULL SCAN|t15 |1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t14.c1], [t14.c2], [t15.c1], [t15.c2]), filter(nil), rowset=16 | | equal_conds([t14.c1 = t15.c1]), other_conds(nil) | | 1 - output([t14.c1], [t14.c2]), filter(nil), rowset=16 | | access([t14.c1], [t14.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t14.__pk_increment]), range(MIN ; MAX)always true | | 2 - output([t15.c1], [t15.c2]), filter(nil), rowset=16 | | access([t15.c1], [t15.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t15.__pk_increment]), range(MIN ; MAX)always true | +------------------------------------------------------------------------------+ 19 rows in setUse
/*+USE_MERGE(t14,t15)*/to view the execution plan with the join conditiont14.c1 = t15.c1.obclient> EXPLAIN SELECT /*+USE_MERGE(t14,t15)*/* FROM t14, t15 WHERE t14.c1 = t15.c1;The returned result is as follows:
+------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------+ | =================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | --------------------------------------------------- | | |0 |MERGE JOIN | |1 |5 | | | |1 |├─SORT | |1 |3 | | | |2 |│ └─TABLE FULL SCAN|t14 |1 |3 | | | |3 |└─SORT | |1 |3 | | | |4 | └─TABLE FULL SCAN|t15 |1 |3 | | | =================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t14.c1], [t14.c2], [t15.c1], [t15.c2]), filter(nil), rowset=16 | | equal_conds([t14.c1 = t15.c1]), other_conds(nil) | | merge_directions([ASC]) | | 1 - output([t14.c1], [t14.c2]), filter(nil), rowset=16 | | sort_keys([t14.c1, ASC]) | | 2 - output([t14.c1], [t14.c2]), filter(nil), rowset=16 | | access([t14.c1], [t14.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t14.__pk_increment]), range(MIN ; MAX)always true | | 3 - output([t15.c1], [t15.c2]), filter(nil), rowset=16 | | sort_keys([t15.c1, ASC]) | | 4 - output([t15.c1], [t15.c2]), filter(nil), rowset=16 | | access([t15.c1], [t15.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t15.__pk_increment]), range(MIN ; MAX)always true | +------------------------------------------------------------------------------+ 26 rows in set
Hash Join
Hash Join is a join algorithm that builds a hash table from the smaller table (usually called the build table) based on the join condition and then scans the larger table (usually called the probe table) row by row to find matching rows in the hash table. If the build table is too large to fit into memory, OceanBase Database partitions both the build table and the probe table based on the join condition into multiple partitions. Each partition contains an independent pair of a build table and a probe table. This way, a large hash join is divided into multiple independent and mutually exclusive hash joins. Each partition's hash join can be completed in memory. In most cases, hash join is more efficient than other join algorithms.
Here is an example of a hash join plan.
Create a table named
t16.obclient> CREATE TABLE t16(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));Create a table named
t17.obclient> CREATE TABLE t17(a INT PRIMARY KEY, b INT, c INT, KEY k1(b));View the execution plan.
obclient> EXPLAIN SELECT/*+USE_HASH(t16, t17)*/ * FROM t16, t17 WHERE t16.c = t17.c;The returned result is as follows:
+--------------------------------------------------------------------------------------------+ | Query Plan | +--------------------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |HASH JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|t16 |1 |3 | | | |2 |└─TABLE FULL SCAN|t17 |1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t16.a], [t16.b], [t16.c], [t17.a], [t17.b], [t17.c]), filter(nil), rowset=16 | | equal_conds([t16.c = t17.c]), other_conds(nil) | | 1 - output([t16.a], [t16.c], [t16.b]), filter(nil), rowset=16 | | access([t16.a], [t16.c], [t16.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t16.a]), range(MIN ; MAX)always true | | 2 - output([t17.a], [t17.c], [t17.b]), filter(nil), rowset=16 | | access([t17.a], [t17.c], [t17.b]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t17.a]), range(MIN ; MAX)always true | +--------------------------------------------------------------------------------------------+ 19 rows in set
OceanBase Database also provides a hint /*+ USE_HASH(table_name_list) */ to control the selection of the hash join algorithm when joining multiple tables. For example, if the merge join algorithm is selected in the following scenario, but you want to use the hash join algorithm, you can use the above hint to control it.
Create a table named
t18.obclient> CREATE TABLE t18(c1 INT, c2 INT, PRIMARY KEY(c1));Create a table named
t19.obclient> CREATE TABLE t19(c1 INT, c2 INT, PRIMARY KEY(c1));View the execution plan, where the join condition is
t18.c1 = t19.c1.obclient> EXPLAIN SELECT * FROM t18, t19 WHERE t18.c1 = t19.c1;The returned result is as follows:
+------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |MERGE JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|t18 |1 |3 | | | |2 |└─TABLE FULL SCAN|t19 |1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t18.c1], [t18.c2], [t19.c1], [t19.c2]), filter(nil), rowset=16 | | equal_conds([t18.c1 = t19.c1]), other_conds(nil) | | merge_directions([ASC]) | | 1 - output([t18.c1], [t18.c2]), filter(nil), rowset=16 | | access([t18.c1], [t18.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t18.c1]), range(MIN ; MAX)always true | | 2 - output([t19.c1], [t19.c2]), filter(nil), rowset=16 | | access([t19.c1], [t19.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t19.c1]), range(MIN ; MAX)always true | +------------------------------------------------------------------------------+ 20 rows in setUse
/*+USE_HASH(t18, t19)*/to view the execution plan, where the join condition ist18.c1 = t19.c1.obclient> EXPLAIN SELECT /*+USE_HASH(t18, t19)*/ * FROM t18, t19 WHERE t18.c1 = t19.c1;The returned result is as follows:
+------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |HASH JOIN | |1 |5 | | | |1 |├─TABLE FULL SCAN|t18 |1 |3 | | | |2 |└─TABLE FULL SCAN|t19 |1 |3 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([t18.c1], [t18.c2], [t19.c1], [t19.c2]), filter(nil), rowset=16 | | equal_conds([t18.c1 = t19.c1]), other_conds(nil) | | 1 - output([t18.c1], [t18.c2]), filter(nil), rowset=16 | | access([t18.c1], [t18.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t18.c1]), range(MIN ; MAX)always true | | 2 - output([t19.c1], [t19.c2]), filter(nil), rowset=16 | | access([t19.c1], [t19.c2]), partitions(p0) | | is_index_back=false, is_global_index=false, | | range_key([t19.c1]), range(MIN ; MAX)always true | +------------------------------------------------------------------------------+ 19 rows in set
OceanBase Database also uses runtime filters in hash joins. For more information, see Runtime filter.
