What is nested loop join
A nested loop join (NLJ) joins two data sources by using two nested loops. For example, the following is T1 Nested Loop Join T2 in pseudocode:
for row_1 IN (select * from T1 where xxx)
loop
for row_2 IN (select * from T2 where xxx)
loop
if match join condition(row_1, row_2)
then
output (row_1, row_2)
end if
end loop
end loop
The outer loop traverses each row in the left table, and the inner loop then traverses each row in the right table to check whether the two rows meet the join condition. If so, the rows are joined and then returned.
Here is an execution plan for a nested loop join:
Query Plan:
============================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
--------------------------------------------
|0 |NESTED-LOOP JOIN| |99000 |4120845|
|1 | TABLE SCAN |t1 |100000 |41911 |
|2 | TABLE GET |t2 |1 |40 |
============================================
Outputs & filters:
-------------------------------------
0 - output([1]), filter(nil),
conds(nil), nl_params_([t1.c1])
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
2 - output([1]), filter(nil),
access([t2.c1]), partitions(p0)
When the optimizer chooses nested loop joins
Nested loop joins are useful when the database joins a small dataset with a large dataset, provided that the large dataset has only a small amount of data that meets join conditions or only the first row that meets join conditions needs to be returned.
In general, nested loop joins work best on small driving tables with indexes on join conditions. If a data source has only one row, such as an equality lookup on the primary key (for example, WHERE id=101), the join is a simple lookup. The optimizer always tries to put the smallest data source on the left, making it the driving table.
The optimizer decides whether to use nested loops based on various factors, such as whether a related hint is specified, whether equijoin conditions are used, whether columns in join conditions are indexed, and whether the driving table outputs a maximum of one row. For example, if a query contains a hint that controls the use of the nested loop join, the optimizer determines whether to use the nested loop join as specified by the hint. In the absence of such a hint, if the query has an equijoin condition that does not match any index, and the driving table outputs multiple rows, the optimizer will not choose the nested loop join. Otherwise, the optimizer will try to generate a plan that uses the nested loop join algorithm and calculate the cost of the plan, and finally determines whether to use the nested loop join based on the cost.
If the query does not have any equijoin condition or the join condition is absent, the optimizer will choose the nested loop join, because there are no other algorithms available.
Control the use of nested loop joins in the optimizer
The most direct control method is to specify the join algorithm by using a hint. You can use the USE_NL hint to instruct the optimizer to use the nested loop join algorithm. The LEADING hint is often used together with the USE_NL hint to specify the join order. This is because the plan that uses the nested loop join algorithm may not have the lowest cost and therefore may be overridden by a plan (that adopts a different join order) with lower cost. The USE_NL hint specifies the right table of the join.
In the following example, the optimizer autonomously chooses the hash join algorithm by default.
explain select 1 from t1, t2 where t1.c1 = t2.c1;
Query Plan:
======================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
--------------------------------------
|0 |HASH JOIN | |99000 |194200|
|1 | TABLE SCAN|t1 |100000 |41911 |
|2 | TABLE SCAN|t2 |100000 |41911 |
======================================
Outputs & filters:
-------------------------------------
0 - output([1]), filter(nil),
equal_conds([t1.c1 = t2.c1]), other_conds(nil)
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
2 - output([t2.c1]), filter(nil),
access([t2.c1]), partitions(p0)
To instruct the optimizer to use the nested loop join algorithm, you can use a hint.
explain select /*+leading(t1 t2) use_nl(t2)*/ 1 from t1, t2 where t1.c1 = t2.c1;
Query Plan:
============================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
--------------------------------------------
|0 |NESTED-LOOP JOIN| |99000 |4120845|
|1 | TABLE SCAN |t1 |100000 |41911 |
|2 | TABLE GET |t2 |1 |40 |
============================================
Outputs & filters:
-------------------------------------
0 - output([1]), filter(nil),
conds(nil), nl_params_([t1.c1])
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
2 - output([1]), filter(nil),
access([t2.c1]), partitions(p0)
To better use the nested loop join algorithm, take note of the following considerations:
- Use a data source with a small amount of data as the driving table, if possible.
- If the join condition does not match any index, create a new index to use the nested loop join algorithm.
- Do not use the nested loop join algorithm for a right join, full join, right semi join, or right anti join.