What is a join order
For a multi-table join, since the database can join only one pair of data sources at a time, the optimizer must specify the join order of data source pairs. A query may have multiple join orders. For example, the join query select * from t1, t2, t3 where t1.c1 = t2.c1 and t2.c1 = t3.c1; has multiple alternative join orders. Here are two of the join orders:
Plan 1:
==============================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
----------------------------------------------
|0 |HASH JOIN | |96059601000|22771533844|
|1 | HASH JOIN | |98010000 |23334667 |
|2 | TABLE SCAN|T1 |100000 |42605 |
|3 | TABLE SCAN|T2 |100000 |42605 |
|4 | TABLE SCAN |T3 |100000 |42605 |
==============================================
Plan 2:
==============================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
----------------------------------------------
|0 |HASH JOIN | |96059601000|22771533844|
|1 | HASH JOIN | |98010000 |23334667 |
|2 | TABLE SCAN|T3 |100000 |42605 |
|3 | TABLE SCAN|T2 |100000 |42605 |
|4 | TABLE SCAN |T1 |100000 |42605 |
==============================================
For a more complex query, such as select 1 from t1, t2, t3, t4 where t1.c1 = t2.c1 and t1.c1 = t3.c1 and t1.c1 = t4.c1;, more join orders are available, and their costs may differ greatly. The optimizer needs to choose the optimal join order based on certain strategies.
How does the optimizer decide the join order
The optimizer selects two data sources for join at a time until all data sources are covered. The optimizer then obtains all possible join orders, calculates the execution cost of each join order, and selects the optimal join order based on the calculated cost.
Various factors enter into the optimizer decision on the join order, for example, the amount of data to join, the number of columns to access in tables, data distribution, the join algorithm, and indexes on tables. Among them, the factor that affects the optimizer most is the amount of data to join, which depends on the accuracy of statistics and the row estimates of the optimizer.
How to control the optimizer to use a specific join order
You can use two hints to control the join order in the optimizer.
The first hint is ORDERED, which instructs the optimizer to choose the join order defined by the query without considering other join orders. For example, for select /*+ORDERED*/ 1 from t1, t2, t3 where t1.c1 = t2.c1 and t1.c1 = t3.c1;, the optimizer will only choose the join order t1 join t2 join t3 when generating an execution plan.
==============================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
----------------------------------------------
|0 |HASH JOIN | |96059601000|65343614674|
|1 | HASH JOIN | |98010000 |66674090 |
|2 | TABLE SCAN|T1 |100000 |61860 |
|3 | TABLE SCAN|T2 |100000 |61860 |
|4 | TABLE SCAN |T3 |100000 |61860 |
==============================================
The second hint is LEADING, which specifies a complete or partial order for the optimizer. For example, in select /*+leading(t1 t3 t4 t2)*/ 1 from t1, t2, t3, t4 where t1.c1 = t2.c1 and t1.c1 = t3.c1 and t1.c1 = t4.c1;, LEADING specifies a complete join order.
=====================================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
-----------------------------------------------------
|0 |HASH JOIN | |9.414801e+13 |6.412702e+13 |
|1 | HASH JOIN | |96059601000 |65343614674 |
|2 | HASH JOIN | |98010000 |66674090 |
|3 | TABLE SCAN|T1 |100000 |61860 |
|4 | TABLE SCAN|T3 |100000 |61860 |
|5 | TABLE SCAN |T4 |100000 |61860 |
|6 | TABLE SCAN |T2 |100000 |61860 |
=====================================================
You can also specify a partial join order, and the optimizer determines the join order for the remaining tables. For example, for select /*+leading(t3 t1)*/ 1 from t1, t2, t3, t4 where t1.c1 = t2.c1 and t1.c1 = t3.c1 and t1.c1 = t4.c1;, the optimizer may consider the following join orders:
Query Plan1:
=====================================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
-----------------------------------------------------
|0 |HASH JOIN | |9.414801e+13 |6.412702e+13 |
|1 | HASH JOIN | |96059601000 |65343614674 |
|2 | HASH JOIN | |98010000 |66674090 |
|3 | TABLE SCAN|T3 |100000 |61860 |
|4 | TABLE SCAN|T1 |100000 |61860 |
|5 | TABLE SCAN |T4 |100000 |61860 |
|6 | TABLE SCAN |T2 |100000 |61860 |
=====================================================
Query Plan2:
=====================================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
-----------------------------------------------------
|0 |HASH JOIN | |9.414801e+13 |6.412702e+13 |
|1 | HASH JOIN | |96059601000 |65343614674 |
|2 | HASH JOIN | |98010000 |66674090 |
|3 | TABLE SCAN|T3 |100000 |61860 |
|4 | TABLE SCAN|T1 |100000 |61860 |
|5 | TABLE SCAN |T2 |100000 |61860 |
|6 | TABLE SCAN |T4 |100000 |61860 |
=====================================================
The optimizer then decides the optimal join order based on the cost.
How to quickly determine whether the optimizer's selected join order is suboptimal
In most cases, suboptimal join orders are due to inaccurate statistics or row count estimates. When encountering slow queries, first verify the accuracy of the estimated row counts for the base tables. If a table's row count is significantly off, the optimizer's selected join order is likely not the most cost-effective. For example, the execution of the plan for select 1 from t1, t2, t3, t4 where t1.c1 = t2.c1 and t1.c1 = t3.c1 and t1.c1 = t4.c1 and t1.c2=2 and t4.c2=1 is very slow.
=========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
-----------------------------------------
|0 |HASH JOIN | |9686572 |2563762|
|1 | HASH JOIN | |9884 |184590 |
|2 | HASH JOIN | |11 |97710 |
|3 | TABLE SCAN|T1 |10 |48849 |
|4 | TABLE SCAN|T4 |10 |48849 |
|5 | TABLE SCAN |T2 |100000 |42605 |
|6 | TABLE SCAN |T3 |100000 |42605 |
=========================================
You can construct an auxiliary SQL query to verify the number of rows in the base table. For example, you can construct select count(*) from t1 where t1.c2=2; to verify the number of rows in t1, which is actually 1 million. That is, the join order chosen by the optimizer is not the optimal one when the row estimates are incorrect. In this case, you need to add the LEADING hint to specify a join order, for example, leading(t3 (t2 (t4 t1))).