A join is a query that combines two or more tables or views. When a query contains multiple tables in its FROM clause, OceanBase Database performs a join query. The output columns of the query can be selected from any of the tables in the FROM clause. If multiple tables have a column with the same name, you must use the table name to qualify all references to these columns in the query.
Join types
The join types in a database mainly include inner joins, outer joins, and semi joins (SEMI JOIN/ ANTI JOIN). Semi joins are rewritten using subqueries, and OceanBase Database does not have corresponding syntax for them.
Join conditions
Join conditions are rules that combine multiple tables. Most joins include at least one join condition, which exists in the FROM or WHERE clause and compares two columns from different tables. The WHERE clause may also contain other conditions that reference only one table, further limiting the number of rows returned by the join query.
Join conditions can be categorized into equi-joins (e.g., t1.a = t2.b) and non-equi-joins (e.g., t1.a < t2.b). Compared to non-equi-join conditions, equi-join conditions allow the database to use efficient join algorithms such as MERGE JOIN (MJ) and HASH JOIN (HJ).
To execute a join, OceanBase Database extracts rows from different tables and pairs them based on the join conditions. For multi-table joins involving more than two tables, OceanBase Database first joins two tables based on their column join conditions, then joins the result with another table based on join conditions involving the joined tables and the new table. The optimizer determines the join order based on the join conditions, base table indexes, and available statistics.
Note
If the WHERE clause contains join conditions, you cannot specify LOB columns in the WHERE clause.
Join types
The following table describes the join types supported in the current version of OceanBase Database.
| Join type | English name | Description |
|---|---|---|
| Equijoin | Equijoin | A join that contains an equality operator. |
| Self join | SELF JOIN | A join between a table and itself. |
| Inner join | INNER JOIN | An inner join that returns the matching rows from two joined tables. |
| Left (outer) join | LEFT [OUTER] JOIN | A join that returns all rows from the left table (the table that appears first in the JOIN clause) and excludes unmatched rows from the right table. |
| Right (outer) join | RIGHT [OUTER] JOIN | A join that returns all rows from the right table (the table that appears last in the JOIN clause) and excludes unmatched rows from the left table. |
| Full (outer) join | FULL [OUTER] JOIN | A join that returns all rows from both tables, regardless of whether they match. |
| SEMI join | SEMI JOIN | A join that can only be expanded by using a subquery. |
| ANTI join | ANTI JOIN | A join that can only be expanded by using a subquery. |
| Cartesian product | Cartesian product | A join that returns the Cartesian product of two tables when no join condition is specified. |
Equijoin
An equijoin is a join that contains an equality operator. In an equijoin, the rows that satisfy the equality condition are combined and output.
Self join
A self join is a join between a table and itself. The table appears twice in the FROM clause, followed by table aliases that qualify the column names in the join condition. When a self join is executed, OceanBase Database combines the rows that satisfy the join condition and returns them.
Cartesian product
If no join condition is specified in a join query, OceanBase Database returns the Cartesian product of the two tables. The Cartesian product combines each row from the first table with each row from the second table. The Cartesian product generates many rows, most of which are useless. For example, a Cartesian product of two tables with 100 rows each generates 10,000 rows. Therefore, OceanBase Database recommends that you always specify a join condition in your queries to avoid returning Cartesian products.
If a query joins three or more tables and no join condition is specified, the optimizer can choose a join order to avoid generating a Cartesian product in the intermediate steps.
Inner join
An inner join is the most basic type of join in a database.
An inner join combines the columns of two tables (for example, A and B) based on the join condition and stores the results in a new table. The query compares each row in table A with each row in table B and finds the combinations that satisfy the join condition. When the join condition is satisfied, the matching rows from tables A and B are combined (side by side) into a row in the result set. The result set of an inner join is equivalent to first generating the Cartesian product of the two tables by combining each row in table A with each row in table B, and then returning the rows that satisfy the join condition.
Outer join
An outer join returns all rows that satisfy the join condition and the unmatched rows from one of the tables, with NULL values filled in the corresponding positions of the other table.
An outer join can be further divided into left join, right join, and full join based on whether it retains the rows from the left table, right table, or both. In a left join (LEFT [OUTER] JOIN), if a row in the left table does not find a matching row in the right table, NULL values are automatically filled in the corresponding positions of the right table. In a right join (RIGHT [OUTER] JOIN), if a row in the right table does not find a matching row in the left table, NULL values are automatically filled in the corresponding positions of the left table. In a full join (FULL [OUTER] JOIN), if a row in the left table does not find a matching row in the right table, or a row in the right table does not find a matching row in the left table, NULL values are automatically filled in the corresponding positions of the other table.
OceanBase Database also supports the outer join operator (+). For a left join, the outer join operator (+) is applied to all columns of the right table in the join condition of the WHERE clause. For a right join, the outer join operator (+) is applied to all columns of the left table in the join condition of the WHERE clause. When using the outer join operator (+), the following rules and limitations apply:
If the
FROMclause of a query block contains theJOINsyntax, the outer join operator (+) cannot be specified.The outer join operator (+) can only appear in the
WHEREclause, or in the left-related context of theFROMclause (when theTABLEclause is specified), and can only be applied to a single column of a table or view.If multiple join conditions are specified between the left and right tables, the outer join operator (+) must be used in all conditions. Otherwise, the database will only return the rows generated by the simple join, without any warning or error.
The outer join operator (+) cannot be used to join a table to itself.
The outer join operator (+) can only be applied to columns, not to expressions.
A
WHEREcondition containing the outer join operator (+) cannot be combined with a condition that uses theORorINlogical operator.In a query that joins more than two tables, one of the tables can be an empty table generated for another table. Therefore, the outer join operator (+) cannot be applied to the columns of the same table in multiple join conditions. In other words, the outer join operator (+) cannot be applied to the same column in multiple join conditions.
SEMI join
When tables A and B are joined using LEFT SEMI JOIN or RIGHT SEMI JOIN, only the rows in A or B that can find a match in B or A, respectively, are returned. The SEMI JOIN can only be expanded by using a subquery.
ANTI join
When tables A and B are joined using LEFT ANTI JOIN or RIGHT ANTI JOIN, only the rows in A or B that cannot find a match in B or A, respectively, are returned. Similar to SEMI JOIN, the ANTI JOIN can only be expanded by using a subquery.
Examples
Create tables table_a and table_b and insert data.
CREATE TABLE table_a(PK INT, name VARCHAR(25));
INSERT INTO table_a VALUES(1,'Fords');
INSERT INTO table_a VALUES(2,'Police');
INSERT INTO table_a VALUES(3,'Taxi');
INSERT INTO table_a VALUES(4,'Lincoln');
INSERT INTO table_a VALUES(5,'Arizona');
INSERT INTO table_a VALUES(6,'Washington');
INSERT INTO table_a VALUES(7,'Dell');
INSERT INTO table_a VALUES(10,'Lucent');
CREATE TABLE table_b(PK INT, name VARCHAR(25));
INSERT INTO table_b VALUES(1,'Fords');
INSERT INTO table_b VALUES(2,'Police');
INSERT INTO table_b VALUES(3,'Taxi');
INSERT INTO table_b VALUES(6,'Washington');
INSERT INTO table_b VALUES(7,'Dell');
INSERT INTO table_b VALUES(8,'Microsoft');
INSERT INTO table_b VALUES(9,'Apple');
INSERT INTO table_b VALUES(11,'Scottish Whisky');
Self-join example
obclient> SELECT * FROM table_a ta, table_a tb WHERE ta.NAME = tb.NAME;
+------+-----------------+------+-----------------+
| PK | NAME | PK | NAME |
+------+-----------------+------+-----------------+
| 5 | Arizona | 5 | Arizona |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 10 | Lucent | 10 | Lucent |
| 4 | Lincoln | 4 | Lincoln |
| 3 | Taxi | 3 | Taxi |
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
+------+-----------------+------+-----------------+
8 rows in set
Inner join example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A INNER JOIN table_b B ON A.PK = B.PK;
+------+-----------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington| 6 | Washington|
| 7 | Dell | 7 | Dell |
+------+-----------+------+-----------+
5 rows in set
Left join example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A LEFT JOIN table_b B ON A.PK = B.PK;
+------+-----------------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington|
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+-----------------+------+-----------+
8 rows in set
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK = B.PK(+);
+------+-----------------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington|
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+-----------------+------+-----------+
8 rows in set
Right join queries example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A RIGHT JOIN table_b B ON A.PK = B.PK;
+------+-----------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+--------------------+
| 1 | Fox | 1 | Fox |
| 2 | policeman | 2 | policeman |
| 3 | Cab | 3 | Cab |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 11 | Scotch Whisky |
| NULL | NULL | 9 | Apple |
+------+-----------+------+--------------------+
8 rows in set
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK(+) = B.PK;
+------+-----------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+--------------------+
| 1 | Foxy | 1 | Foxy |
| 2 | police | 2 | police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell Inc | 7 | Dell Inc |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 11 | Scottish whisky |
| NULL | NULL | 9 | Apple |
+------+-----------+------+--------------------+
8 rows in set
Examples of full outer join queries
obclient> SELECT A.PK AS A_PK,A.name AS A_Value,B.PK AS B_PK,B.name AS B_Value
FROM table_a A FULL JOIN table_b B ON A.PK = B.PK;
+------+-----------------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+--------------------+
| 1 | Foxes | 1 | Foxes |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 11 | Scotch whisky |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent Technologies | NULL | NULL |
+------+-----------------+------+--------------------+
11 rows in set
Sample SEMI join query
A dependent subquery is converted into a SEMI JOIN.
obclient> explain SELECT * FROM table_a t1 WHERE t1.PK IN (SELECT t2.PK FROM table_b t2
WHERE t2.NAME = t1.NAME);\G
+------------------------------------------------+
| Query Plan |
+------------------------------------------------+
=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN| |8 |114 |
|1 | TABLE SCAN |T1 |8 |38 |
|2 | TABLE SCAN |T2 |8 |38 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([T1.PK], [T1.NAME]), filter(nil),
equal_conds([T1.PK = T2.PK], [T2.NAME = T1.NAME]), other_conds(nil)
1 - output([T1.NAME], [T1.PK]), filter(nil),
access([T1.NAME], [T1.PK]), partitions(p0)
2 - output([T2.NAME], [T2.PK]), filter(nil),
access([T2.NAME], [T2.PK]), partitions(p0)
+------------------------------------------------+
1 row in set
Examples of an anti join query
A subquery that has a dependent subquery is rewritten as an ANTI JOIN.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK NOT IN (SELECT t2.PK
FROM table_b t2 WHERE t2.name = t1.name);
+------------------------------------------------+
| Query Plan |
+------------------------------------------------+
=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH ANTI JOIN| |0 |112 |
|1 | TABLE SCAN |T1 |8 |38 |
|2 | TABLE SCAN |T2 |8 |38 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([T1.PK], [T1.NAME]), filter(nil),
equal_conds([T2.NAME = T1.NAME]), other_conds([(T_OP_OR, T1.PK = T2.PK,
(T_OP_IS, T1.PK, NULL, 0), (T_OP_IS, T2.PK, NULL, 0))])
1 - output([T1.NAME], [T1.PK]), filter(nil),
access([T1.NAME], [T1.PK]), partitions(p0)
2 - output([T2.NAME], [T2.PK]), filter(nil),
access([T2.NAME], [T2.PK]), partitions(p0)
+---------------------------------------------------------+
1 row in set