A join query combines two or more tables, views, or materialized views. When a query contains multiple tables in the FROM clause, OceanBase Database performs a join query. The query output columns can be selected from any of the tables in the FROM clause. If multiple tables have a column with the same name, you must use the table name to qualify all references to these columns in the query.
Join types
The join types supported by OceanBase Database include inner joins, outer joins, and semi joins ( SEMI JOIN / ANTI JOIN ). A semi join is derived from a subquery rewrite and is not directly supported by OceanBase Database.
Join conditions
Join conditions are rules that combine multiple tables. Most joins include at least one join condition, which is specified in the FROM or WHERE clause. These conditions compare columns from different tables. In addition to join conditions, the WHERE clause may contain other conditions that reference only one table. These conditions can further limit the number of rows returned by the join query.
Join conditions can be classified into equi-joins (e.g., t1.a = t2.b) and non-equi-joins (e.g., t1.a < t2.b). Compared to non-equi-joins, equi-joins allow the database to use efficient join algorithms such as MERGE JOIN (MJ) and HASH JOIN (HJ).
To execute a join, OceanBase Database extracts rows from different tables and combines them into pairs based on the join conditions. For multi-table joins involving more than two tables, OceanBase Database first joins two tables based on their column join conditions, and then joins the resulting table with another table based on the join conditions that include columns from the joined table and the new table. The optimizer determines the join order based on the join conditions, base table indexes, and available statistics.
Note
If the WHERE clause contains join conditions, you cannot specify LOB columns in the WHERE clause.
Join methods
The following table describes the join methods supported by OceanBase Database in the current version.
| Join type | English name | Description |
|---|---|---|
| Equi-join | EQUI JOIN | A join that includes join conditions with equality operators. |
| Self-join | SELF JOIN | A join that joins a table with itself. |
| Inner join | INNER JOIN | An inner join that returns only the matching rows from the joined tables. |
| Left (outer) join | LEFT [OUTER] JOIN | A left outer join that returns all rows from the left table (the table that appears first in the JOIN clause) and excludes unmatched rows from the right table. |
| Right (outer) join | RIGHT [OUTER] JOIN | A right outer join that returns all rows from the right table (the table that appears last in the JOIN clause) and excludes unmatched rows from the left table. |
| Full (outer) join | FULL [OUTER] JOIN | A full outer join that returns all rows from the joined tables, regardless of whether they match. |
| SEMI join | SEMI JOIN | A join that can only be expanded through a subquery. |
| ANTI join | ANTI JOIN | A join that returns all records from the left table that do not match the right table. |
| Cartesian product | Cartesian Product | When no join operations are performed on two tables, the query returns the Cartesian product of the two tables. |
Equi-join
An equi-join is a join that includes join conditions with equality operators. In an equi-join, rows that satisfy the equality conditions are combined and output.
Self-join
A self-join is a join that joins a table with itself. The table appears twice in the FROM clause, followed by table aliases that qualify the column names in the join conditions. During a self-join, OceanBase Database combines rows that satisfy the join conditions and returns them.
Cartesian product
If no join conditions are specified in a join query, OceanBase Database returns the Cartesian product of the two tables. The Cartesian product combines each row from the first table with each row from the second table. This can generate a large number of rows, most of which are irrelevant. For example, if two tables each have 100 rows, their Cartesian product will have 10,000 rows. Therefore, OceanBase Database recommends that you always specify a join condition in your queries to avoid returning Cartesian products unless you specifically need them.
If a query joins three or more tables without specifying join conditions, the optimizer can choose a join order that avoids generating intermediate Cartesian products.
Inner join
An inner join is the most basic join operation in a database.
An inner join combines columns from two tables (e.g., A and B) based on the join conditions and stores the results in a new table. The query compares each row in table A with each row in table B and identifies combinations that satisfy the join conditions. When a combination satisfies the join conditions, the matching rows from tables A and B are combined (side by side) into a single row in the result set. The result set of an inner join is equivalent to first generating the Cartesian product of the two tables by combining each row in table A with each row in table B, and then returning only the rows that satisfy the join conditions.
Outer join
An outer join returns all rows that satisfy the join conditions and also returns unmatched rows from one table, filling the corresponding positions in the other table with NULL.
Outer joins can be further classified into left joins, right joins, and full joins based on whether they retain rows from the left table, right table, or both. In a left join (LEFT [OUTER] JOIN), if a row in the left table does not have a matching row in the right table, the corresponding positions in the right table are filled with NULL. In a right join (RIGHT [OUTER] JOIN), if a row in the right table does not have a matching row in the left table, the corresponding positions in the left table are filled with NULL. In a full join (FULL [OUTER] JOIN), if a row in the left table does not have a matching row in the right table, or vice versa, the corresponding positions in the other table are filled with NULL.
OceanBase Database also supports the outer join symbol (+). For a left join, you must apply the outer join symbol (+) to all columns of the right table in the join conditions of the WHERE clause. For a right join, you must apply the outer join symbol (+) to all columns of the left table in the join conditions of the WHERE clause. When using the outer join symbol (+), the following rules and limitations apply:
If the
FROMclause containsJOINsyntax, you cannot specify the outer join symbol (+).The outer join symbol (+) can only appear in the
WHEREclause or in the left correlated context of theFROMclause (when theTABLEclause is specified). It can only be applied to a single column of a table or view.If multiple join conditions are used to join the left and right tables, the outer join symbol (+) must be applied to all conditions. Otherwise, the database will only return data from simple joins, and no warning or error will be generated.
You cannot use the outer join symbol (+) to join a table with itself.
The outer join operator (+) can only be applied to columns, not to expressions.
A
WHEREcondition containing the outer join symbol (+) cannot be combined with conditions that use theORorINlogical operators.In a query that joins more than two tables, one of the tables can be an empty table generated for another table. Therefore, you cannot apply the outer join symbol (+) to columns of the same table in multiple join conditions. In other words, you cannot apply the outer join operator (+) to the same column in multiple join conditions.
SEMI join
When tables A and B are joined using LEFT SEMI JOIN or RIGHT SEMI JOIN, only the rows in A or B that have matching rows in B or A, respectively, are returned. SEMI JOIN can only be expanded through a subquery.
ANTI join
When tables A and B are joined using LEFT ANTI JOIN or RIGHT ANTI JOIN, only the rows in A or B that do not have matching rows in B or A, respectively, are returned. Like SEMI JOIN, ANTI JOIN can only be expanded through a subquery.
Examples
Create tables table_a and table_b and insert data.
obclient> CREATE TABLE table_a(PK INT, name VARCHAR(25));
obclient> INSERT INTO table_a VALUES(1,'Fox');
obclient> INSERT INTO table_a VALUES(2,'Police');
obclient> INSERT INTO table_a VALUES(3,'Taxi');
obclient> INSERT INTO table_a VALUES(4,'Lincoln');
obclient> INSERT INTO table_a VALUES(5,'Arizona');
obclient> INSERT INTO table_a VALUES(6,'Washington');
obclient> INSERT INTO table_a VALUES(7,'Dell');
obclient> INSERT INTO table_a VALUES(10,'Lucent');
obclient> CREATE TABLE table_b(PK INT, name VARCHAR(25));
obclient> INSERT INTO table_b VALUES(1,'Fox');
obclient> INSERT INTO table_b VALUES(2,'Police');
obclient> INSERT INTO table_b VALUES(3,'Taxi');
obclient> INSERT INTO table_b VALUES(6,'Washington');
obclient> INSERT INTO table_b VALUES(7,'Dell');
obclient> INSERT INTO table_b VALUES(8,'Microsoft');
obclient> INSERT INTO table_b VALUES(9,'Apple');
obclient> INSERT INTO table_b VALUES(11,'Scottish Whisky');
Self-join query example
obclient> SELECT * FROM table_a ta, table_a tb WHERE ta.NAME = tb.NAME;
Expected result:
+------+------------+------+------------+
| PK | NAME | PK | NAME |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 4 | Lincoln | 4 | Lincoln |
| 5 | Arizona | 5 | Arizona |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 10 | Lucent | 10 | Lucent |
+------+------------+------+------------+
8 rows in set (0.013 sec)
Inner join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A INNER JOIN table_b B ON A.PK = B.PK;
Expected result:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
+------+------------+------+------------+
5 rows in set (0.017 sec)
Left join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A LEFT JOIN table_b B ON A.PK = B.PK;
Expected result:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+------------+
8 rows in set (0.006 sec)
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK = B.PK(+);
Expected result:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+------------+
8 rows in set (0.008 sec)
Right join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A RIGHT JOIN table_b B ON A.PK = B.PK;
Expected result:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 11 | Scottish Whisky |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 8 | Microsoft |
+------+------------+------+-----------------+
8 rows in set (0.005 sec)
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK(+) = B.PK;
Expected result:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 11 | Scottish Whisky |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 8 | Microsoft |
+------+------------+------+-----------------+
8 rows in set (0.005 sec)
Full join query example
obclient> SELECT A.PK AS A_PK,A.name AS A_Value,B.PK AS B_PK,B.name AS B_Value
FROM table_a A FULL JOIN table_b B ON A.PK = B.PK;
Expected result:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 11 | Scottish Whisky |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+-----------------+
11 rows in set (0.005 sec)
SEMI join query example
Rewrite the dependent subquery into a SEMI JOIN.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK IN (SELECT t2.PK FROM table_b t2
WHERE t2.NAME = t1.NAME);
Expected result:
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |HASH SEMI JOIN | |8 |11 | |
| |1 |├─TABLE FULL SCAN|T1 |8 |3 | |
| |2 |└─TABLE FULL SCAN|T2 |8 |3 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.PK], [T1.NAME]), filter(nil), rowset=16 |
| equal_conds([T2.NAME = T1.NAME], [T1.PK = T2.PK]), other_conds(nil) |
| 1 - output([T1.NAME], [T1.PK]), filter(nil), rowset=16 |
| access([T1.NAME], [T1.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
| 2 - output([T2.NAME], [T2.PK]), filter(nil), rowset=16 |
| access([T2.NAME], [T2.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T2.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
19 rows in set (0.006 sec)
ANTI join query example
Rewrite the dependent subquery into an ANTI JOIN.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK NOT IN (SELECT t2.PK
FROM table_b t2 WHERE t2.name = t1.name);
Expected result:
+---------------------------------------------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |HASH ANTI JOIN | |1 |11 | |
| |1 |├─TABLE FULL SCAN|T1 |8 |3 | |
| |2 |└─TABLE FULL SCAN|T2 |8 |3 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.PK], [T1.NAME]), filter(nil), rowset=16 |
| equal_conds([T2.NAME = T1.NAME]), other_conds([(T_OP_OR, T1.PK = T2.PK, T1.PK IS NULL, T2.PK IS NULL)]) |
| 1 - output([T1.NAME], [T1.PK]), filter(nil), rowset=16 |
| access([T1.NAME], [T1.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
| 2 - output([T2.NAME], [T2.PK]), filter(nil), rowset=16 |
| access([T2.NAME], [T2.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T2.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------------------------------------------+
19 rows in set (0.008 sec)
