A JOIN query combines two or more tables, views, or materialized views. When your query's FROM clause contains multiple table names, OceanBase Database performs a JOIN query. You can select columns from any tables listed in the FROM clause. If multiple tables contain a column with the same name, you must qualify references to these columns with the table name during the query.
JOIN types
The join types in a database include inner joins, outer joins, and semi joins (SEMI JOIN/ ANTI JOIN). Semi joins are rewritten by using subqueries and therefore not supported in the original SQL syntax in OceanBase Database.
JOIN conditions
A join condition is a rule that specifies how to combine multiple tables. Most joins require at least one join condition. The join condition is specified in the FROM or WHERE clause to compare columns from different tables. Apart from the join condition, the WHERE clause can also contain other conditions that reference only one table. These additional conditions can further limit the number of rows in the result set of a join query.
JOIN conditions are classified into equal-value join conditions (for example, t1.a = t2.b) and nonequal-value join conditions (for example, t1.a < t2.b). Equal-value join conditions allow the database to use efficient join algorithms, such as MERGE JOIN (MJ) and HASH JOIN (HJ).
To perform a join, OceanBase Database extracts rows from different tables and combines them into pairs based on the join condition. Then, it matches the pairs based on the join condition. To perform a multi-table join, OceanBase Database first joins two tables based on their join conditions, and then joins the result with another table based on the join conditions of the involved columns of the two tables. The optimizer determines the join order of tables in OceanBase Database based on the join conditions, base table indexes, and available statistics.
Note
If the WHERE clause contains a join condition, you cannot specify a LOB column in the WHERE clause.
JOIN types
The following table lists the join methods supported by the current version of OceanBase Database.
| Join type | Description |
|---|---|
| Equi join | A join that contains an equality condition. |
| Self join | A join between a table and itself. |
| Inner join | A join that returns only the matching rows from the two tables being joined. |
| Left outer join | A join that returns all rows from the left table and the matching rows from the right table. If there is no match, the result set will contain null values. |
| Right outer join | A join that returns all rows from the right table and the matching rows from the left table. If there is no match, the result set will contain null values. |
| Full outer join | A join that returns all rows from the two tables being joined. If there is no match, the result set will contain null values. |
| Semi join | A join that can be performed only through a subquery. |
| Anti join | A join that returns all records from the left table that do not match those in the right table. |
| Cartesian product | When no join condition is specified between two tables, a query on the two tables returns the Cartesian product of the two tables. |
Equijoin
An equijoin is a join that contains an equality operator. In an equijoin, rows that meet the join condition are combined and output.
Self-join
A self-join is a join between two instances of the same table. The table appears twice in the FROM clause, followed by table aliases that are used to distinguish the columns in the join condition. During a self-join, OceanBase Database combines rows that meet the join condition and returns the combined rows.
Cartesian product
If the join condition is omitted in the query involving two tables, OceanBase Database returns the Cartesian product of the two tables, combining each row in the first table with every row in the second table. The Cartesian product generates a large number of rows, most of which are useless data. For example, the Cartesian product of two tables, each with 100 rows, contains 10,000 rows. Therefore, OceanBase Database does not recommend that you omit the join condition in your query to avoid returning the Cartesian product.
If the join condition is omitted in the query involving three or more tables, the optimizer will determine the join order to avoid generating Cartesian products.
Inner join
An inner join is the most basic join operation in a database.
In an inner join, columns from the two tables (A and B) are combined based on the join condition. The combined columns are stored in a result table. The query compares each row in table A with each row in table B and retains the rows that meet the join condition. The matching rows from tables A and B are combined (placed side by side) into a single row in the result set. The result set of an inner join contains all rows that meet the join condition. In other words, it contains the Cartesian product of the two tables after the tables are filtered based on the join condition.
Outer join
An outer join returns all rows that meet the join condition, and also returns rows from one table that do not have matches in the other table. NULL values are filled in the corresponding positions of the other table.
An outer join can be a left outer join, right outer join, or full outer join. In a left outer join (LEFT [OUTER] JOIN), NULL values are automatically filled in the right table when a row in the left table does not have a match. In a right outer join (RIGHT [OUTER] JOIN), NULL values are automatically filled in the left table when a row in the right table does not have a match. In a full outer join (FULL [OUTER] JOIN), NULL values are filled in both tables when a row in either table does not have a match.
OceanBase Database also supports the outer join symbol (+). A left outer join is specified by applying the outer join symbol (+) to all columns of the right table in the join condition in the WHERE clause; a right outer join is specified by applying the outer join symbol (+) to all columns of the left table in the WHERE clause. The following rules and limitations apply when you use the outer join symbol (+):
If the query block contains a
FROMclause that uses theJOINsyntax, the outer join symbol (+) cannot be specified.The outer join symbol (+) can be used only in the
WHEREclause, or in a left-related context in theFROMclause (where theTABLEclause is specified), and it can be applied only to a single column in a table or view.If multiple join conditions are specified between the left and right tables, the outer join symbol (+) must be specified in all conditions. Otherwise, the database will return only the rows in the result set of an inner join and no warning or error will be returned.
The outer join symbol (+) cannot be used to outer join a table to itself.
The outer join operator (+) can be applied only to columns, not to expressions.
WHEREconditions that contain the outer join symbol (+) cannot be combined withORorINconditions.In a query that performs an outer join on more than two tables, a table can be generated by another table. Therefore, the outer join symbol (+) cannot be applied to columns in the same table in multiple join conditions, namely, in the join conditions between the left and right tables and between the right table and the next table.
SEMI join
A LEFT SEMI JOIN or RIGHT SEMI JOIN returns all rows in the A or B table that have matches in the B or A table. The SEMI JOIN operator can be applied only through subqueries.
ANTI join
A LEFT ANTI JOIN or RIGHT ANTI JOIN returns all rows in the A or B table that do not have matches in the B or A table. The ANTI JOIN operator can be applied only through subqueries.
Examples
Create tables named table_a and table_b and insert data into the tables.
obclient> CREATE TABLE table_a(PK INT, name VARCHAR(25));
obclient> INSERT INTO table_a VALUES(1,'Fox');
obclient> INSERT INTO table_a VALUES(2,'Police');
obclient> INSERT INTO table_a VALUES(3,'Taxi');
obclient> INSERT INTO table_a VALUES(4,'Lincoln');
obclient> INSERT INTO table_a VALUES(5,'Arizona');
obclient> INSERT INTO table_a VALUES(6,'Washington');
obclient> INSERT INTO table_a VALUES(7,'Dell');
obclient> INSERT INTO table_a VALUES(10,'Lucent');
obclient> CREATE TABLE table_b(PK INT, name VARCHAR(25));
obclient> INSERT INTO table_b VALUES(1,'Fox');
obclient> INSERT INTO table_b VALUES(2,'Police');
obclient> INSERT INTO table_b VALUES(3,'Taxi');
obclient> INSERT INTO table_b VALUES(6,'Washington');
obclient> INSERT INTO table_b VALUES(7,'Dell');
obclient> INSERT INTO table_b VALUES(8,'Microsoft');
obclient> INSERT INTO table_b VALUES(9,'Apple');
obclient> INSERT INTO table_b VALUES(11,'Scottish Whisky');
Self-join query example
obclient> SELECT * FROM table_a ta, table_a tb WHERE ta.NAME = tb.NAME;
The expected result is as follows:
+------+------------+------+------------+
| PK | NAME | PK | NAME |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 4 | Lincoln | 4 | Lincoln |
| 5 | Arizona | 5 | Arizona |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 10 | Lucent | 10 | Lucent |
+------+------------+------+------------+
8 rows in set (0.013 sec)
Inner join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A INNER JOIN table_b B ON A.PK = B.PK;
The expected result is as follows:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
+------+------------+------+------------+
5 rows in set (0.017 sec)
Left join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A LEFT JOIN table_b B ON A.PK = B.PK;
The expected result is as follows:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+------------+
8 rows in set (0.006 sec)
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK = B.PK(+);
The expected result is as follows:
+------+------------+------+------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+------------+
8 rows in set (0.008 sec)
Right join query example
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A RIGHT JOIN table_b B ON A.PK = B.PK;
The expected result is as follows:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 11 | Scottish Whisky |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 8 | Microsoft |
+------+------------+------+-----------------+
8 rows in set (0.005 sec)
obclient> SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value
FROM table_a A,table_b B WHERE A.PK(+) = B.PK;
The expected result is as follows:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 11 | Scottish Whisky |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 8 | Microsoft |
+------+------------+------+-----------------+
8 rows in set (0.005 sec)
Full join query example
obclient> SELECT A.PK AS A_PK,A.name AS A_Value,B.PK AS B_PK,B.name AS B_Value
FROM table_a A FULL JOIN table_b B ON A.PK = B.PK;
The expected result is as follows:
+------+------------+------+-----------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+------------+------+-----------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 11 | Scottish Whisky |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+------------+------+-----------------+
11 rows in set (0.005 sec)
SEMI join query example
Expand and rewrite the dependent subquery into a SEMI JOIN.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK IN (SELECT t2.PK FROM table_b t2
WHERE t2.NAME = t1.NAME);
The expected result is as follows:
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |HASH SEMI JOIN | |8 |11 | |
| |1 |├─TABLE FULL SCAN|T1 |8 |3 | |
| |2 |└─TABLE FULL SCAN|T2 |8 |3 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.PK], [T1.NAME]), filter(nil), rowset=16 |
| equal_conds([T2.NAME = T1.NAME], [T1.PK = T2.PK]), other_conds(nil) |
| 1 - output([T1.NAME], [T1.PK]), filter(nil), rowset=16 |
| access([T1.NAME], [T1.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
| 2 - output([T2.NAME], [T2.PK]), filter(nil), rowset=16 |
| access([T2.NAME], [T2.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T2.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
19 rows in set (0.006 sec)
ANTI join query example
Rewrite the dependent subquery into an ANTI JOIN.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK NOT IN (SELECT t2.PK
FROM table_b t2 WHERE t2.name = t1.name);
The expected result is as follows:
+---------------------------------------------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |HASH ANTI JOIN | |1 |11 | |
| |1 |├─TABLE FULL SCAN|T1 |8 |3 | |
| |2 |└─TABLE FULL SCAN|T2 |8 |3 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.PK], [T1.NAME]), filter(nil), rowset=16 |
| equal_conds([T2.NAME = T1.NAME]), other_conds([(T_OP_OR, T1.PK = T2.PK, T1.PK IS NULL, T2.PK IS NULL)]) |
| 1 - output([T1.NAME], [T1.PK]), filter(nil), rowset=16 |
| access([T1.NAME], [T1.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
| 2 - output([T2.NAME], [T2.PK]), filter(nil), rowset=16 |
| access([T2.NAME], [T2.PK]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T2.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------------------------------------------+
19 rows in set (0.008 sec)