A join query is a query that combines rows from two or more tables, views, or materialized views. When the FROM clause in a query specifies multiple tables, OceanBase Database performs a join query and selects a column from any table specified in the FROM clause as the output column. If two or more tables have an identical column name, you must define all references to this column in the query by using the table names.
Join types
Joins in OceanBase Database include inner joins, outer joins, and semi/anti-joins, among which semi-joins and anti-joins are implemented by rewriting subqueries. OceanBase Database does not provide syntaxes for semi-joins or anti-joins.
Join conditions
A join condition is a rule that joins multiple tables. Most joins contain at least one join condition in the FROM or WHERE clause to compare two columns from different tables. The WHERE clause may contain other conditions in addition to join conditions. The conditions that reference only one table can further restrict the number of rows returned by a join query.
Join conditions can be divided into equi-join conditions (such as t1.a = t2.b) and non-equi-join conditions (such as t1.a < t2.b). Unlike non-equi-join conditions, equi-join conditions allow the database to use efficient join algorithms such as MERGE JOIN (MJ) and HASH JOIN (HJ).
OceanBase Database extracts and pairs rows from different tables and matches them by using join conditions. To join more than two tables, OceanBase Database first joins two of them based on their column join conditions, and then joins the results to a new table based on the join conditions between columns of the joined table and the new table. The optimizer determines the join order based on the join conditions, base table indexes, and available statistics.
Note
If the WHERE clause contains join conditions, you cannot specify a LOB column in the WHERE clause.
Join methods
The following table lists the join methods supported by the current version of OceanBase Database.
| Join type | Form of expression | Description |
|---|---|---|
| Equi-join | Equijoin | A join with a join condition containing an equality operator. |
| Self-join | SELF JOIN | A join of a table to itself. |
| Inner join | INNER JOIN | A join of two tables that returns only those rows that satisfy the join condition. |
| Left (outer) join | LEFT [OUTER] JOIN | A join that returns all rows in the table on the left side (to the leftmost of the JOIN clause), excluding the unmatching rows in the table on the right side. |
| Right (outer) join | RIGHT [OUTER] JOIN | A join that returns all rows in the table on the right side (to the rightmost of the JOIN clause), excluding the unmatched rows in the table on the left side. |
| Full (outer) join | FULL [OUTER] JOIN | A join that returns all rows in all joined tables, regardless of whether the rows are matching. |
| Semi-join | SEMI JOIN | A join that can be obtained only by expanding and rewriting a subquery. |
| Anti-join | ANTI JOIN | A join that can be obtained only by expanding and rewriting a subquery. |
| Cartesian product | Cartesian Product | A query on two tables that are not joined returns the Cartesian product of the two tables. |
Equi-join
An equi-join is a join that contains an equality operator. An equi-join combines rows of specific columns that meet the equivalent conditions and outputs them.
Self-join
A self-join is a join of a table to itself. The table is specified in the FROM clause twice, followed by a table alias that qualifies the column names in the join condition. When executing a self-join, OceanBase Database combines and returns the rows that meet the join condition.
Cartesian product
If a join query does not have a join condition for two tables to be joined, OceanBase Database returns the Cartesian product of the two tables, which is a result set where each row in the first table is paired with each row in the second table. The Cartesian product contains many rows, most of which are useless. For example, the Cartesian product of two 100-row tables contains 10,000 rows. Therefore, we recommend that your query contain at least one join condition to avoid returning the Cartesian product, unless otherwise needed.
If a query joins three or more tables but no join condition is specified, the optimizer can select a join order to avoid generating Cartesian products.
Inner join
An inner join is the most basic join operation in a database.
An inner join combines the columns of two tables (such as tables A and B) based on the join conditions to generate a new result table. The query compares each row of Table A with each row of Table B and returns the combinations that meet the join conditions. When the join conditions are met, a row in Table A that matches a row in Table B are paired by column (aligned) into one row in the result set. The join first generates the Cartesian product of the two tables, where each row in Table A is paired with each row in Table B, and then returns records that meet the join conditions.
Outer join
An outer join returns all the rows that meet the join conditions. In addition, it returns unused rows of one table and fills NULL in the corresponding positions in the other table.
Outer joins can be further divided into left (outer) joins, right (outer) joins, and full (outer) joins based on whether the rows in the table on the left side, right side, or tables on both sides are retained. In a left (outer) join, if a row in the table on the left side does not have a matching row in the table on the right side, the table on the right side is automatically filled with NULL. In a right (outer) join, if a row in the table on the right side does not have a matching row in the table on the left side, the table on the left side is automatically filled with NULL. In a full (outer) join, if a row in either table does not have a matching the row in the other table, the latter is automatically filled with NULL.
OceanBase Database also supports the outer join sign (+). For a left join, the outer join sign (+) is applied for all columns in the right-side table in the join condition of the WHERE clause. For a right join, the outer join sign (+) is applied for all columns in the left-side table in the join condition of the WHERE clause. Take note of the following rules and constraints when you use the outer join sign (+):
If a query block contains the
JOINsyntax in theFROMclause, the outer join sign (+) cannot be specified.The outer join sign (+) can appear only in the
WHEREclause or in the context of left-correlation (whenTABLEclause is specified) in theFROMclause, and can apply only to one column of a table or view.If the left-side and right-side tables are joined by using multiple join conditions, the outer join sign (+) must be used in all conditions. Otherwise, the database returns only the row data generated by simple joins and does not report warnings or errors.
You cannot use the outer join sign (+) to join a table to itself.
The outer join sign (+) can be applied only to columns rather than expressions.
The
WHEREcondition containing an outer join sign (+) cannot be used together with a condition that uses theORorINlogical operator.In a query that executes an outer join for more than two pairs of tables, a table can be only an empty table generated for another table. Therefore, in the join conditions of A and B and those of B and C, the outer join sign (+) cannot be applied to columns of B. In other words, the outer join operator (+) cannot be applied to common columns in multiple join conditions.
Semi-join
A left semi-join or right semi-join for Table A and Table B returns only rows in Table A that match rows in Table B and rows in Table B that match rows in Table A. You can get a semi-join query only by expanding and rewriting a subquery.
Anti-join
A left anti-join or right anti-join for Table A and Table B returns only rows in Table A that do not match rows in Table B and rows in Table B that do not match rows in Table A. Similar to a semi-join, you can get an anti-join query only by expanding and rewriting a subquery.
Examples
Create tables named table_a and table_b and insert data into them.
CREATE TABLE table_a(PK INT, name VARCHAR(25));
INSERT INTO table_a VALUES(1,'Fox');
INSERT INTO table_a VALUES(2,'Police');
INSERT INTO table_a VALUES(3,'Taxi');
INSERT INTO table_a VALUES(4,'Lincoln');
INSERT INTO table_a VALUES(5,'Arizona');
INSERT INTO table_a VALUES(6,'Washington');
INSERT INTO table_a VALUES(7,'Dell');
INSERT INTO table_a VALUES(10,'Lucent');
CREATE TABLE table_b(PK INT, name VARCHAR(25));
INSERT INTO table_b VALUES(1,'Fox');
INSERT INTO table_b VALUES(2,'Police');
INSERT INTO table_b VALUES(3,'Taxi');
INSERT INTO table_b VALUES(6,'Washington');
INSERT INTO table_b VALUES(7,'Dell');
INSERT INTO table_b VALUES(8,'Microsoft');
INSERT INTO table_b VALUES(9,'Apple');
INSERT INTO table_b VALUES(11,'Scotch whisky');
Sample self-join query
obclient> SELECT * FROM table_a ta, table_a tb WHERE ta.name = tb.name;
+------+-----------------+------+-----------------+
| PK | NAME | PK | NAME |
+------+-----------------+------+-----------------+
| 5 | Arizona | 5 | Arizona |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| 10 | Lucent | 10 | Lucent |
| 4 | Lincoln | 4 | Lincoln |
| 3 | Taxi | 3 | Taxi |
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
+------+-----------------+------+-----------------+
8 rows in set
Sample inner join query
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a INNER JOIN table_b b ON a.PK = b.PK;37
+------+-----------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington| 6 | Washington|
| 7 | Dell | 7 | Dell |
+------+-----------+------+-----------+
5 rows in set
Sample left join query
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a LEFT JOIN table_b b ON a.PK = b.PK;
+------+-----------------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington|
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+-----------------+------+-----------+
8 rows in set
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a,table_b b WHERE a.PK = b.PK(+);
+------+-----------------+------+-----------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+-----------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington|
| 7 | Dell | 7 | Dell |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+-----------------+------+-----------+
8 rows in set
Sample right join query
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a RIGHT JOIN table_b b ON a.PK = b.PK;
+------+-----------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+--------------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington| 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 11 | Scotch whisky |
| NULL | NULL | 9 | Apple |
+------+-----------+------+--------------------+
8 rows in set
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a,table_b b WHERE a.PK(+) = b.PK;
+------+-----------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------+------+--------------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington| 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 11 | Scotch whisky |
| NULL | NULL | 9 | Apple |
+------+-----------+------+--------------------+
8 rows in set
Sample full (outer) join query
obclient> SELECT a.PK AS a_PK, a.name AS a_value, b.PK AS b_PK, b.name AS b_value
FROM table_a a FULL JOIN table_b b ON a.PK = b.PK;
+------+-----------------+------+--------------------+
| A_PK | A_VALUE | B_PK | B_VALUE |
+------+-----------------+------+--------------------+
| 1 | Fox | 1 | Fox |
| 2 | Police | 2 | Police |
| 3 | Taxi | 3 | Taxi |
| 6 | Washington | 6 | Washington |
| 7 | Dell | 7 | Dell |
| NULL | NULL | 8 | Microsoft |
| NULL | NULL | 9 | Apple |
| NULL | NULL | 11 | Scotch whisky |
| 4 | Lincoln | NULL | NULL |
| 5 | Arizona | NULL | NULL |
| 10 | Lucent | NULL | NULL |
+------+-----------------+------+--------------------+
11 rows in set
Sample semi-join query
Rewrite a subquery with dependencies into a semi-join query.
obclient> explain SELECT * FROM table_a t1 WHERE t1.PK IN (SELECT t2.PK FROM table_b t2
WHERE t2.name = t1.name);\G
+------------------------------------------------+
| Query Plan |
+------------------------------------------------+
=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN| |8 |114 |
|1 | TABLE SCAN |T1 |8 |38 |
|2 | TABLE SCAN |T2 |8 |38 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([T1.PK], [T1.NAME]), filter(nil),
equal_conds([T1.PK = T2.PK], [T2.NAME = T1.NAME]), other_conds(nil)
1 - output([T1.NAME], [T1.PK]), filter(nil),
access([T1.NAME], [T1.PK]), partitions(p0)
2 - output([T2.NAME], [T2.PK]), filter(nil),
access([T2.NAME], [T2.PK]), partitions(p0)
+------------------------------------------------+
1 row in set
Sample anti-join query
Rewrite a subquery with dependencies into an anti-join query.
obclient> EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK NOT IN (SELECT t2.PK
FROM table_b t2 WHERE t2.name = t1.name);
+------------------------------------------------+
| Query Plan |
+------------------------------------------------+
=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH ANTI JOIN| |0 |112 |
|1 | TABLE SCAN |T1 |8 |38 |
|2 | TABLE SCAN |T2 |8 |38 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([T1.PK], [T1.NAME]), filter(nil),
equal_conds([T2.NAME = T1.NAME]), other_conds([(T_OP_OR, T1.PK = T2.PK,
(T_OP_IS, T1.PK, NULL, 0), (T_OP_IS, T2.PK, NULL, 0))])
1 - output([T1.NAME], [T1.PK]), filter(nil),
access([T1.NAME], [T1.PK]), partitions(p0)
2 - output([T2.NAME], [T2.PK]), filter(nil),
access([T2.NAME], [T2.PK]), partitions(p0)
+---------------------------------------------------------+
1 row in set