Join statements in a database are used to combine two or more tables in the database based on join conditions. The set generated by a join can be saved as a table or used as a table.
The meaning of a join statement is to combine the attributes of two tables by their values. Common join types in a database include inner joins, outer joins, semi-joins, and anti-joins. Semi-joins and anti-joins are derived through subquery rewriting, and SQL does not have specific syntax for expressing semi-joins or anti-joins.
Join conditions
Join conditions can be categorized into equi-joins (e.g., t1.a = t2.b) and non-equi-joins (e.g., t1.a < t2.b). Compared to non-equi-join conditions, equi-join conditions allow the use of efficient join algorithms in the database, such as hash joins and merge-sort joins.
Self-Join
A self-join is a join operation where a table is joined with itself. Here is an example of a self-join.
obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> SELECT * FROM t1 AS ta, t1 AS tb WHERE ta.b = tb.b;
Inner-Join
An inner join is the most fundamental type of join operation in a database. An inner join combines columns from two tables (such as A and B) based on the join condition to generate a new result table. The query compares each row in table A with each row in table B and identifies combinations that satisfy the join condition. When the join condition is met, matching rows from tables A and B are combined (side by side) into a single row in the result set. The resulting set from a join can be defined as first performing a Cartesian product (cross join) of the two tables (combining each row in A with each row in B), and then returning only the records that satisfy the join condition.
obclient> CREATE TABLE t1(c1 INT,c2 INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(c1 INT,c2 INT);
Query OK, 0 rows affected
obclient> SELECT * FROM t1 JOIN t2 USING(c1);
Outer-Join
An outer join does not require every record in one table to have a matching record in the other table. The table that retains all records, even those without a match, is called the preserving table.
Outer joins can be further classified into left outer joins, right outer joins, and full outer joins based on which table's rows are preserved.
In a left outer join, if a row from the left table does not find a match in the right table, the right table automatically fills in
NULL.In a right outer join, if a row from the right table does not find a match in the left table, the left table automatically fills in
NULL.In a full outer join, if a row from either the left or right table does not find a match,
NULLis automatically filled in.
obclient> CREATE TABLE t1(c1 INT,c2 INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(c1 INT,c2 INT);
Query OK, 0 rows affected
obclient> SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1;
obclient> SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1;
obclient> SELECT * FROM t1 FULL JOIN t2 ON t1.c1 = t2.c1;
Semi-Join
When performing a left or right semi-join between tables A and B, it returns all rows from table A that have a match in table B or all rows from table B that have a match in table A.
Semi-joins can only be derived through subquery expansion, as shown below.
obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES (1, 1, 1),(2, 2, 2);
obclient> INSERT INTO t2 VALUES (1, 1, 1),(2, 2, 2);
obclient> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
When viewing the query plan using EXPLAIN, you can see that dependent subqueries are expanded and rewritten into semi-joins.
obclient> EXPLAIN SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
| ========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
----------------------------------------
|0 |MERGE SEMI JOIN| |2 |76 |
|1 | TABLE SCAN |t1 |2 |37 |
|2 | SORT | |2 |38 |
|3 | TABLE SCAN |t2 |2 |37 |
========================================
...
Anti-Join
When performing a left or right anti-join between tables A and B, it returns all rows from table A that do not have a match in table B or all rows from table B that do not have a match in table A.
Similar to semi-joins, anti-joins can only be derived through subquery expansion, as shown below.
obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES (1, 1, 1),(2, 2, 2);
obclient> INSERT INTO t2 VALUES (1, 1, 1),(2, 2, 2);
obclient> SELECT * FROM t1 WHERE t1.a NOT IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
When viewing the query plan using EXPLAIN, you can see that dependent subqueries are rewritten into anti-joins.
obclient> EXPLAIN SELECT * FROM t1 WHERE t1.a NOT IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
| =============================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------------
|0 |HASH RIGHT ANTI JOIN| |0 |77 |
|1 | TABLE SCAN |t2 |2 |37 |
|2 | TABLE SCAN |t1 |2 |37 |
=============================================
...
