A subquery is a query that is nested within another query. The outer query is also known as the parent query. The results of the subquery are passed back to the parent query as input. The parent query then uses this value in its computation to determine the final output.
SQL allows for multiple levels of nested queries, meaning that a subquery can contain other subqueries. Additionally, subqueries can appear in various clauses of an SQL statement, such as SELECT, FROM, and WHERE.
Subqueries
In a database, subqueries can be categorized into correlated and non-correlated subqueries. A correlated subquery is one whose execution depends on variables from the outer query. Therefore, such subqueries are typically evaluated multiple times. A non-correlated subquery, on the other hand, does not depend on variables from the outer query and is generally evaluated only once. The following examples illustrate both types of subqueries.
Non-correlated subquery:
obclient> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2);Correlated subquery that uses the outer query variable
t1.b:obclient> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM T2 WHERE t2.b = t1.b);
Scalar subqueries
A scalar subquery is one that returns a single element. The result of a scalar subquery can be used as a scalar operand and possesses the fundamental characteristics of scalar operands, such as data type, length, and whether it can be NULL. For example, the following query returns a result that can be used as a scalar operand:
obclient> CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
obclient> INSERT INTO t1 VALUES (1, 'OceanBase');
obclient> SELECT (SELECT c2 FROM t1);
+---------------------+
| (SELECT c2 FROM t1) |
+---------------------+
| OceanBase |
+---------------------+
1 row in set
The query above returns a value of type VARCHAR with a length of 20, specifically OceanBase, using the default character set and collation of the tenant. The nullability of the value selected by the scalar subquery is not copied. Although the c2 column has a NOT NULL constraint, if the subquery result is empty, the result will still be NULL.
Although scalar subqueries return a single element, they cannot be used in all statements. If a statement only allows literals, a scalar subquery cannot be used as a replacement.
The following examples help you better understand scalar subqueries.
The query returns 2 because the table t2 contains a column named c1 that is the same as the column c1 in the table t1.
obclient> CREATE TABLE t1 (c1 INT);
obclient> INSERT INTO t1 VALUES (1);
obclient> CREATE TABLE t2 (c1 INT);
obclient> INSERT INTO t2 VALUES (2);
obclient> SELECT (SELECT c1 FROM t2) FROM t1;
+---------------------+
| (SELECT s1 FROM t2) |
+---------------------+
| 2 |
+---------------------+
1 row in set
Additionally, scalar subqueries can be part of an expression, such as:
obclient> CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
obclient> INSERT INTO t1 VALUES (1, 'OceanBase');
obclient> SELECT HEX((SELECT c2 FROM t1 WHERE c1=1)) FROM t1;
+-------------------------------------+
| HEX((SELECT c2 FROM t1 WHERE c1=1)) |
+-------------------------------------+
| 4F6365616E42617365 |
+-------------------------------------+
1 row in set
Comparing subqueries
Common syntax for subqueries is as follows:
operand operator {ANY | SOME | ALL} (subquery)
operand LIKE (subquery)
operand {IN | NOT IN | EXISTS | NOT EXISTS} (subquery)
In this syntax, operand is the operand used to compare with the result of the subquery, and operator can be one of the following operators:
Equal to (
=or<=>)Greater than (
>)Less than (
<)Greater than or equal to (
>=)Less than or equal to (
<=)Not equal to (
!=or<>)LIKEfor character or string comparisons.
For example, the following subquery is used to find values in the c1 column of t1 that are the same as the maximum value in the c2 column of t2. This query cannot be rewritten as a join between t1 and t2.
obclient> CREATE TABLE t1 (c1 INT);
obclient> INSERT INTO t1 VALUES (1);
obclient> CREATE TABLE t2 (c2 INT);
obclient> INSERT INTO t2 VALUES (2);
obclient> SELECT * FROM t1 WHERE c1 = (SELECT MAX(c2) FROM t2);
Notice
When comparing with a scalar, the subquery must return a scalar. When comparing with a row constructor, the subquery must be a row subquery and return a row with the same properties as the row constructor.
Clearly, only the results of scalar subqueries can be directly compared. If a subquery returns a set of values, the set must be processed. OceanBase Database provides the following methods to convert a set into a scalar value:
Aggregate functions: Aggregate all values in the set into one value and compare it with the operand.
ANY | SOME: Compare all values in the set with the operand. If any row satisfies the comparison condition, it returnsTRUE.SOMEis an alias forANY.ALL: Compare all values in the set with the operand. If all rows satisfy the comparison condition, it returnsTRUE.IN: ReturnsTRUEif the operand is in the set.INis equivalent to= ANY.NOT IN: ReturnsTRUEif the operand is not in the set.NOT INis equivalent to<> ALL.EXIST: ReturnsTRUEif the subquery returns any rows.NOT EXIST: ReturnsTRUEif the subquery returns no rows.
Row subqueries
Scalar subqueries or column subqueries return scalar or single-column values, while row subqueries return a single row, allowing for multiple column values.
For example, the following subquery returns only one row:
obclient> CREATE TABLE t1(id INTEGER PRIMARY KEY,c1 INTEGER,c2 INTEGER);
obclient> CREATE TABLE t2(id INTEGER PRIMARY KEY,c3 INTEGER,c4 INTEGER);
obclient> SELECT * FROM t1
WHERE ROW(c1,c2) = (SELECT c3, c4 FROM t2 WHERE id = 10);
If the values of c3 and c4 match any pair of values in c1 and c2 of t1, it returns all matching c1 and c2 values. If no matches are found, it returns an empty set. Additionally, if no WHERE condition is specified in the subquery, it may return multiple rows, resulting in an error. A row subquery can only return a single row.
The expression ROW(c1,c2) can be abbreviated as (c1,c2), and ROW() is called a row constructor. The row constructor and the row returned by the subquery must contain the same number of values. A row constructor is used to compare with a subquery that returns two or more columns. A row constructor cannot be used with a column subquery that returns a single column. For example, the following query will return a syntax error:
obclient> SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use near ') = (SELECT c1 FROM t2)' at line 1
In the optimizer, a row constructor is expanded. For example, the following two SQL statements are equivalent:
obclient> SELECT * FROM t1 WHERE (c1,c2) = (1,1);
obclient> SELECT * FROM t1 WHERE c1 = 1 AND c2 = 1;
Correlated subqueries
A correlated subquery is one that references columns from the outer query. For example, the following subquery references the c2 column of t1 in its WHERE clause:
obclient> SELECT * FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE t2.c4 = t1.c2);
For such statements, the query is evaluated from the innermost subquery outward. For example, in the following query, x.c2 must be a column in t2 and not in t1. This is because the SELECT c1 FROM t2 AS x statement first renames t2, while SELECT c1 FROM t1 is the outer query.
obclient> SELECT c1 FROM t1 AS x
WHERE x.c1 = (SELECT c1 FROM t2 AS x
WHERE x.c1 = (SELECT c1 FROM t3
WHERE x.c2 = t3.c1));
For subqueries in HAVING or ORDER BY clauses, OceanBase Database also searches for column names in the outer SELECT list.
In a correlated subquery, if an aggregate function contains external references, it can only contain external references and cannot be nested within another function or expression.
Derived tables
A derived table is an expression that generates a table within the FROM clause of a query. For example, a query in the FROM clause of a SELECT statement can be a derived table:
SELECT ... FROM (subquery) [AS] table_name ...
As a derived table in the FROM clause, [AS] table_name is no longer an optional alias but a required name for the result set of the subquery. Any column in a derived table must have a unique name.
Derived tables are often used for step-by-step calculations. For example, the following query calculates the average value for each group after grouping:
obclient> SELECT AVG(sum_c1)
FROM (SELECT SUM(c1) AS sum_cumn1
FROM t1 GROUP BY c1) AS t1;
A derived table can return a scalar, a column, a row, or a table.
Derived tables have the following limitations:
A derived table cannot be a correlated subquery.
A derived table cannot reference other tables in the same
SELECTstatement.A derived table cannot contain external references.
Lateral derived tables
A lateral derived table is a table that can reference previously defined tables or derived tables in the FROM clause.
Lateral derived tables provide more flexibility and efficiency when using subqueries, especially when the subquery depends on columns from the outer query. The LATERAL keyword was introduced in OceanBase Database (MySQL mode) V4.2.2 to support this feature.
Here is an example:
SELECT *
FROM A, LATERAL (SELECT * FROM B WHERE B.col1 = A.col1) AS derived_table;
For more information about lateral derived tables, see Lateral derived tables.
Subquery unnesting
Subquery unnesting is a database optimization strategy that moves some subqueries into the outer parent query. Essentially, it converts certain subqueries into equivalent multi-table join operations. One clear benefit of this strategy is that it can effectively utilize certain access paths, join methods, and join orders, thereby minimizing the number of query levels. The following example demonstrates subquery unnesting, where a subquery is rewritten as a join statement.
obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
Subqueries without dependencies.
obclient> EXPLAIN SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2); | ====================================== |ID|OPERATOR |NAME|EST. ROWS|COST | -------------------------------------- |0 |MERGE JOIN | |100001 |131664| |1 | TABLE SCAN|t1 |100000 |48372 | |2 | TABLE SCAN|t2 |100000 |41911 | ====================================== ...Subqueries with dependencies are unnested and rewritten as joins.
obclient> EXPLAIN SELECT * FROM t1 WHERE T1.A IN (SELECT T2.B FROM T2 WHERE T2.C = T1.C); | =========================================== |ID|OPERATOR |NAME |EST. ROWS|COST | -------------------------------------------- |0 |MERGE JOIN | |9703 |215436| |1 | TABLE SCAN |t1 |100000 |64066 | |2 | SORT | |10001 |129621| |3 | SUBPLAN SCAN |VIEW1|10001 |111242| |4 | HASH DISTINCT| |10001 |109862| |5 | TABLE SCAN |t2 |100000 |64066 | ============================================ ...
