Subqueries|V4.3.3| docs|Distributed Database

Subqueries

Last Updated：2024-12-02 03:48:26 Updated

A subquery is a query nested in an upper-layer query. An upper-layer query is generally referred to as a parent query or outer query. The subquery result is returned as the input to the parent query or outer query. The parent query takes the subquery result into the calculation to determine the final output.

The SQL language supports multi-level nested queries, which means that one subquery can be nested with other subqueries. Meanwhile, subqueries can be included in various clauses of SQL statements, such as SELECT, FROM, and WHERE.

Subquery

In the database, subqueries can be divided into dependent subqueries and independent subqueries. A dependent subquery means that the execution of this subquery depends on variables of outer queries. Therefore, dependent queries are usually computed multiple times. An independent subquery means that the execution of this subquery does not depend on variables of outer queries. Independent subqueries are typically computed only once. The following example shows an independent subquery and a dependent subquery.

Independent subquery

obclient> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2);

Dependent subquery that uses outer query variable t1.b

obclient> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM T2 WHERE t2.b = t1.b);

Scalar subquery

A subquery that results in a single element is called a scalar subquery. The query result can be used as a scalar operand and has the basic characteristics of a scalar operand, such as the data type, length, and whether it can be NULL. For example, the result returned by the following query can be used as a scalar operand:

obclient> CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
obclient> INSERT INTO t1 VALUES (1, 'OceanBase');
obclient> SELECT (SELECT c2 FROM t1);
+---------------------+
| (SELECT c2 FROM t1) |
+---------------------+
| OceanBase           |
+---------------------+
1 row in set

The preceding query returns the value OceanBase, which is a VARCHAR value with a length of 20. The character set and collation are the default values of the tenant. The nullability of values selected by the scalar subquery are not copied. Although column c2 has the NOT NULL constraint, NULL is still returned if the subquery result is null.

Although a scalar subquery returns a single element, it does not apply to all statements. If a statement allows only literals, you cannot replace it with a scalar subquery.

The following example helps you better understand scalar subqueries.

This query returns 2 because tables t2 and t1 contain the same column c1.

obclient> CREATE TABLE t1 (c1 INT);
obclient> INSERT INTO t1 VALUES (1);
obclient> CREATE TABLE t2 (c1 INT);
obclient> INSERT INTO t2 VALUES (2);
obclient> SELECT (SELECT c1 FROM t2) FROM t1;
+---------------------+
| (SELECT s1 FROM t2) |
+---------------------+
|                   2 |
+---------------------+
1 row in set

In addition, a scalar subquery can be part of an expression. Here is an example:

obclient> CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
obclient> INSERT INTO t1 VALUES (1, 'OceanBase');
obclient> SELECT HEX((SELECT c2 FROM t1 WHERE c1=1)) FROM t1;
+-------------------------------------+
| HEX((SELECT c2 FROM t1 WHERE c1=1)) |
+-------------------------------------+
| 4F6365616E42617365                  |
+-------------------------------------+
1 row in set

Comparison in subqueries

The general syntax of subqueries is as follows:

operand operator {ANY | SOME | ALL} (subquery)
operand LIKE (subquery)
operand {IN | NOT IN | EXISTS | NOT EXISTS} (subquery)

operand specifies the operand used for comparison with the subquery result. The valid values of operator are as follows:

= or <=>
>
<
>=
<=
!= or <>
LIKE is used to compare characters or strings.

For example, the following subquery is used to return the records in t1 where c1 equals the largest value of c2 in t2. It cannot be rewritten into a join of t1 and t2.

obclient> CREATE TABLE t1 (c1 INT);
obclient> INSERT INTO t1 VALUES (1);
obclient> CREATE TABLE t2 (c2 INT);
obclient> INSERT INTO t2 VALUES (2);
obclient> SELECT * FROM t1 WHERE c1 = (SELECT MAX(c2) FROM t2);

Notice

A subquery used for comparison with a scalar value must return a scalar value. A subquery used for comparison with a row constructor must be a row subquery that returns rows with the same attributes as the row constructor.

Only the results of scalar subqueries can be used for direct comparison. If a subquery returns a column of values, the set must be processed. OceanBase Database provides the following methods to convert a set into a scalar value:

Aggregate function: aggregates all values in the set into one value and compares the value with the operand.
ANY | SOME: compares all values in the set with the operand and returns TRUE if any row meets the comparison condition. SOME is an alias of ANY.
ALL: compares all values in the set with the operand and returns TRUE if all rows meet the comparison condition.
IN: returns TRUE if the operand belongs to the set. IN is equivalent to = ANY.
NOT IN: returns TRUE if the operand does not belong to the set. NOT IN is equivalent to <> ALL.
EXIST: returns TRUE if the subquery returns rows.
NOT EXIST: returns TRUE if the subquery returns no rows.

Row subquery

A scalar or column subquery returns a scalar or column value. A row subquery returns a single row. Therefore, multiple column values can be returned.

For example, the following subquery returns only a single row:

obclient> CREATE TABLE t1(id INTEGER PRIMARY KEY,c1 INTEGER,c2 INTEGER);
obclient> CREATE TABLE t2(id INTEGER PRIMARY KEY,c3 INTEGER,c4 INTEGER);
obclient> SELECT * FROM t1
 WHERE ROW(c1,c2) = (SELECT c3, c4 FROM t2 WHERE id = 10);

If the c3,c4 value pair is equal to any c1,c2 value pair in table t1, all c1,c2 value pairs that meet the condition are returned. If no value meets the condition, an empty set is returned. In addition, if you do not specify the WHERE condition in the subquery, multiple rows can be returned, and an error occurs. A row subquery returns only a single row.

The ROW(c1,c2) expression can be simplified into (c1,c2). ROW() is called a row constructor. The rows returned by the row constructor and the subquery must contain the same number of values. The row constructor is compared with a subquery that returns two or more columns. The row constructor cannot be used with a subquery that returns a single column. For example, the following query returns a syntax error:

obclient> SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use near ') = (SELECT c1 FROM t2)' at line 1

The row constructor is expanded in the optimizer. For example, the following two SQL statements are equivalent:

obclient> SELECT * FROM t1 WHERE (c1,c2) = (1,1);
obclient> SELECT * FROM t1 WHERE c1 = 1 AND c2 = 1;

Correlated subqueries

A subquery that references columns from its parent query is called a correlated subquery. For example, the WHERE condition in the following subquery references column c2 of table t1.

obclient> SELECT * FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE t2.c4 = t1.c2);

For such statements, the query is evaluated from the inside out. For example, in the following statement, x.c2 must be a column of table t2 instead of table t1. This is because the SELECT c1 FROM t2 AS x statement first renames table t2, and SELECT c1 FROM t1 is an outer query.

obclient> SELECT c1 FROM t1 AS x
  WHERE x.c1 = (SELECT c1 FROM t2 AS x
    WHERE x.c1 = (SELECT c1 FROM t3
      WHERE x.c2 = t3.c1));

For a subquery in the HAVING or ORDER BY clause, OceanBase Database also searches for column names in the outer SELECT list.

If an aggregate function in a correlated subquery contains an outer reference, the function can contain only the outer reference and cannot be contained in another function or expression.

Derived tables

A derived table is a table expression generated within the range of the FROM clause. In the following example, a query in the FROM clause of the SELECT statement serves as a derived table.

SELECT ... FROM (subquery) [AS] table_name ...

As a derived table of the FROM clause, [AS] table_name is no longer an optional alias, but must be a name specified for the result set of the subquery. Each column of a derived table must have a unique name.

Derived tables are commonly used for step-by-step calculation. For example, the following query calculates the average value of each group after grouping:

obclient> SELECT AVG(sum_c1)
FROM (SELECT SUM(c1) AS sum_cumn1
FROM t1 GROUP BY c1) AS t1;

Derived tables can return scalar values, columns, rows, or tables.

Limitations on derived tables are as follows:

A derived table cannot be a correlated subquery.
A derived table cannot contain references to other tables in the same SELECT statement.
A derived table cannot contain outer references.

Lateral derived tables

Lateral derived tables is a feature that allows you to reference preceding tables or derived tables defined in a FROM list for a FROM clause.

A lateral derived table can provide higher flexibility and efficiency for subqueries, especially for subqueries that depend on columns in the outer query. Starting from V4.2.2, OceanBase Database in MySQL mode introduces the LATERAL keyword to support this feature.

Here is an example:

SELECT *
FROM A, LATERAL (SELECT * FROM B WHERE B.col1 = A.col1) AS derived_table;

For more information about how to use a lateral derived table, see Lateral derived tables.

Subquery unnesting

Subquery unnesting is a database optimization strategy. It places some subqueries in the outer parent query to convert some subqueries into equivalent multi-table join operations. One obvious benefit of using this strategy is that some access paths, connection methods, and connection sequences may be effectively used to reduce the layers of query statements. In the following subquery unnesting example, the subquery is rewritten into a join statement.

obclient> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

The following example shows an independent subquery.

obclient> EXPLAIN SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2);
| ======================================
|ID|OPERATOR   |NAME|EST. ROWS|COST  |
--------------------------------------
|0 |MERGE JOIN |    |100001   |131664|
|1 | TABLE SCAN|t1  |100000   |48372 |
|2 | TABLE SCAN|t2  |100000   |41911 |
======================================
...

In the following example, a dependent subquery is unnested and rewritten into a join.

obclient> EXPLAIN SELECT * FROM t1 WHERE T1.A IN (SELECT T2.B FROM T2 WHERE T2.C = T1.C);
| ===========================================
|ID|OPERATOR        |NAME |EST. ROWS|COST  |
--------------------------------------------
|0 |MERGE JOIN      |     |9703     |215436|
|1 | TABLE SCAN     |t1   |100000   |64066 |
|2 | SORT           |     |10001    |129621|
|3 |  SUBPLAN SCAN  |VIEW1|10001    |111242|
|4 |   HASH DISTINCT|     |10001    |109862|
|5 |    TABLE SCAN  |t2   |100000   |64066 |
============================================
...