A Common Table Expression (CTE) is a temporary result set that is used to store intermediate query results. It is a named temporary table that exists only for the duration of the query. Unlike derived tables, CTEs can be self-referencing and can be referenced multiple times within the same query.
Scenarios:
- CTEs can be reused in multiple places to simplify the reuse of the same subquery and avoid repeated writing of the same logic.
- CTEs can be used to simplify recursive queries, for example, to query data in a hierarchical structure.
- CTEs can be used to break a complex query into smaller parts, making the query logic clearer and easier to understand.
OceanBase Database supports both recursive and non-recursive CTEs.
CTE syntax
A common table expression is an optional part of the DML statement syntax. It is defined by using the WITH clause. Multiple WITH clauses can be separated with commas. Each WITH clause contains a subquery that generates a result set and associates the result set with a name. The syntax is as follows:
WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...
Arguments
| Argument | Description |
|---|---|
[RECURSIVE] |
Optional keyword to specify whether to create a recursive CTE.
|
cte_name |
The name of the CTE, which can be referenced by tables in the WITH clause. |
column_name |
The selected column names that are used to alias the columns in the CTE. This enables you to use more readable column names in the main query. |
AS(subquery) |
The subquery that generates the result set of the CTE. AS must be followed by a pair of parentheses. |
If the CTE name is followed by a pair of parentheses that enclose a name list, the names in the list are column names. The number of column names must be the same as the number of columns in the SELECT statement of the CTE. If no column names are specified, the column names in the AS(subquery) part will be used, namely, the column names in the first SELECT list.
Scenarios supported by the WITH clause
You can use the WITH clause in the following scenarios:
At the start of a
SELECTstatement.WITH ... SELECT ...At the start of a subquery (including a derived table subquery).
SELECT ... WHERE id IN (WITH ... SELECT ...) ... SELECT * FROM (WITH ... SELECT ...) AS dt ...Immediately before the
SELECTclause in a statement that contains aSELECTclause.INSERT ... WITH ... SELECT ... REPLACE ... WITH ... SELECT ... CREATE TABLE ... WITH ... SELECT ... CREATE VIEW ... WITH ... SELECT ...
You can use only one WITH clause at the same level. If multiple WITH clauses are included in the WITH clause, separate them with commas.
WITH cte1 AS (...), cte2 AS (...) SELECT ...
The WITH clause can define one or more common table expressions. However, each CTE name must be unique within the WITH clause. The following example is illegal because the CTE contains two CTEs that are both named cte1:
WITH cte1 AS (...), cte1 AS (...) SELECT ...
Structure of a recursive CTE
A recursive CTE has the following structure:
If the
WITHclause contains a recursive CTE, theWITHclause must start withWITH RECURSIVE. Otherwise, you can useWITHorWITH RECURSIVE.The recursive CTE query consists of two
SELECTclauses separated byUNION [ALL]:-- Returns the initial row set. SELECT ... UNION ALL -- Returns additional row sets. SELECT ...The first
SELECTstatement generates one or more initial rows for the CTE and does not reference the CTE name. The secondSELECTstatement generates additional rows and references the CTE name in itsFROMclause for recursion. The recursion continues until this statement does not generate new rows. Therefore, a recursive CTE consists of a non-recursiveSELECTpart and a recursiveSELECTpart. EachSELECTpart can be multipleSELECTstatements joined together.The data types of the result columns of a recursive CTE are inferred from the data types of the columns in the non-recursive
SELECTpart. All columns can contain null values. The data types of the recursiveSELECTpart are ignored.The recursion operates only on rows generated by the previous iteration. If the recursive part contains multiple query blocks, the iterations of the query blocks are scheduled in an unspecified order, and a query block can operate on rows generated by the previous iteration or by other query blocks after they are generated.
Here is an example:
WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /*Non-recursive part. It retrieves a single row to generate the initial row set.*/
UNION ALL
SELECT n + 2 FROM cte1 WHERE n < 10 /*Recursive part. It generates a new value that is greater than the value of n in the previous row set by 2 and continues until n is greater than or equal to 10.*/
)
SELECT * FROM cte1;
Limitations
The recursive SELECT part of a recursive CTE query must meet the following requirements:
It must not contain the following elements:
Aggregate functions such as
SUM()Window functions
GROUP BYORDER BYDISTINCT
It must reference a table in the
FROMclause of the recursive CTE query only once. It can reference tables and CTEs outside theFROMclause and join them with the CTE. If used in a join, the CTE cannot be located on the right side of aLEFT JOIN.
The cost estimates displayed by the EXPLAIN statement for a recursive CTE represent the cost of each iteration and can differ significantly from the total cost. The optimizer cannot estimate the number of iterations because it cannot determine when the condition specified in the WHERE clause becomes False.
- The
UNION [DISTINCT]andLIMITstructures are not supported between the recursiveSELECTpart and the non-recursiveSELECTpart.
Examples
The following example shows how to create a student hierarchy table that contains the IDs, names, and mentors of students. This example is used to illustrate the differences between recursive common table expressions (CTEs) and non-recursive CTEs.
First, create the student table and insert some data:
obclient> CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected
obclient> INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected
In this data model, Alice is a top-level student who has no mentor. Bob and Charlie are Alice's students. David is Bob's student. Eve is Charlie's student.
Non-recursive CTE example
A non-recursive CTE does not reference itself. For example, if you want to select all students directly mentored by Alice (i.e., first-level students), you can use the following non-recursive CTE:
WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set
Recursive CTE example
If you want to find all students directly or indirectly mentored by Alice (i.e., all students at all levels), you must use a recursive CTE. A recursive CTE references itself to query students at the next level.
WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point.
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive member: Select direct students of students at the previous level.
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;
This recursive CTE selects Alice as the starting point, then recursively selects direct students of each student, until no more students are found.
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set
In this example, the recursive CTE allows you to traverse the hierarchical structure of students, while the non-recursive CTE can only select students at a fixed level.