A common table expression (CTE) is a named temporary result set that applies to the current statement. It is not stored as an object and is used only during queries. Different from derived tables, CTEs can reference themselves and can be referenced multiple times in the same query.
Application scenarios are as follows:
- You can use CTEs to reuse the same subquery in multiple places to avoid repeatedly writing the same logic.
- You can use CTEs to simplify recursive queries, such as locating tree-structured data.
- You can use CTEs to split a complex query into smaller parts to make the query logic clearer and easier to understand.
OceanBase Database supports recursive CTEs and non-recursive CTEs.
Syntax
CTE is optional in the syntax of DML statements and is defined by using the WITH clause. You can separate multiple subclauses in a WITH clause with commas (,). Each subclause provides a subquery to generate a result set and associates the subquery with a name. The syntax is as follows:
WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...
Parameters
| Parameter | Description |
|---|---|
[RECURSIVE] |
An optional keyword that specifies to create a recursive CTE.
|
cte_name |
The CTE name, which can be referenced in a WITH clause. |
column_name |
The list of selected column names, which is used to specify aliases for columns in the CTE. The list allows you to use more readable column names in the main query. |
AS(subquery) |
The CTE subquery, which is used to generate the CTE result set. AS must be followed with parentheses. |
If the CTE name is followed by a list of names, these names are the column names. The number of column names here must be the same as that in the SELECT statement of the CTE. If no column name is specified, the column names in the first SELECT list in AS(subquery) will be used.
Application scenarios of the WITH clause
You can use the WITH clause in the following scenarios:
At the beginning of a
SELECTstatementWITH ... SELECT ...At the beginning of a subquery, including derived table subqueries
SELECT ... WHERE id IN (WITH ... SELECT ...) ... SELECT * FROM (WITH ... SELECT ...) AS dt ...Right before the
SELECTstatement in a statement that contains aSELECTstatementINSERT ... WITH ... SELECT ... REPLACE ... WITH ... SELECT ... CREATE TABLE ... WITH ... SELECT ... CREATE VIEW ... WITH ... SELECT ...
Only one WITH clause is allowed at the same level. You can separate multiple subclauses in a WITH clause with commas (,).
WITH cte1 AS (...), cte2 AS (...) SELECT ...
A WITH clause can define one or multiple CTEs. Make sure that the CTE names are unique in the same WITH clause. The following example shows an invalid WITH clause:
WITH cte1 AS (...), cte1 AS (...) SELECT ...
Structure of a recursive CTE
The following describes the structure of a recursive CTE:
If a CTE in a
WITHclause references itself, theWITHclause must begin with theRECURSIVEkeyword. Otherwise, theRECURSIVEkeyword is optional.A recursive CTE subquery comprises two parts separated with
UNION [ALL].SELECT ... -- Returns the initial row set. UNION ALL SELECT ... -- Returns an additional row set.The first
SELECTgenerates one or more initial rows for the CTE without referencing the CTE name. The secondSELECTgenerates additional rows and references the CTE name in theFROMclause for recursion. The recursion ends when no new rows are generated. Therefore, a recursive CTE consists of a non-recursiveSELECTpart and a recursiveSELECTpart. EachSELECTpart can be a combination of multipleSELECTstatements.The type of result columns of a CTE is inferred from the column type of the non-recursive
SELECTpart, and all columns can be null. The recursiveSELECTpart is ignored in determining the type of result columns.Each iteration of the recursive part applies only to the rows generated in the previous iteration. If the recursive part contains multiple query blocks, the iteration of each query block is scheduled based on a built-in order, and the rows operated by each query block are those generated by the last iteration or those generated by other query blocks since the last iteration ends.
Here is an example:
WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /*The non-recursive part that retrieves a single row to generate the initial row set.*/
UNION ALL
SELECT n + 2 FROM cte WHERE n < 10 /*The recursive part that generates a new value greater than value n in the previous row set by 2 until n is not smaller than 10.*/
)
SELECT * FROM cte1;
Limitations
The following syntax constraints apply to recursive CTE subqueries:
The recursive
SELECTpart must not contain the following structures:Aggregate functions such as
SUM()Analytic functions
GROUP BYORDER BYDISTINCT
The recursive
SELECTpart can reference the CTE name in the subquery of theFROMclause only once. The recursive part can reference another table and join it with the CTE. In this case, the CTE cannot be located on the right side ofLEFT JOIN.
For a recursive CTE, the estimated cost displayed by using EXPLAIN represents the cost of each iteration, which may differ greatly from the total cost. The optimizer cannot predict the number of iterations because it cannot predict when the WHERE clause will become false.
UNION [DISTINCT]andLIMITare not allowed between the recursiveSELECTpart and non-recursiveSELECTpart.
Examples
The following examples show the differences between recursive and non-recursive CTEs by using a student profile table that contains the student ID, student name, and mentor ID columns.
First, create a student profile table and insert some data.
obclient> CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected
obclient> INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected
In this data model, Alice is the top student and has no mentor. Bob and Charlie are Alice's students. David is Bob's student, and Eve is Charlie's student.
Example of a non-recursive CTE
A non-recursive CTE does not reference itself. To select all the direct students of Alice, you can use a non-recursive CTE as follows:
WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set
Example of a recursive CTE
To select all the direct and indirect students of Alice, you need to use a recursive CTE. A recursive CTE references itself to query students downwards level by level.
WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point.
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive members: Select direct students based on the ID of the upper-level student.
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;
This recursive CTE first selects Alice as the starting point, and then recursively selects the direct students of each student until there are no more students.
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set
In preceding examples, the recursive CTE traverses the hierarchy of students, while the non-recursive CTE selects only a fixed level of students.