Common table expressions (CTEs) are named temporary result sets that are valid only for the current statement and are not stored as objects. They are used only during query execution. Unlike derived tables, CTEs can be self-referencing and can be referenced multiple times within the same query.
Scenarios:
- CTEs can be used to reuse the same subquery in multiple places, avoiding the need to write the same logic repeatedly.
- They can simplify recursive queries, such as those used to find hierarchical data.
- CTEs can break down complex queries into smaller parts, making the query logic clearer and easier to understand.
OceanBase Database supports both nonrecursive and recursive CTEs.
CTE syntax
Common table expressions (CTEs) are optional parts of DML statement syntax and are defined using the WITH clause. If the WITH clause contains multiple clauses, they can be separated by commas. Each clause provides a subquery to generate a result set and associates the subquery with a name. The syntax is as follows:
WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...
Parameters
| Parameter | Description |
|---|---|
[RECURSIVE] |
An optional keyword used to specify whether to create a recursive CTE.
|
cte_name |
The name of the CTE, which can be referenced by a table containing the WITH clause. |
column_name |
The list of selected column names used to specify aliases for columns in the CTE. This allows you to use more readable column names in the main query. |
AS(subquery) |
The CTE subquery used to generate the CTE result set. The AS keyword must be followed by parentheses. |
If a list of names in parentheses follows the CTE name, these names are column names and must match the number of columns in the SELECT statement of the CTE. If no column names are specified, the column names will be derived from the AS(subquery) part, specifically from the first SELECT list.
Applicable scenarios for the WITH clause
The WITH clause can be used in the following scenarios:
At the beginning of a
SELECTstatement.WITH ... SELECT ...At the beginning of a subquery (including derived table subqueries).
SELECT ... WHERE id IN (WITH ... SELECT ...) ... SELECT * FROM (WITH ... SELECT ...) AS dt ...Immediately before a statement that contains a
SELECTstatement.INSERT ... WITH ... SELECT ... REPLACE ... WITH ... SELECT ... CREATE TABLE ... WITH ... SELECT ... CREATE VIEW ... WITH ... SELECT ...
Only one WITH clause is allowed at the same level. If the WITH clause contains multiple clauses, they can be separated by commas.
WITH cte1 AS (...), cte2 AS (...) SELECT ...
The WITH clause can define one or more CTEs, but each CTE name must be unique within the clause. The following example is invalid:
WITH cte1 AS (...), cte1 AS (...) SELECT ...
Structure of a recursive CTE
A recursive CTE has the following structure:
If a CTE in the
WITHclause refers to itself, theWITHclause must start withWITH RECURSIVE. Otherwise, theRECURSIVEkeyword is not required.A recursive CTE subquery consists of two parts separated by
UNION [ALL | DISTINCT]:Note
UNIONorUNION DISTINCT: Merges the result sets of two or moreSELECTstatements into a single set and removes duplicate rows.UNION ALL: Merges the result sets of two or moreSELECTstatements into a single set without removing duplicate rows.
SELECT ... -- Returns the initial row set UNION ALL SELECT ... -- Returns additional rowsThe first
SELECTgenerates one or more initial rows for the CTE and does not reference the CTE name. The secondSELECTgenerates additional rows and performs a recursive operation by referencing the CTE name in itsFROMclause. The recursion stops when this part no longer generates new rows. Therefore, a recursive CTE consists of a non-recursiveSELECTpart and a recursiveSELECTpart. EachSELECTpart can be a union of multipleSELECTstatements.The data types of the result columns of the CTE are inferred only from the columns in the non-recursive
SELECTpart, and all columns can be nullable. The types from the recursiveSELECTpart are ignored.Each iteration of the recursive part operates only on the rows generated in the previous iteration. If the recursive part contains multiple query blocks, each query block's iterations are scheduled in an unspecified order, and each query block operates on the rows generated in its previous iteration or by other query blocks since the last iteration.
Here is an example:
WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /* Non-recursive part that retrieves a single row to generate the initial row set */
UNION ALL
SELECT n + 2 FROM cte1 WHERE n < 10 /* Recursive part that generates a new value greater than the previous n value by 2, until n is not less than 10 */
)
SELECT * FROM cte1;
Limitations
The following syntax constraints apply to recursive CTE subqueries:
The recursive SELECT part cannot contain the following structures:
Aggregate functions, such as SUM()
Window functions
GROUP BY
ORDER BY
DISTINCT
The recursive SELECT part must reference the subquery in its FROM clause only once. It can reference tables other than the CTE and join with the CTE. If such a join is used, the CTE cannot be on the right side of a LEFT JOIN.
For recursive CTEs, the cost estimates displayed by EXPLAIN represent the cost of each iteration, which may differ significantly from the total cost. The optimizer cannot predict the number of iterations because it cannot determine when the WHERE clause will become false.
- The recursive
SELECT and non-recursive SELECT parts cannot contain the LIMIT structure.
Examples
The following example creates a student table with columns for student ID, name, and mentor ID, to demonstrate the difference between recursive and non-recursive CTEs.
First, we create the student table and insert some data:
obclient> CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected
obclient> INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected
In this data model, Alice is the top-level student with no mentor. Bob and Charlie are Alice's students, David is Bob's student, and Eve is Charlie's student.
Non-recursive CTE example
A non-recursive CTE does not reference itself. For example, if we want to select all of Alice's direct students (the first-level students), we can use a non-recursive CTE like this:
WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;
The result will be:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set
Recursive CTE example
Now, if we want to find all of Alice's direct and indirect students (all levels of students), we need to use a recursive CTE. A recursive CTE references itself to query down each level.
WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive member: Select direct students based on the ID of the previous level's student
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;
This recursive CTE first selects Alice as the starting point and then recursively selects each student's direct students until there are no more students.
The result will be:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set
In this example, the recursive CTE allows us to traverse the student hierarchy, while the non-recursive CTE only selects students at a fixed level.