A common table expression (CTE) is a temporary result set that is used only within the statement that defines it. A CTE does not actually store data but calculates and returns data during the execution of a query. Unlike derived tables, CTEs can be self-referential and can be referenced multiple times in the same query.
Scenarios:
- CTEs can reuse the same subquery multiple times without rewriting the same logic multiple times.
- CTEs can simplify recursive queries, such as those used to query hierarchical data.
- CTEs can break complex queries into smaller parts to simplify the query logic.
OceanBase Database supports both recursive and non-recursive CTEs.
CTE syntax
A common table expression is an optional part of the DML statement syntax. It is defined by using the WITH clause. Multiple WITH clauses can be separated with commas. Each WITH clause contains a subquery that generates a result set and associates the result set with a name. The syntax is as follows:
WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...
Arguments
| Argument | Description |
|---|---|
[RECURSIVE] |
The keyword to specify whether to create a recursive CTE.
|
cte_name |
The name of the CTE, which can be referenced by tables in the WITH clause. |
column_name |
The selected column names that are used to alias the columns in the CTE. This enables the use of more readable column names in the main query. |
AS(subquery) |
The subquery that is used to generate the result set of the CTE. The AS keyword must be followed by a pair of parentheses. |
If the CTE name is followed by a pair of parentheses containing a name list, the names in the parentheses are column names. The number of column names must be the same as the number of columns in the SELECT statement of the CTE. If no column names are specified, the column names in the first SELECT list of the AS(subquery) part are used.
Scenarios where the WITH clause can be used
You can use the WITH clause in the following scenarios:
At the beginning of a
SELECTstatement.WITH ... SELECT ...At the beginning of a subquery (including a derived table subquery).
SELECT ... WHERE id IN (WITH ... SELECT ...) ... SELECT * FROM (WITH ... SELECT ...) AS dt ...Immediately before the
SELECTclause in a statement that contains aSELECTstatement.INSERT ... WITH ... SELECT ... REPLACE ... WITH ... SELECT ... CREATE TABLE ... WITH ... SELECT ... CREATE VIEW ... WITH ... SELECT ...
You can use at most one WITH clause at the same level. If multiple WITH clauses are included in the WITH clause, separate them with commas.
WITH cte1 AS (...), cte2 AS (...) SELECT ...
The WITH clause can define one or more common table expressions (CTEs). However, each CTE name must be unique within the WITH clause. The following example is illegal:
WITH cte1 AS (...), cte1 AS (...) SELECT ...
Structure of a recursive CTE
A recursive CTE has the following structure:
If the
WITHclause contains a CTE that references itself, theWITHclause must start withWITH RECURSIVE. Otherwise, it does not need to start withRECURSIVE.A recursive CTE consists of two parts separated by
UNION [ALL]:SELECT ... -- Returns an initial row set. UNION ALL SELECT ... -- Returns additional row sets.The first
SELECTstatement generates one or more initial rows for the CTE and does not reference the CTE name. The secondSELECTstatement generates additional rows through recursion by referencing the CTE name in itsFROMclause. The recursion ends when the secondSELECTstatement does not generate new rows. Therefore, a recursive CTE consists of a non-recursiveSELECTpart and a recursiveSELECTpart. EachSELECTpart can be multipleSELECTstatements joined together.The types of the columns in the result set of a CTE are inferred from the column types in the result set of the non-recursive
SELECTpart. All columns can be null. The types of the columns in the recursiveSELECTpart are ignored.The rows generated by the recursive part are operated on only by the current iteration. If the recursive part contains multiple query blocks, the iterations of the query blocks are scheduled in an unspecified order, and a query block can operate on the rows generated by the previous iteration of itself or other query blocks since the last iteration ended.
Here is an example:
WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /*Non-recursive part. It retrieves a single row to generate the initial row set.*/
UNION ALL
SELECT n + 2 FROM cte1 WHERE n < 10 /*Recursive part. It generates a new value that is 2 greater than the value of n in the previous row set until the value of n is not less than 10.*/
)
SELECT * FROM cte1;
Limitations
The recursive SELECT part of a recursive CTE query must meet the following requirements:
It must not contain the following elements:
Aggregate functions such as
SUM()Window functions
GROUP BYORDER BYDISTINCT
It must reference a table only through a subquery in its
FROMclause. It can join tables and reference CTEs. If it joins a CTE, the CTE cannot be on the right side of aLEFT JOIN.
The cost estimation displayed in the EXPLAIN statement for a recursive CTE represents the cost of each iteration and can differ significantly from the total cost. The optimizer cannot predict the number of iterations because it cannot predict when the WHERE condition becomes false.
- The
UNION [DISTINCT]andLIMITstructures are not supported between the recursiveSELECTpart and the non-recursiveSELECTpart.
Examples
The following example shows how to create a student hierarchy table that contains the IDs and names of students and their mentor IDs. This example is used to illustrate the differences between recursive CTEs and non-recursive CTEs.
First, create a student table and insert some data into the table:
obclient> CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected
obclient> INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected
In this data model, Alice is a top-level student with no mentor. Bob and Charlie are Alice's students. David is Bob's student. Eve is Charlie's student.
Non-recursive CTE example
A non-recursive CTE does not reference itself. For example, if you want to select all students directly mentored by Alice (i.e., the first-level students), you can use the following non-recursive CTE:
WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set
Recursive CTE example
If you want to find all students directly or indirectly mentored by Alice (i.e., all students at all levels), you must use a recursive CTE. A recursive CTE references itself to query students at subsequent levels.
WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point.
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive member: Select direct students of students at the previous level.
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;
This recursive CTE selects Alice as the starting point, then recursively selects direct students of each student, until no more students are found.
The execution result is as follows:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set
In this example, the recursive CTE enables you to traverse the hierarchical structure of students. A non-recursive CTE can only select students at a fixed level.