CTE|V4.3.3| docs|Distributed Database

CTE

Last Updated：2024-12-02 03:48:26 Updated

A common table expression (CTE) is a named temporary result set that applies to the current statement. It is not stored as an object and is used only during queries. Different from derived tables, CTEs can reference themselves and can be referenced multiple times in the same query.

Application scenarios are as follows:

You can use CTEs to reuse the same subquery in multiple places to avoid repeatedly writing the same logic.
You can use CTEs to simplify recursive queries, such as locating tree-structured data.
You can use CTEs to split a complex query into smaller parts to make the query logic clearer and easier to understand.

OceanBase Database supports recursive CTEs and non-recursive CTEs.

Syntax

CTE is optional in the syntax of DML statements and is defined by using the WITH clause. You can separate multiple subclauses in a WITH clause with commas (,). Each subclause provides a subquery to generate a result set and associates the subquery with a name. The syntax is as follows:

 WITH [RECURSIVE]
      cte_name [(column_name [, column_name] ...)] AS (subquery)
      [, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...

Parameters

Parameter	Description
`[RECURSIVE]`	An optional keyword that specifies to create a recursive CTE. To create a recursive CTE, specify the `RECURSIVE` keyword. A recursive CTE refers to itself and is used to process data with a recursive structure, such as a tree structure and a graph. The definition of a recursive CTE contains a basic query, which is the start condition. Then, recursive operations are performed on the result of the initial query until the stop condition is met. To create a non-recursive CTE, do not specify the `RECURSIVE` keyword. A non-recursive CTE does not reference itself. Such a CTE is used only to construct a one-time temporary result set and does not involve recursive operations.
`cte_name`	The CTE name, which can be referenced in a WITH clause.
`column_name`	The list of selected column names, which is used to specify aliases for columns in the CTE. The list allows you to use more readable column names in the main query.
`AS(subquery)`	The CTE subquery, which is used to generate the CTE result set. AS must be followed with parentheses.

If the CTE name is followed by a list of names, these names are the column names. The number of column names here must be the same as that in the SELECT statement of the CTE. If no column name is specified, the column names in the first SELECT list in AS(subquery) will be used.

Application scenarios of the WITH clause

You can use the WITH clause in the following scenarios:

At the beginning of a SELECT statement
```
WITH ... SELECT ...
```

At the beginning of a subquery, including derived table subqueries

SELECT ... WHERE id IN (WITH ... SELECT ...) ...
SELECT * FROM (WITH ... SELECT ...) AS dt ...

Right before the SELECT statement in a statement that contains a SELECT statement

INSERT ... WITH ... SELECT ...
REPLACE ... WITH ... SELECT ...
CREATE TABLE ... WITH ... SELECT ...
CREATE VIEW ... WITH ... SELECT ...

Only one WITH clause is allowed at the same level. You can separate multiple subclauses in a WITH clause with commas (,).

WITH cte1 AS (...), cte2 AS (...) SELECT ...

A WITH clause can define one or multiple CTEs. Make sure that the CTE names are unique in the same WITH clause. The following example shows an invalid WITH clause:

WITH cte1 AS (...), cte1 AS (...) SELECT ...

Structure of a recursive CTE

The following describes the structure of a recursive CTE:

If a CTE in a WITH clause references itself, the WITH clause must begin with the RECURSIVE keyword. Otherwise, the RECURSIVE keyword is optional.
A recursive CTE subquery comprises two parts separated with UNION [ALL].
```
SELECT ...      -- Returns the initial row set.
UNION ALL
SELECT ...      -- Returns an additional row set.
```
The first SELECT generates one or more initial rows for the CTE without referencing the CTE name. The second SELECT generates additional rows and references the CTE name in the FROM clause for recursion. The recursion ends when no new rows are generated. Therefore, a recursive CTE consists of a non-recursive SELECT part and a recursive SELECT part. Each SELECT part can be a combination of multiple SELECT statements.
The type of result columns of a CTE is inferred from the column type of the non-recursive SELECT part, and all columns can be null. The recursive SELECT part is ignored in determining the type of result columns.
Each iteration of the recursive part applies only to the rows generated in the previous iteration. If the recursive part contains multiple query blocks, the iteration of each query block is scheduled based on a built-in order, and the rows operated by each query block are those generated by the last iteration or those generated by other query blocks since the last iteration ends.

Here is an example:

WITH RECURSIVE cte1 (n) AS
(
  SELECT 1                /*The non-recursive part that retrieves a single row to generate the initial row set.*/
  UNION ALL
  SELECT n + 2 FROM cte WHERE n < 10     /*The recursive part that generates a new value greater than value n in the previous row set by 2 until n is not smaller than 10.*/
)
SELECT * FROM cte1;

Limitations

The following syntax constraints apply to recursive CTE subqueries:

The recursive SELECT part must not contain the following structures:
- Aggregate functions such as SUM()
- Analytic functions
- GROUP BY
- ORDER BY
- DISTINCT
The recursive SELECT part can reference the CTE name in the subquery of the FROM clause only once. The recursive part can reference another table and join it with the CTE. In this case, the CTE cannot be located on the right side of LEFT JOIN.

For a recursive CTE, the estimated cost displayed by using EXPLAIN represents the cost of each iteration, which may differ greatly from the total cost. The optimizer cannot predict the number of iterations because it cannot predict when the WHERE clause will become false.

UNION [DISTINCT] and LIMIT are not allowed between the recursive SELECT part and non-recursive SELECT part.

Examples

The following examples show the differences between recursive and non-recursive CTEs by using a student profile table that contains the student ID, student name, and mentor ID columns.

First, create a student profile table and insert some data.

obclient> CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name VARCHAR(100),
    mentor_id INT,
    FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected

obclient> INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected

In this data model, Alice is the top student and has no mentor. Bob and Charlie are Alice's students. David is Bob's student, and Eve is Charlie's student.

Example of a non-recursive CTE

A non-recursive CTE does not reference itself. To select all the direct students of Alice, you can use a non-recursive CTE as follows:

WITH Alice_Students AS (
    SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;

The execution result is as follows:

+------------+---------+-----------+
| student_id | name    | mentor_id |
+------------+---------+-----------+
|          2 | Bob     |         1 |
|          3 | Charlie |         1 |
+------------+---------+-----------+
2 rows in set

Example of a recursive CTE

To select all the direct and indirect students of Alice, you need to use a recursive CTE. A recursive CTE references itself to query students downwards level by level.

WITH RECURSIVE Student_Hierarchy AS (
    -- Anchor member: Select Alice as the starting point.
    SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
    UNION ALL
    -- Recursive members: Select direct students based on the ID of the upper-level student.
    SELECT s.student_id, s.name, s.mentor_id
    FROM student s
    INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;

This recursive CTE first selects Alice as the starting point, and then recursively selects the direct students of each student until there are no more students.

The execution result is as follows:

+------------+---------+-----------+
| student_id | name    | mentor_id |
+------------+---------+-----------+
|          1 | Alice   |      NULL |
|          3 | Charlie |         1 |
|          2 | Bob     |         1 |
|          5 | Eve     |         3 |
|          4 | David   |         2 |
+------------+---------+-----------+
5 rows in set

In preceding examples, the recursive CTE traverses the hierarchy of students, while the non-recursive CTE selects only a fixed level of students.