This topic describes the basic concept and limitations on lateral derived tables, and provides some examples to introduce how to use lateral derived tables.
Concept
A derived table is a subquery used in a FROM clause. The result set of this subquery is temporarily used as a table in outer queries. A derived table is usually used to group and aggregate data, or used to create a dataset that meets the specified conditions for use in the main query.
Lateral derived tables are a special type of derived tables. You can use the LATERAL keyword to specify a lateral derived table so that it can reference fields in another table or derived table previously defined in the same FROM clause. This way, a subquery in the lateral derived table can reference tables defined in the same FROM clause and access column values in these tables.
A lateral derived table differs from a normal derived table in that it can reference columns in tables that are previously defined in the same FROM clause.
Limitations
- The
LATERALkeyword must be used in theFROMclause. It can be in a list of tables separated with commas (,) or in a join expression, such asJOIN,INNER JOIN,CROSS JOIN,LEFT [OUTER] JOIN, orRIGHT [OUTER] JOIN. - When the
LATERALkeyword is used in aJOINstatement to reference fields in a left-side table, supported join types areINNER JOIN,CROSS JOIN, andLEFT [OUTER] JOIN. - When the
LATERALkeyword is used in aJOINstatement to reference fields in a right-side table, supported joint types areINNER JOIN,CROSS JOIN, andRIGHT [OUTER] JOIN. - If an aggregate function is referenced in a lateral derived table, this aggregate function cannot directly depend on the external query where the
FROMclause that contains the lateral derived table resides. In other words, a lateral derived table cannot use an aggregate function to directly reference the result of an external query.
Syntax
SELECT select_list
FROM table_name1,
LATERAL (SELECT select_list
FROM table_name2
WHERE table_name2.col_name = table_name1.col_name) AS lateral_derived_table_name
[...];
Parameters in the syntax are described as follows:
| Parameter | Description |
|---|---|
| select_list | The list of columns to retrieve, which can be column names, expressions, or aggregate functions. Separate multiple columns with commas (,). |
| table_name1 | The primary table to query. |
| LATERAL | A lateral join subquery, namely, a lateral derived table. |
| table_name2 | A secondary table referenced in the LATERAL subquery to provide related additional information for each row in the primary table specified by table_name1. |
| lateral_derived_table_name | The alias of the secondary table, which is referenced in subsequent queries. |
| [...] | An optional query clause, such as a WHERE clause. |
Examples
Create a test table and add test data to the table
Create a table named
students.CREATE TABLE students ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL, age INT );Insert three data records into the
studentstable.INSERT INTO students (name, age) VALUES ('name1', 20), ('name2', 22), ('name3', 21);Create a table named
scores.CREATE TABLE scores ( id INT AUTO_INCREMENT PRIMARY KEY, student_id INT, subject VARCHAR(50) NOT NULL, score DECIMAL(5, 2), FOREIGN KEY (student_id) REFERENCES students(id) );Insert nine data records into the
scorestable.INSERT INTO scores (student_id, subject, score) VALUES (1, 'A', 86.5), (1, 'B', 90.0), (1, 'C', 91.5), (2, 'A', 86.0), (2, 'B', 92.0), (2, 'C', 89.5), (3, 'A', 93.0), (3, 'B', 92.5), (3, 'C', 85.0);
Query the test data
Query the students and scores tables for the name, average score, and highest score of each student.
Q1: Use multiple subqueries to obtain required data. Perform a group operation and an aggregate operation on the scores table. Scan the scores table twice to calculate the average score and highest score of each student. Then, use the WHERE clause to join the results with the students table.
SELECT st.name, sc.avg_score, scs.max_score
FROM students st,
(SELECT student_id, AVG(score) avg_score
FROM scores
GROUP BY student_id) sc,
(SELECT student_id, MAX(score) max_score
FROM scores
GROUP BY student_id) scs
WHERE sc.student_id = st.id
AND scs.student_id = st.id;
The result is as follows:
+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set
Q2: Use the LATERAL keyword to obtain required data. Use the LATERAL keyword (lateral derived table) to calculate the average score and highest score of each student in an SQL statement, and scan the scores table once for the ID of each student.
SELECT st.name, ld_tbl.avg_score, ld_tbl.max_score
FROM students st,
LATERAL (SELECT AVG(score) avg_score, MAX(score) max_score
FROM scores sc
WHERE sc.student_id = st.id) ld_tbl;
The result is as follows:
+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set
Q2 has higher performance than Q1 in processing a large dataset because it avoids repeatedly scanning the scores table. The syntax of Q2 is clearer and more compact and easier to understand and maintain.