GROUP BY operators group and aggregate data in SQL statements.
Since hash and merge algorithms are used for data grouping, GROUP BY operators also fall into the following types: HASH GROUP BY and MERGE GROUP BY. For a newly generated execution plan, the SQL optimizer selects from these two GROUP BY operators based on cost evaluation.
General aggregate functions (SUM, MAX, MIN, AVG, COUNT, and STDDEV) are also executed by assigning the GROUP BY operator. However, if an SQL statement contains only aggregate functions but no GROUP BY, the SCALAR GROUP BY operator is assigned. In that sense, GROUP BY operators can be divided into three types: SCALAR GROUP BY, HASH GROUP BY, and MERGE GROUP BY.
SCALAR GROUP BY
Example 1: an execution plan that contains a SCALAR GROUP BY operator
obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES(1, 1);
Query OK, 1 rows affected
obclient> INSERT INTO t1 VALUES(2, 2);
Query OK, 1 rows affected
obclient> INSERT INTO t1 VALUES(3, 3);
Query OK, 1 rows affected
Q1:
obclient> EXPLAIN SELECT SUM(c1) FROM t1;
Query Plan:
| ========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
----------------------------------------
|0 |SCALAR GROUP BY| |1 |37 |
|1 | TABLE SCAN |T1 |3 |37 |
========================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_SUM(T1.C1)]), filter(nil),
group(nil), agg_func([T_FUN_SUM(T1.C1)])
1 - output([T1.C1]), filter(nil),
access([T1.C1]), partitions(p0)
In the preceding example, the Outputs & filters section for query Q1 shows in detail the output information of the SCALAR GROUP BY operator.
| Field | Description |
|---|---|
| output | The output expressions of the operator. |
| filter | The filter conditions of the operator. In this example, filter is set to nil because no filter condition is configured for the SCALAR GROUP BY operator. |
| group | The columns to be grouped. For example, query Q1 contains the SCALAR GROUP BY operator. Therefore, the value of this field is nil. |
| agg_func | The aggregate functions involved. For example, query Q1 is to sum up the data in column c1 of table t1. Therefore, the value is T_FUN_SUM(t1.c1). |
HASH GROUP BY
Example 2: an execution plan that contains a HASH GROUP BY operator
Q2:
obclient> EXPLAIN SELECT SUM(c2) FROM t1 GROUP BY c1 HAVING SUM(c2) > 2;
Query Plan:
| ======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
--------------------------------------
|0 |HASH GROUP BY| |1 |40 |
|1 | TABLE SCAN |T1 |3 |37 |
======================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_SUM(T1.C2)]), filter([T_FUN_SUM(T1.C2) > 2]),
group([T1.C1]), agg_func([T_FUN_SUM(T1.C2)])
1 - output([T1.C1], [T1.C2]), filter(nil),
access([T1.C1], [T1.C2]), partitions(p0)
In the preceding example, the Outputs & filters section for query Q2 shows in detail the output information of the HASH GROUP BY operator.
| Field | Description |
|---|---|
| output | The output expressions of the operator. |
| filter | The filter conditions of the operator. Since the setting requires that the sum of column c2 after grouping is greater than 2, the filter is T_FUN_SUM(t1.c2) > 2. |
| group | The columns to be grouped. For example, in query Q2, since we are grouping the c1 column of table t1, the value is T1.C1. |
| agg_func | The aggregate functions involved. For example, query Q2 is to sum up the data in column c2 of table t1. Therefore, the value is T_FUN_SUM(t1.c2). |
Note
The HASH GROUP BY operator ensures that the data is grouped by a hash algorithm in the execution.
MERGE GROUP BY
Example 3: an execution plan that contains a MERGE GROUP BY operator
Q3:
obclient> EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/SUM(c2) FROM t1 GROUP BY c1 HAVING SUM(c2) > 2;
Query Plan:
| =======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE GROUP BY| |1 |45 |
|1 | SORT | |3 |44 |
|2 | TABLE SCAN |T1 |3 |37 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_SUM(T1.C2)]), filter([T_FUN_SUM(T1.C2) > 2]),
group([T1.C1]), agg_func([T_FUN_SUM(T1.C2)])
1 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C1, ASC])
2 - output([T1.C1], [T1.C2]), filter(nil),
access([T1.C1], [T1.C2]), partitions(p0)
In the preceding example, the Outputs & filters section for query Q3 shows in detail the information of the MERGE GROUP BY operator. It shows that the MERGE GROUP BY operator was selected for the execution plan that was generated for the same SQL statement, and the basic information of the operator was the same as that of other GROUP BY operators, except for the algorithm selected for grouping in the execution. Additionally, Operator 2 TABLE SCAN in the example returns an unsorted result, since the MERGE GROUP BY operator is used, a SORT operator must be assigned.
Notice
The NO_USE_HASH_AGGREGATION and USE_HASH_AGGREGATION hints can control the type of algorithms selected by GROUP BY operators for grouping.