GROUP BY |V2.2.77|OceanBase Database| docs|Distributed Database

GROUP BY

Last Updated：2023-08-18 09:26:34 Updated

The GROUP BY operator enables data grouping and aggregating in SQL statements.

Algorithms for data grouping include HASH and MERGE . So, the GROUP BY operator also includes HASH GROUP BY and MERGE GROUP BY. For a newly generated execution plan, the SQL optimizer selects from these two GROUP BY operators based on cost evaluation.

General aggregate functions (SUM, MAX, MIN, AVG, COUNT, STDDEV) are also executed by assigning the GROUP BY operator. However, if an SQL contains only aggregate functions but no GROUP BY, the SCALAR GROUP BY operator is assigned. In that sense, GROUP BY operators can be divided into three types: SCALAR GROUP BY, HASH GROUP BY, and MERGE GROUP BY.

SCALAR GROUP BY

Example 1: An execution plan containing a SCALAR GROUP BY operator.

obclient>CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected (0.12 sec)

obclient>INSERT INTO t1 VALUES(1, 1);
Query OK, 1 rows affected (0.12 sec)

obclient>INSERT INTO t1 VALUES(2, 2);
Query OK, 1 rows affected (0.12 sec)

obclient>INSERT INTO t1 VALUES(3, 3);
Query OK, 1 rows affected (0.12 sec)

Q1:
obclient> EXPLAIN SELECT SUM(c1) FROM t1\G;
*************************** 1. row ***************************
Query Plan:
| ========================================
|ID|OPERATOR       |NAME|EST. ROWS|COST|
----------------------------------------
|0 |SCALAR GROUP BY|    |1        |37  |
|1 | TABLE SCAN    |T1  |3        |37  |
========================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_SUM(T1.C1)]), filter(nil), 
      group(nil), agg_func([T_FUN_SUM(T1.C1)])
  1 - output([T1.C1]), filter(nil), 
      access([T1.C1]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display of query Q1 shows in detail the output information of the SCALAR GROUP BY operator.

Field	Description
output	The output expression of the operator.
filter	The filter conditions of the operator. In this example, the condition is set to `nil` because no filter is configured for the `SCALAR GROUP BY` operator.
group	Indicates the columns to be grouped. For example, as query Q1 has a `SCALAR GROUP BY` operator, the value of group is `nil`.
agg_func	Indicates the aggregate functions involved. For example, query Q1 is to sum up the data in column `c1` of table `t1`, therefore the value is `T_FUN_SUM(t1.c1)`.

HASH GROUP BY

Example 2: An execution plan containing a HASH GROUP BY operator

Q2: 
obclient>EXPLAIN SELECT SUM(c2) FROM t1 GROUP BY c1 HAVING SUM(c2) > 2\G;
*************************** 1. row ***************************
Query Plan:
| ======================================
|ID|OPERATOR     |NAME|EST. ROWS|COST|
--------------------------------------
|0 |HASH GROUP BY|    |1        |40  |
|1 | TABLE SCAN  |T1  |3        |37  |
======================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_SUM(T1.C2)]), filter([T_FUN_SUM(T1.C2) > 2]), 
      group([T1.C1]), agg_func([T_FUN_SUM(T1.C2)])
  1 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display of query Q2 shows in detail the output information of the HASH GROUP BY operator.

Field	Description
output	The output expression of the operator.
filter	The filter conditions of the operator. Because the setting requires that the sum of column `c2` after grouping is greater than 2, the filter is `T_FUN_SUM(t1.c2) > 2`.
group	Indicates the columns to be grouped. For example, as query Q2 has a `HASH GROUP BY` operator, the value of group is `nil`.
agg_func	Indicates the aggregate functions involved. For example, query Q2 is to sum up the data in column `c1` of table `t1`, therefore the value is `T_FUN_SUM(t1.c1)`.

Note

The HASH GROUP BY operator ensures that the data is grouped by HASH algorithm in the execution.

MERGE GROUP BY

Example 3: An execution plan containing a MERGE GROUP BY operator.

Q3: 
obclient>EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/SUM(c2) FROM 
       t1 GROUP BY c1 HAVING SUM(c2) > 2\G;
*************************** 1. row ***************************
Query Plan:
| =======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE GROUP BY|    |1        |45  |
|1 | SORT         |    |3        |44  |
|2 |  TABLE SCAN  |T1  |3        |37  |
=======================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_SUM(T1.C2)]), filter([T_FUN_SUM(T1.C2) > 2]), 
      group([T1.C1]), agg_func([T_FUN_SUM(T1.C2)])
  1 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C1, ASC])
  2 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display of query Q3 shows in detail the information of the MERGE GROUP BY operator. It shows that theMERGE GROUP BY operator was selected for the execution plan that was generated by the same SQL statement, and the basic information of the operator was the same, except for the algorithm selected for grouping in the execution. Additionally, TABLE SCAN, the No. 2 operator in the demo, returned an unsorted result, while the GROUP BY algorithm used the MERGE GROUP BY operator, so a SORT operator must be assigned. Notice

NO_USE_HASH_AGGREGATION and USE_HASH_AGGREGATION hints can control the type of algorithm selected by the GROUP BY operator for grouping.