DISTINCT operators remove duplicate data rows, including duplicate NULL values.
DISTINCT operators fall into the following types: HASH DISTINCT and MERGE DISTINCT.
HASH DISTINCT
The HASH DISTINCT operator uses the HASH algorithm to perform a DISTINCT operation.
Example 1: Use the HASH algorithm to perform a DISTINCT operation to deduplicate column c1 of table t1
obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT /*+USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1\G
*************************** 1. row ***************************
Query Plan:
|=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
---------------------------------------
|0 |HASH DISTINCT| |101 |99169|
|1 | TABLE SCAN |t1 |100000 |66272|
=======================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1]), filter(nil),
distinct([t1.c1])
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
In the preceding example, the No. 0 operator HASH DISTINCT in the execution plan display performs the deduplication. The outputs & filters section shows in detail the output information of the HASH DISTINCT operator.
| Parameter | Description |
|---|---|
| output | The output columns of the operator. |
| filter | The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the HASH DISTINCT operator. |
| partition | The partitions to be scanned in the query. |
| distinct | The columns to be deduplicated. For example, the t1.c1 parameter in distinct([t1.c1]) specifies to deduplicate the c1 column of the t1 table by using the HASH algorithm. |
MERGE DISTINCT
The MERGE DISTINCT operator uses the MERGE algorithm to perform a DISTINCT operation.
Example 2: Use the MERGE algorithm to perform a DISTINCT operation.
obclient> EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1\G
*************************** 1. row ***************************
Query Plan:
|=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE DISTINCT| |3 |40 |
|1 | SORT | |3 |39 |
|2 | TABLE SCAN |t1 |3 |37 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1]), filter(nil),
distinct([t1.c1])
1 - output([t1.c1]), filter(nil), sort_keys([t1.c1, ASC])
2 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
In the preceding example, the No. 0 operator MERGE DISTINCT performs the deduplication by using the MERGE algorithm. Because the MERGE DISTINCT operator only takes ordered input data, a SORT operator is required to sort the unordered output data from the No. 2 operator before deduplication. The outputs & filters section in the execution plan display shows in detail the output information of the MERGE DISTINCT operator.
| Parameter | Description |
|---|---|
| output | The output columns of the operator. |
| filter | The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the MERGE DISTINCT operator. |
| distinct | The columns to be deduplicated. For example, the t1.c1 parameter in distinct([t1.c1]) specifies to deduplicate the c1 column of Table t1 by using the MERGE algorithm. |