DISTINCT operators remove duplicate data rows, including duplicate NULL values.
DISTINCT operators fall into the following types: HASH DISTINCT and MERGE DISTINCT.
HASH DISTINCT
The HASH DISTINCT operator uses a hash algorithm to perform a DISTINCT operation.
Example 1: Use a hash algorithm to perform a DISTINCT operation to deduplicate column c1 of table t1.
obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected
obclient> EXPLAIN SELECT /*+USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1;
Query Plan:
|=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
---------------------------------------
|0 |HASH DISTINCT| |101 |99169|
|1 | TABLE SCAN |t1 |100000 |66272|
=======================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1]), filter(nil),
distinct([t1.c1])
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
In the preceding example, Operator 0 HASH DISTINCT in the execution plan performs the deduplication. The Outputs & filters section shows in detail the output information of the HASH DISTINCT operator.
| Field | Description |
|---|---|
| output | The output columns of the operator. |
| filter | The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the HASH DISTINCT operator. |
| partition | The partitions to be scanned in the query. |
| distinct | The columns to be deduplicated. For example, t1.c1 in distinct([t1.c1]) specifies to deduplicate the c1 column of the t1 table by using a hash algorithm. |
MERGE DISTINCT
The MERGE DISTINCT operator uses a merge algorithm to perform a DISTINCT operation.
Example 2: Use a merge algorithm to perform a DISTINCT operation.
obclient> EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1;
Query Plan:
|=======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE DISTINCT| |3 |40 |
|1 | SORT | |3 |39 |
|2 | TABLE SCAN |t1 |3 |37 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1]), filter(nil),
distinct([t1.c1])
1 - output([t1.c1]), filter(nil), sort_keys([t1.c1, ASC])
2 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
In the preceding example, Operator 0 MERGE DISTINCT performs the deduplication by using a merge algorithm. Because the MERGE DISTINCT operator only takes sorted input data, a SORT operator is required to sort the unsorted output data from Operator 2 before deduplication. The Outputs & filters section shows in detail the output information of the MERGE DISTINCT operator.
| Field | Description |
|---|---|
| output | The output columns of the operator. |
| filter | The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the MERGE DISTINCT operator. |
| distinct | The columns to be deduplicated. For example, t1.c1 in distinct([t1.c1]) specifies to deduplicate the c1 column of table t1 by using a merge algorithm. |