DISTINCT

2024-03-05 01:54:27  Updated

DISTINCT operators remove duplicate data rows, including duplicate NULL values.

DISTINCT operators fall into the following types: HASH DISTINCT and MERGE DISTINCT.

HASH DISTINCT

The HASH DISTINCT operator uses a hash algorithm to perform a DISTINCT operation.

Example 1: Use a hash algorithm to perform a DISTINCT operation to deduplicate column c1 of table t1.

obclient> CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected

obclient> EXPLAIN SELECT /*+USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1;

Query Plan:
|=======================================
|ID|OPERATOR     |NAME|EST. ROWS|COST |
---------------------------------------
|0 |HASH DISTINCT|    |101      |99169|
|1 | TABLE SCAN  |t1  |100000   |66272|
=======================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter(nil),
      distinct([t1.c1])
  1 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)

In the preceding example, Operator 0 HASH DISTINCT in the execution plan performs the deduplication. The Outputs & filters section shows in detail the output information of the HASH DISTINCT operator.

Field Description
output The output columns of the operator.
filter The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the HASH DISTINCT operator.
partition The partitions to be scanned in the query.
distinct The columns to be deduplicated. For example, t1.c1 in distinct([t1.c1]) specifies to deduplicate the c1 column of the t1 table by using a hash algorithm.

MERGE DISTINCT

The MERGE DISTINCT operator uses a merge algorithm to perform a DISTINCT operation.

Example 2: Use a merge algorithm to perform a DISTINCT operation.

obclient> EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1;

Query Plan:
|=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE DISTINCT|    |3        |40  |
|1 | SORT         |    |3        |39  |
|2 |  TABLE SCAN  |t1  |3        |37  |
=======================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1]), filter(nil),
      distinct([t1.c1])
  1 - output([t1.c1]), filter(nil), sort_keys([t1.c1, ASC])
  2 - output([t1.c1]), filter(nil),
      access([t1.c1]), partitions(p0)

In the preceding example, Operator 0 MERGE DISTINCT performs the deduplication by using a merge algorithm. Because the MERGE DISTINCT operator only takes sorted input data, a SORT operator is required to sort the unsorted output data from Operator 2 before deduplication. The Outputs & filters section shows in detail the output information of the MERGE DISTINCT operator.

Field Description
output The output columns of the operator.
filter The filter predicates of the operator. In this example, filter is set to nil because no filter condition is configured for the MERGE DISTINCT operator.
distinct The columns to be deduplicated. For example, t1.c1 in distinct([t1.c1]) specifies to deduplicate the c1 column of table t1 by using a merge algorithm.

Contact Us