A DISTINCT hint directs the optimizer to perform specific actions when processing a query that contains the DISTINCT keyword.
| Hint | Description |
|---|---|
USE_HASH_DISTINCT |
Enables the optimizer to use the HASH algorithm to perform the DISTINCT operation. Its reverse operation is NO_USE_HASH_DISTINCT. |
NO_USE_HASH_DISTINCT |
Disables the optimizer from using the HASH algorithm to perform the DISTINCT operation. Its reverse operation is USE_HASH_DISTINCT. |
DISTINCT_PUSHDOWN |
Directs the optimizer to allow the DISTINCT operation to be pushed down in a distributed environment. Its reverse operation is NO_DISTINCT_PUSHDOWN. |
NO_DISTINCT_PUSHDOWN |
Directs the optimizer to prohibit the DISTINCT operation from being pushed down in a distributed environment. Its reverse operation is DISTINCT_PUSHDOWN. |
USE_HASH_DISTINCT hint
The USE_HASH_DISTINCT hint enables the optimizer to use the HASH algorithm for the DISTINCT operation. The NO_USE_HASH_DISTINCT hint is the negation of this hint.
Syntax
/*+ USE_HASH_DISTINCT [ ( [ @ qb_name ] ) ] */
Examples
In the following example, the USE_HASH_DISTINCT hint is used to enable the HASH deduplication algorithm.
-- Use the `USE_HASH_DISTINCT` hint to enable the `HASH` algorithm for the DISTINCT operation.
EXPLAIN BASIC SELECT /*+ USE_HASH_DISTINCT */ DISTINCT c1, c2, c3 FROM t1;
| Query Plan |
+-------------------------------------------------------------------+
| =========================== |
| |ID|OPERATOR |NAME| |
| --------------------------- |
| |0 |HASH DISTINCT | | |
| |1 |└─TABLE FULL SCAN|T1 | |
| =========================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| distinct([T1.C1], [T1.C2], [T1.C3]) |
| 1 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| access([T1.C1], [T1.C2], [T1.C3]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------+
14 rows in set
NO_USE_HASH_DISTINCT hint
The NO_USE_HASH_DISTINCT hint disables a specific algorithm for executing the DISTINCT operation. The USE_HASH_DISTINCT hint is its inverse operation.
Syntax
/*+ NO_USE_HASH_DISTINCT [ ( [ @ qb_name ] ) ] */
Examples
In the following query, the NO_USE_HASH_DISTINCT hint is used to prevent the optimizer from using the HASH algorithm when executing the DISTINCT operation. Since OceanBase Database supports both the HASH and MERGE algorithms for deduplication, using the NO_USE_HASH_DISTINCT hint actually results in the use of the MERGE deduplication algorithm.
-- Use the `NO_USE_HASH_DISTINCT` hint to ensure that the query does not use the `HASH` algorithm but instead uses the `MERGE` algorithm when executing the `DISTINCT` operation.
EXPLAIN BASIC SELECT /*+ NO_USE_HASH_DISTINCT */ DISTINCT c1, c2, c3 FROM t1;
+-------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------+
| ============================= |
| |ID|OPERATOR |NAME| |
| ----------------------------- |
| |0 |MERGE DISTINCT | | |
| |1 |└─SORT | | |
| |2 | └─TABLE FULL SCAN|T1 | |
| ============================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| distinct([T1.C1], [T1.C2], [T1.C3]) |
| 1 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| sort_keys([T1.C1, ASC], [T1.C2, ASC], [T1.C3, ASC]) |
| 2 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| access([T1.C1], [T1.C2], [T1.C3]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------+
DISTINCT_PUSHDOWN hint
The DISTINCT_PUSHDOWN hint instructs the optimizer to push down the DISTINCT operation during distributed execution, allowing for local deduplication within each partition node. The reverse operation of this hint is the NO_DISTINCT_PUSHDOWN hint.
Syntax
/*+ DISTINCT_PUSHDOWN [ ( [ @ qb_name ] ) ] */
Examples
The following example shows how to use the DISTINCT_PUSHDOWN hint to enable the distribution of the DISTINCT operation in a distributed query. By default, the DISTINCT operation is pushed down without adding the hint.
-- Use the `DISTINCT_PUSHDOWN` hint to distribute the `DISTINCT` operation in a distributed query.
CREATE TABLE tp1(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 10;
EXPLAIN BASIC SELECT /*+ DISTINCT_PUSHDOWN */ DISTINCT c2, c3 FROM tp1;
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME | |
| ----------------------------------------------- |
| |0 |PX COORDINATOR | | |
| |1 |└─EXCHANGE OUT DISTR |:EX10001| |
| |2 | └─HASH DISTINCT | | |
| |3 | └─EXCHANGE IN DISTR | | |
| |4 | └─EXCHANGE OUT DISTR (HASH)|:EX10000| |
| |5 | └─HASH DISTINCT | | |
| |6 | └─PX PARTITION ITERATOR| | |
| |7 | └─TABLE FULL SCAN |TP1 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 3 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| 4 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| (#keys=2, [TP1.C2], [TP1.C3]), dop=1 |
| 5 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 6 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| force partition granule |
| 7 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| access([TP1.C2], [TP1.C3]), partitions(p[0-9]) |
| is_index_back=false, is_global_index=false, |
| range_key([TP1.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
NO_DISTINCT_PUSHDOWN hint
The NO_DISTINCT_PUSHDOWN hint prevents the optimizer from pushing down the DISTINCT operation during distributed execution. The opposite of this hint is the DISTINCT_PUSHDOWN hint.
Syntax
/*+ NO_DISTINCT_PUSHDOWN [ ( [ @ qb_name ] ) ] */
Examples
The following example shows how to use the NO_DISTINCT_PUSHDOWN hint to prevent the pushing down of the DISTINCT operation for a distributed query.
-- Use the NO_DISTINCT_PUSHDOWN hint to prevent the pushing down of the DISTINCT operation for a distributed query.
CREATE TABLE tp1(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 10;
EXPLAIN BASIC SELECT /*+ NO_DISTINCT_PUSHDOWN */ DISTINCT c2, c3 FROM tp1;
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME | |
| ----------------------------------------------- |
| |0 |PX COORDINATOR | | |
| |1 |└─EXCHANGE OUT DISTR |:EX10001| |
| |2 | └─HASH DISTINCT | | |
| |3 | └─EXCHANGE IN DISTR | | |
| |4 | └─EXCHANGE OUT DISTR (HASH)|:EX10000| |
| |5 | └─PX PARTITION ITERATOR | | |
| |6 | └─TABLE FULL SCAN |TP1 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 3 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| 4 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| (#keys=2, [TP1.C2], [TP1.C3]), dop=1 |
| 5 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| force partition granule |
| 6 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| access([TP1.C2], [TP1.C3]), partitions(p[0-9]) |
| is_index_back=false, is_global_index=false, |
| range_key([TP1.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
27 rows in set