The DISTINCT hint is used to guide the optimizer to take specific actions when processing queries that contain the DISTINCT keyword.
| Hint type | Description |
|---|---|
USE_HASH_DISTINCT |
Enables the optimizer to use the HASH algorithm for executing DISTINCT operations. Its reverse operation is the NO_USE_HASH_DISTINCT hint. |
NO_USE_HASH_DISTINCT |
Disables the optimizer from using the HASH algorithm for executing DISTINCT operations. Its reverse operation is the USE_HASH_DISTINCT hint. |
DISTINCT_PUSHDOWN |
Indicates that the optimizer allows DISTINCT operations to be pushed down in distributed environments. Its reverse operation is the NO_DISTINCT_PUSHDOWN hint. |
NO_DISTINCT_PUSHDOWN |
Indicates that the optimizer prohibits DISTINCT operations from being pushed down in distributed environments. Its reverse operation is the DISTINCT_PUSHDOWN hint. |
USE_HASH_DISTINCT Hint
The USE_HASH_DISTINCT hint enables the optimizer to use the HASH algorithm for the DISTINCT operation. Its reverse operation is the NO_USE_HASH_DISTINCT hint.
Syntax
/*+ USE_HASH_DISTINCT [ ( [ @ qb_name ] ) ] */
Examples
The following query example shows how to use the USE_HASH_DISTINCT hint to enable the HASH deduplication algorithm.
-- Use the `USE_HASH_DISTINCT` hint to make the query use the `HASH` algorithm for the `DISTINCT` operation.
EXPLAIN BASIC SELECT /*+ USE_HASH_DISTINCT */ DISTINCT c1, c2, c3 FROM t1;
| Query Plan |
+-------------------------------------------------------------------+
| =========================== |
| |ID|OPERATOR |NAME| |
| --------------------------- |
| |0 |HASH DISTINCT | | |
| |1 |└─TABLE FULL SCAN|T1 | |
| =========================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| distinct([T1.C1], [T1.C2], [T1.C3]) |
| 1 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| access([T1.C1], [T1.C2], [T1.C3]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------+
14 rows in set
NO_USE_HASH_DISTINCT Hint
The NO_USE_HASH_DISTINCT hint disables the specific algorithm used for the distinct operation. Its opposite is the USE_HASH_DISTINCT hint.
Syntax
/*+ NO_USE_HASH_DISTINCT [ ( [ @ qb_name ] ) ] */
Examples
In the following query example, the NO_USE_HASH_DISTINCT hint is used to disable the optimizer from using the HASH algorithm for the DISTINCT operation. Since OceanBase Database supports both the HASH and MERGE deduplication algorithms, using the NO_USE_HASH_DISTINCT hint results in the use of the MERGE deduplication algorithm.
-- Use the `NO_USE_HASH_DISTINCT` hint to ensure that the `DISTINCT` operation in the query does not use the `HASH` algorithm but instead uses the `MERGE` algorithm.
EXPLAIN BASIC SELECT /*+ NO_USE_HASH_DISTINCT */ DISTINCT c1, c2, c3 FROM t1;
+-------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------+
| ============================= |
| |ID|OPERATOR |NAME| |
| ----------------------------- |
| |0 |MERGE DISTINCT | | |
| |1 |└─SORT | | |
| |2 | └─TABLE FULL SCAN|T1 | |
| ============================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| distinct([T1.C1], [T1.C2], [T1.C3]) |
| 1 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| sort_keys([T1.C1, ASC], [T1.C2, ASC], [T1.C3, ASC]) |
| 2 - output([T1.C1], [T1.C2], [T1.C3]), filter(nil), rowset=16 |
| access([T1.C1], [T1.C2], [T1.C3]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------+
DISTINCT_PUSHDOWN Hint
The DISTINCT_PUSHDOWN hint instructs the optimizer to push down the DISTINCT operation during distributed execution, allowing for local deduplication at each partition node. Its counterpart is the NO_DISTINCT_PUSHDOWN hint.
Syntax
/*+ DISTINCT_PUSHDOWN [ ( [ @ qb_name ] ) ] */
Examples
The following example demonstrates how to use the DISTINCT_PUSHDOWN hint to enable the pushdown of the DISTINCT operation during distributed execution. If no hint is specified, the DISTINCT operation is pushed down by default.
-- Use the `DISTINCT_PUSHDOWN` hint to push down the `DISTINCT` operation to each partition node during distributed query execution.
CREATE TABLE tp1(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 10;
EXPLAIN BASIC SELECT /*+ DISTINCT_PUSHDOWN */ DISTINCT c2, c3 FROM tp1;
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME | |
| ----------------------------------------------- |
| |0 |PX COORDINATOR | | |
| |1 |└─EXCHANGE OUT DISTR |:EX10001| |
| |2 | └─HASH DISTINCT | | |
| |3 | └─EXCHANGE IN DISTR | | |
| |4 | └─EXCHANGE OUT DISTR (HASH)|:EX10000| |
| |5 | └─HASH DISTINCT | | |
| |6 | └─PX PARTITION ITERATOR| | |
| |7 | └─TABLE FULL SCAN |TP1 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 3 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| 4 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| (#keys=2, [TP1.C2], [TP1.C3]), dop=1 |
| 5 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 6 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| force partition granule |
| 7 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| access([TP1.C2], [TP1.C3]), partitions(p[0-9]) |
| is_index_back=false, is_global_index=false, |
| range_key([TP1.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
NO_DISTINCT_PUSHDOWN Hint
The NO_DISTINCT_PUSHDOWN hint prevents the optimizer from pushing down the DISTINCT operation during distributed execution. The opposite of this hint is the DISTINCT_PUSHDOWN hint.
Syntax
/*+ NO_DISTINCT_PUSHDOWN [ ( [ @ qb_name ] ) ] */
Examples
The following example shows how to use the NO_DISTINCT_PUSHDOWN hint to prevent the optimizer from pushing down the DISTINCT operation during distributed execution.
-- Use the NO_DISTINCT_PUSHDOWN hint to prevent the optimizer from pushing down the DISTINCT operation during distributed execution.
CREATE TABLE tp1(c1 INT, c2 INT, c3 INT) PARTITION BY HASH(c1) PARTITIONS 10;
EXPLAIN BASIC SELECT /*+ NO_DISTINCT_PUSHDOWN */ DISTINCT c2, c3 FROM tp1;
+---------------------------------------------------------------------------+
| Query Plan |
+---------------------------------------------------------------------------+
| =============================================== |
| |ID|OPERATOR |NAME | |
| ----------------------------------------------- |
| |0 |PX COORDINATOR | | |
| |1 |└─EXCHANGE OUT DISTR |:EX10001| |
| |2 | └─HASH DISTINCT | | |
| |3 | └─EXCHANGE IN DISTR | | |
| |4 | └─EXCHANGE OUT DISTR (HASH)|:EX10000| |
| |5 | └─PX PARTITION ITERATOR | | |
| |6 | └─TABLE FULL SCAN |TP1 | |
| =============================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(TP1.C2, TP1.C3)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| distinct([TP1.C2], [TP1.C3]) |
| 3 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| 4 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| (#keys=2, [TP1.C2], [TP1.C3]), dop=1 |
| 5 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| force partition granule |
| 6 - output([TP1.C2], [TP1.C3]), filter(nil), rowset=16 |
| access([TP1.C2], [TP1.C3]), partitions(p[0-9]) |
| is_index_back=false, is_global_index=false, |
| range_key([TP1.__pk_increment]), range(MIN ; MAX)always true |
+---------------------------------------------------------------------------+
27 rows in set