The GI operator divides a table's data access task into subtasks based on partition or data block granularity, generating multiple parallelizable subtask units to iterate over the entire table.
GI Operator Types
Based on the iteration granularity, OceanBase Database primarily uses two types of GI operators (PX PARTITION ITERATOR and PX BLOCK ITERATOR) to adapt to different query scenarios and performance requirements.
Operator type |
Feature Description |
Core Principles |
Application scenarios |
|---|---|---|---|
| PX PARTITION ITERATOR | Data iteration and task partitioning are performed at the granularity of partitions. | Utilize the table's own partitioning logic (such asHASH、RANGEEach partition is treated as an independent parallel work unit. The scheduler assigns a worker thread to each partition for scanning. |
Applicable to parallel full-table scans or partition key condition queries on partitioned tables. When data distribution is even, it achieves good load balancing. |
| PX BLOCK ITERATOR | Data iteration and task partitioning are performed at the granularity of data blocks. | Ignoring logical partition boundaries, tasks are directly split based on the physical distribution of data at the storage layer (such as macroblocks or microblocks). This results in finer granularity and generates significantly more task units than the number of partitions. | It is suitable for scenarios requiring extremely high parallelism, especially when performing parallel scans on non-partitioned tables or a single large partition, allowing for more efficient utilization of cluster computing resources. |
PX PARTITION ITERATOR
The PX PARTITION ITERATOR operator iterates data at the partition granularity.
PX PARTITION ITERATOR Example
Create table
tbl1.obclient> CREATE TABLE tbl1 (col1 INT, col2 INT) PARTITION BY HASH(col1) PARTITIONS 4;Create index
idx_tbl1on columncol1in tabletbl1.obclient> CREATE INDEX idx_tbl1 ON tbl1 (col1);Q1: Use the
FULLhint for a full table scan, query columncol1from tabletbl1, and view the execution plan.obclient> EXPLAIN SELECT /*+FULL(tbl1)*/ col1 FROM tbl1;The result is as follows:
+----------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------+ | ============================================================= | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------------- | | |0 |PX COORDINATOR | |1 |10 | | | |1 |└─EXCHANGE OUT DISTR |:EX10000|1 |10 | | | |2 | └─PX PARTITION ITERATOR| |1 |9 | | | |3 | └─TABLE FULL SCAN |TBL1 |1 |9 | | | ============================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([INTERNAL_FUNCTION(TBL1.COL1)]), filter(nil), rowset=16 | | 1 - output([INTERNAL_FUNCTION(TBL1.COL1)]), filter(nil), rowset=16 | | dop=1 | | 2 - output([TBL1.COL1]), filter(nil), rowset=16 | | force partition granule | | 3 - output([TBL1.COL1]), filter(nil), rowset=16 | | access([TBL1.COL1]), partitions(p[0-3]) | | is_index_back=false, is_global_index=false, | | range_key([TBL1.__pk_increment]), range(MIN ; MAX)always true | +----------------------------------------------------------------------+ 19 rows in set
In the Q1 query result, besides the PX PARTITION ITERATOR operator (operator 2), the following execution operators are also displayed:
PX COORDINATORandEXCHANGE OUT DISTR: Both operators belong to theEXCHANGEfamily and are used for inter-thread data exchange.PX COORDINATORis a special case ofEXCHANGE IN, responsible not only for pulling back remote data but also for scheduling subplan execution. For details, see EXCHANGE.TABLE FULL SCAN: This operator belongs to theTABLE SCANfamily and shows which index (or primary table) the optimizer chooses to access data. For details, see TABLE SCAN.
The Outputs & filters section in the Q1 execution plan display lists the output information for the PX PARTITION ITERATOR operator as follows:
Information Name |
Meaning |
Example |
|---|---|---|
| output | The list of columns or expressions output by the operator. | output([TBL1.COL1])indicates that this operator outputs the table.tbl1in thecol1Column. |
| filter | The operator requires a predicate to filter the application. | filter(nil)Indicates that no further rows need to be filtered. |
| rowset | Indicates the vectorization size of the current operator. | rowset=16indicates that the vectorization size of the current operator is 16. |
| force partition granule | This is a key hint, indicating that the query optimizer forces parallel scanning and iteration at the partition granularity. | force partition granuleThe presence of this indicates that the query explicitly targets the table.tbl1Each partition of the table (p0, p1, p2, p3) as the basic unit of parallel work. |
PX BLOCK ITERATOR
The PX BLOCK ITERATOR operator iterates data at the data block granularity.
Compared to PX PARTITION ITERATOR, PX BLOCK ITERATOR operates at a finer granularity, dividing data into more tasks and supporting higher parallelism.
PX BLOCK ITERATOR Example
Create table
tbl2.obclient> CREATE TABLE tbl2 (col1 INT, col2 INT) PARTITION BY HASH(col1) PARTITIONS 4;Create index
idx_tbl2on columncol1of tabletbl2.obclient> CREATE INDEX idx_tbl2 ON tbl2 (col1);Q2: Use the
PARALLELhint to specify a degree of parallelism of 4, query columncol1of tabletbl2, and view the execution plan for this query.obclient> EXPLAIN SELECT /*+PARALLEL(4)*/ col1 FROM tbl2;The return result is as follows:
+------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------+ | =============================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | --------------------------------------------------------------- | | |0 |PX COORDINATOR | |1 |1 | | | |1 |└─EXCHANGE OUT DISTR |:EX10000 |1 |1 | | | |2 | └─PX BLOCK ITERATOR| |1 |1 | | | |3 | └─TABLE FULL SCAN|TBL2(IDX_TBL2)|1 |1 | | | =============================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([INTERNAL_FUNCTION(TBL2.COL1)]), filter(nil), rowset=16 | | 1 - output([INTERNAL_FUNCTION(TBL2.COL1)]), filter(nil), rowset=16 | | dop=4 | | 2 - output([TBL2.COL1]), filter(nil), rowset=16 | | 3 - output([TBL2.COL1]), filter(nil), rowset=16 | | access([TBL2.COL1]), partitions(p0) | | is_index_back=false, is_global_index=true, | | range_key([TBL2.COL1], [TBL2.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true | +------------------------------------------------------------------------------------------+ 18 rows in set
The Outputs & filters section in the execution plan display for Q2 above lists in detail the output information for the PX BLOCK ITERATOR operator. The meaning of the fields is the same as that for the PX PARTITION ITERATOR operator.
