Description
The GENERATOR function is a special built-in function that can be used within the table function framework. Its primary purpose is to generate a virtual table with a specified number of rows. This table does not contain any actual data but provides a structure into which data can be inserted. The GENERATOR function is typically used to generate a set of consecutive rows for batch operations or simulating a dataset.
Table function is an advanced feature in SQL that allows a function to return an entire table rather than just a single value. This means that the table function can produce multiple rows and columns, similar to a regular data table.
TABLE(GENERATOR(N)) can generate a virtual table with N rows, where N is a 64-bit integer greater than or equal to 0. If you want this virtual table to include specific data, you need to add the logic for generating the data after the SELECT statement, such as constants, computed expressions, function calls, or user-defined functions (UDFs).
TABLE(GENERATOR(N)) supports parallel execution. You can use it in parallel SELECT queries or parallel Data Manipulation Language (DML) operations. However, the process of generating these virtual rows is not parallel itself. Only one thread is responsible for reading the rows generated by TABLE(GENERATOR(N)). Therefore, in a Data Flow Operation (DFO) containing TABLE(GENERATOR(N)), the Degree of Parallelism (DOP) setting should be 1.
Considerations
TABLE(GENERATOR(N))generates a virtual table that does not contain actual data, and the data needs to be specified in theSELECTstatement.- Although
TABLE(GENERATOR(N))can be used in parallel operations, the row generation process itself is single-threaded.
Syntax
TABLE(GENERATOR(N))
Parameters
N: A 64-bit positive integer greater than or equal to 0, used to specify the number of rows to generate.
Examples
- Example 1:
obclient [SYS]> SELECT * FROM TABLE(GENERATOR(10));
The above SQL statement will generate a virtual table containing 10 rows, with each row being empty because no specific data to display has been specified.
The result is as follows:
+--------------+
| COLUMN_VALUE |
+--------------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+--------------+
10 rows in set
If you want to generate specific data, you can write it like this:
obclient [SYS]> SELECT 'Hello, World!' AS greeting FROM TABLE(GENERATOR(10));
This query will generate a virtual table containing 10 rows, with each row in the greeting column containing the string Hello, World!.
The result is as follows:
+---------------+
| GREETING |
+---------------+
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
| Hello, World! |
+---------------+
10 rows in set
- Example 2: Use the
GENERATOR()function to generate random numbers that meet certain conditions.
obclient> SELECT COUNT(*) FROM TABLE(GENERATOR(100000));
+----------+
| COUNT(*) |
+----------+
| 100000 |
+----------+
1 row in set
obclient> SELECT NORMAL(0, 1, RANDOM()) FROM TABLE(GENERATOR(5));
+------------------------+
| NORMAL(0, 1, RANDOM()) |
|------------------------|
| 0.227384164 |
| 0.9945290748 |
| -0.2045078571 |
| -1.594607893 |
| -0.8213296842 |
+------------------------+
5 rows in set
obclient> SELECT RANDSTR(1, ZIPF(1, 5, RANDOM())) str FROM TABLE(GENERATOR(5));
+------------------------+
| str |
|------------------------|
| A |
| D |
| A |
| A |
| C |
+------------------------+
5 rows in set
- Example 3: The
GENERATOR()function can also be used to join with other tables, as shown below:
obclient> CREATE TABLE t1 (c1 INT);
Query OK, 0 rows affected
obclient> INSERT INTO t1 VALUES (1), (2);
Query OK, 2 rows affected
Records: 2 Duplicates: 0 Warnings: 0
obclient> SELECT c1, RANDOM(1) FROM t1, TABLE(GENERATOR(3));
+------+----------------------+
| c1 | RANDOM(1) |
+------+----------------------+
| 1 | -6753783847308464280 |
| 2 | -6707106347154343346 |
| 1 | -899926183391115878 |
| 2 | -8835543475904200562 |
| 1 | -2750444335953844424 |
| 2 | 7588216632478230601 |
+------+----------------------+
6 rows in set
obclient> EXPLAIN SELECT c1, RANDOM(1) FROM t1, TABLE(GENERATOR(3));
+----------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------+
| ================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------------ |
| |0 |NESTED-LOOP JOIN CARTESIAN | |398 |14 | |
| |1 |├─FUNCTION_TABLE |FUNC_TABLE1|199 |1 | |
| |2 |└─MATERIAL | |2 |2 | |
| |3 | └─TABLE FULL SCAN |T1 |2 |2 | |
| ================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [RANDOM(cast(1, BIGINT(-1, 0)))]), filter(nil), rowset=256 |
| conds(nil), nl_params_(nil), use_batch=false |
| 1 - output(nil), filter(nil) |
| value(GENERATOR(cast(3, BIGINT(-1, 0)))) |
| 2 - output([T1.C1]), filter(nil), rowset=256 |
| 3 - output([T1.C1]), filter(nil), rowset=256 |
| access([T1.C1]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+----------------------------------------------------------------------------------+
19 rows in set
obclient> SELECT /*+ PARALLEL(2) */ c1, RANDOM(1) FROM t1, TABLE(GENERATOR(3));
+------+----------------------+
| c1 | random(1) |
+------+----------------------+
| 1 | -6753783847308464280 |
| 2 | -6707106347154343346 |
| 1 | -899926183391115878 |
| 2 | -8835543475904200562 |
| 1 | -2750444335953844424 |
| 2 | 7588216632478230601 |
+------+----------------------+
6 rows in set
obclient> EXPLAIN SELECT /*+ PARALLEL(2) */ c1, RANDOM(1) FROM t1, TABLE(GENERATOR(3));
+----------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------+
| ================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------------ |
| |0 |NESTED-LOOP JOIN CARTESIAN | |398 |13 | |
| |1 |├─FUNCTION_TABLE |FUNC_TABLE1|199 |1 | |
| |2 |└─MATERIAL | |2 |3 | |
| |3 | └─PX COORDINATOR | |2 |3 | |
| |4 | └─EXCHANGE OUT DISTR |:EX10000 |2 |2 | |
| |5 | └─PX BLOCK ITERATOR | |2 |1 | |
| |6 | └─TABLE FULL SCAN |T1 |2 |1 | |
| ================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T1.C1], [RANDOM(cast(1, BIGINT(-1, 0)))]), filter(nil), rowset=256 |
| conds(nil), nl_params_(nil), use_batch=false |
| 1 - output(nil), filter(nil) |
| value(GENERATOR(cast(3, BIGINT(-1, 0)))) |
| 2 - output([T1.C1]), filter(nil), rowset=256 |
| 3 - output([T1.C1]), filter(nil), rowset=256 |
| 4 - output([T1.C1]), filter(nil), rowset=256 |
| dop=2 |
| 5 - output([T1.C1]), filter(nil), rowset=256 |
| 6 - output([T1.C1]), filter(nil), rowset=256 |
| access([T1.C1]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T1.__pk_increment]), range(MIN ; MAX)always true |
+----------------------------------------------------------------------------------+
26 rows in set