Purpose
This statement stores query results in a variable or file. Specifically:
- `SELECT ... INTO OUTFILE` stores the result set in an external file and lets you specify the output format.
- `SELECT ... INTO DUMPFILE` writes a single unformatted line to an external file.
- `SELECT ... INTO var_list` stores the result set in variables.
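As a quick sketch of the `DUMPFILE` form (the table name and file path below are placeholders; `OUTFILE` is covered in detail in the examples later in this topic):

```sql
-- Hypothetical example: write one value, unformatted, to a file.
-- DUMPFILE performs no field or line formatting, so it is typically
-- used for a single value such as a BLOB; the query must return one row.
SELECT c1 INTO DUMPFILE '/home/admin/test_data/c1.dump'
FROM test_tbl1
WHERE ROWNUM = 1;
```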
Privilege requirements
To execute the SELECT INTO statement, you must have the SELECT privilege on the target table. For more information about the privileges of OceanBase Database, see Privilege classification in Oracle mode.
Syntax
SELECT [/*+parallel(N)*/] column_list_option
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list}
FROM table_name_list
[WHERE where_conditions]
[GROUP BY group_by_list [HAVING having_search_conditions]]
[ORDER BY order_expression_list];
column_list_option:
column_name [, column_name ...]
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [line_term]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Optional. Specifies the degree of parallelism for executing the statement. |
| column_list_option | Specifies the column options for export. To export all data, use *. column_name: the name of the column. For more information about column options in a query statement, see SIMPLE SELECT. |
| file_name | Specifies the path and name of the export file. For more information, see file_name below. |
| PARTITION BY part_expr | Optional. Specifies the partitioning method for exporting data. The value of part_expr is used as part of the export path. For each row of data, the value of part_expr is calculated; rows with the same value of part_expr are grouped into the same partition and exported to the same directory. |
| FORMAT = (external_file_format_list) | Optional. The FORMAT clause specifies properties related to the export file format. Use TYPE to specify the export file format, which can be CSV, PARQUET, or ORC. TYPE cannot be empty. For more information, see external_file_format_list below. |
| CHARSET | CHARACTER SET charset_name | Optional. Specifies the character set for exporting data to an external file. charset_name specifies the name of the character set. |
| field_opt | Optional. Specifies the field format options for export. Use FIELDS or COLUMNS to specify the format of each field in the output file. For more information, see field_term below. |
| line_opt | Optional. Specifies the start and end delimiters for data rows in the export. Use the LINES clause to set the start and end characters for each row. For more information, see line_term below. |
| file_opt | Optional. Specifies whether to export data to multiple files and the size of each file when exporting to multiple files. For more information, see file_option below. |
| FROM table_name_list | Specifies the data source. |
| WHERE where_conditions | Optional. Specifies the filter conditions. Only data that meets the conditions is included in the query results. For more information about filter conditions in a query statement, see SIMPLE SELECT. |
| GROUP BY group_by_list | Optional. Specifies the columns for grouping, typically used with aggregate functions. Note: non-aggregate columns in the select list must appear in the GROUP BY list. |
| HAVING having_search_conditions | Optional. Filters the grouped data. The HAVING clause is similar to the WHERE clause, but it can use aggregate functions such as SUM and AVG. |
| ORDER BY order_expression_list | Optional. Specifies the columns by which to sort the result set in ascending (ASC) or descending (DESC) order. If neither ASC nor DESC is specified, ASC is the default. |
file_name
file_name has the following format:
When you save the exported file to an OBServer node, the format is:
/$PATH/$FILENAME

The parameters are described as follows:

- `$PATH`: the path on the OBServer node where the exported file is saved.
- `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it specifies the prefix of the exported files. If not specified, the default prefix `data` is used, and the system automatically generates the suffix.
When you save the exported file to Alibaba Cloud Object Storage Service (OSS), the format is:
oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY

The parameters are described as follows:

- `$PATH`: the path in the bucket where the exported file is saved.
- `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it specifies the prefix of the exported files. If not specified, the default prefix `data` is used, and the system automatically generates the suffix.
- `$HOST`: the host name, or the domain name of the Content Delivery Network (CDN) acceleration service, used to access OSS.
- `$ACCESS_ID`: the AccessKey ID used to authenticate to OSS.
- `$ACCESSKEY`: the AccessKey Secret used to authenticate to OSS.
Note
- OceanBase Database supports exporting data to S3 and object storage services that support the S3 protocol.
- Alibaba Cloud OSS has a file size limit. If the size of an exported file exceeds 5 GB, it will be split into multiple files, each of which is less than 5 GB.
When you save the exported file to Hadoop Distributed File System (HDFS), the format is:
Notice
To use the write capability of HDFS external tables, OceanBase Database must be deployed in an environment with the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
The format for SIMPLE authentication (basic authentication, without Kerberos):

hdfs://$NAMENODE:$PORT/$PATH/$FILENAME[?username=$USERNAME]

Description:

- With SIMPLE authentication, you can specify the HDFS username by using the `username` parameter.
- If the `username` parameter is not specified, the default HDFS username defined in the OceanBase Database configuration is used.
The format for Kerberos authentication:

hdfs://$NAMENODE:$PORT/$PATH/$FILENAME?principal=$PRINCIPAL&keytab=$KEYTAB&krb5conf=$KRB5CONF&configs=$CONFIGS

Description:

- For Kerberos authentication, you must provide the Kerberos principal (`principal`), the keytab file (`keytab`), and the Kerberos configuration file (`krb5conf`).
- The `username` parameter does not apply in this scenario, because the identity is determined by the `principal`.
The parameters are described as follows:
Common parameters:
- `$NAMENODE`: the IP address of the HDFS NameNode.
- `$PORT`: the port of the HDFS NameNode.
- `$PATH`: the storage path of the file in HDFS.
- `$FILENAME`: the name of the exported file.
Parameters specific to SIMPLE authentication:
- `?username=$USERNAME`: optional. Specifies the HDFS username. If the `username` parameter is not specified, the default HDFS username defined in the OceanBase Database configuration is used.
Parameters specific to Kerberos authentication:
- `$PRINCIPAL`: the Kerberos principal name. For example, `principal=hdfs/hadoop@EXAMPLE.COM`.
- `$KEYTAB`: the path of the Kerberos keytab file. For example, `keytab=/path/to/hdfs.keytab`.
- `$KRB5CONF`: the path of the Kerberos configuration file. For example, `krb5conf=/path/to/krb5.conf`.
- `$CONFIGS`: additional HDFS configuration parameters in key-value pair format. For example, `configs=dfs.data.transfer.protection=authentication,privacy`.
The following examples show the values of file_name when you save the exported file to HDFS:
SIMPLE authentication (basic authentication):
Applies to an HDFS cluster without Kerberos authentication.
Example 1 (default username):

hdfs://10.10.10.1:9000/data/export/sales_data.csv

Example 2 (specified username):

hdfs://10.10.10.1:9000/data/export/sales_data.csv?username=hiveuser
Kerberos authentication:
Applies to an HDFS cluster with Kerberos authentication.
You must specify the query parameters as needed. Multiple configuration items (such as configs) are supported.
Example:
hdfs://10.10.10.1:9000/data/export/sales_data.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/etc/security/keytabs/hdfs.keytab&krb5conf=/etc/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy
field_term
- `[OPTIONALLY] ENCLOSED BY string`: specifies the character that wraps field values. By default, no quotation marks are used. For example, `ENCLOSED BY '"'` encloses values in double quotation marks. If the `OPTIONALLY` keyword is used, the specified character is applied only to string values.
- `TERMINATED BY string`: specifies the character that separates field values. For example, `TERMINATED BY ','` separates field values with a comma.
- `ESCAPED BY string`: specifies the escape character for handling special characters or parsing data in special formats. The default escape character is the backslash (\).
line_term
- `STARTING BY string`: specifies the starting character of each line.
- `TERMINATED BY string`: specifies the ending character of each line. The default is the newline character. For example, `... LINES TERMINATED BY '\n' ...` ends each line with a newline character.
file_option
- `SINGLE [=] {TRUE | FALSE}`: specifies whether to export data to a single file or multiple files.
  - `SINGLE [=] TRUE`: the default value. Data is exported to a single file.
  - `SINGLE [=] FALSE`: data can be exported to multiple files.

  Notice: when the degree of parallelism is greater than 1 and `SINGLE = FALSE`, data can be exported to multiple files, enabling parallel reads and writes and improving export speed.

- `MAX_FILE_SIZE [=] {int | string}`: specifies the maximum size of a single file during export. This option takes effect only when `SINGLE = FALSE`.
- `BUFFER_SIZE [=] {int | string}`: specifies the size of the memory allocated to each thread for each partition during export. If the data is not partitioned, it is treated as a single partition. The default value is 1 MB.

  Note: `BUFFER_SIZE` is used for export performance tuning. If the machine has sufficient memory and you want to improve export efficiency, set a larger value (such as 4 MB). If memory is insufficient, set a smaller value (such as 4 KB). If you set the value to 0, all partitions in a single thread share a common memory block.
external_file_format_list
When TYPE = 'CSV', the following fields are included:
- `LINE_DELIMITER`: specifies the line delimiter of the CSV file. The default value is `LINE_DELIMITER = '\n'`.
- `FIELD_DELIMITER`: optional. Specifies the column delimiter of the CSV file. The default value is `FIELD_DELIMITER = '\t'`.
- `ESCAPE`: specifies the escape character of the CSV file. It must be one byte. The default value is `ESCAPE = '\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: optional. Specifies the character used to enclose field values in the CSV file. The default value is empty. This option applies only to certain field types (such as CHAR, VARCHAR, TEXT, and JSON).
- `FIELD_ENCLOSED_BY`: optional. Specifies the character used to enclose field values in the CSV file. The default value is empty, indicating that no enclosing character is applied. This option applies to all field types.
- `ENCODING`: specifies the character set encoding of the file. If not specified, the default value is `UTF8MB4`.
- `COMPRESSION`: optional. Specifies the compression format of the file. Supported values:
  - `NONE`: the file is not compressed (default).
  - `GZIP`/`DEFLATE`: GZIP-compressed files.
  - `ZSTD`: ZSTD-compressed files.
- `FILE_EXTENSION`: optional. Specifies a custom file extension. This option is used only for multi-file exports and is supported only for the CSV format. If not specified, the file extension is determined by the format type; for CSV, the default extension is `.csv`.
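For instance, a compressed multi-file CSV export using the options above might look like the following sketch (the path and table name are placeholders):

```sql
-- Hypothetical sketch: export to GZIP-compressed CSV files,
-- comma-delimited, split across multiple files.
SELECT * INTO OUTFILE '/home/admin/test_data/csv_out/'
FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    COMPRESSION = GZIP
)
SINGLE = FALSE
FROM test_tbl1;
```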
When TYPE = 'PARQUET', the following fields are included:
- `COMPRESSION`: specifies the compression format of the Parquet file. The default file extension is `.parquet`. Supported values are `UNCOMPRESSED` (no compression), `SNAPPY`, `GZIP`, `BROTLI`, `ZSTD`, `LZ4`, and `LZ4_HADOOP`. If not specified, the default value is `UNCOMPRESSED`.
- `ROW_GROUP_SIZE`: specifies the size of a row group in the Parquet file, in bytes. You can specify a number or a string such as `'64MB'`. If not specified, the default value is `256MB`. We recommend that you use the default value.
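A Parquet export using these options might look like the following sketch (the path and table name are placeholders):

```sql
-- Hypothetical sketch: export to ZSTD-compressed Parquet files.
SELECT * INTO OUTFILE '/home/admin/test_data/parquet_out/'
FORMAT = (
    TYPE = 'PARQUET',
    COMPRESSION = 'ZSTD'
)
SINGLE = FALSE
FROM test_tbl1;
```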
When TYPE = 'ORC', the following fields are included:
- `COMPRESSION`: specifies the compression format of the ORC file. The default file extension is `.orc`. Supported values are `UNCOMPRESSED` (no compression), `SNAPPY`, `ZLIB`, `LZ4`, and `ZSTD`. If not specified, the default value is `UNCOMPRESSED`.
- `COMPRESSION_BLOCK_SIZE`: specifies the size of the blocks into which data is divided during compression, in bytes. You can specify a number or a string such as `'64KB'`. If not specified, the default value is `256KB`. We recommend that you use the default value.
- `STRIPE_SIZE`: specifies the size of a stripe in the ORC file, in bytes. You can specify a number or a string such as `'64MB'`. If not specified, the default value is `64MB`. We recommend that you use the default value.
- `ROW_INDEX_STRIDE`: specifies how often an index entry is recorded, that is, the number of rows between index entries. If not specified, the default value is `10000`. We recommend that you use the default value.
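An ORC export using these options might look like the following sketch (the path and table name are placeholders):

```sql
-- Hypothetical sketch: export to ZSTD-compressed ORC files,
-- leaving stripe and block sizes at their recommended defaults.
SELECT * INTO OUTFILE '/home/admin/test_data/orc_out/'
FORMAT = (
    TYPE = 'ORC'
    COMPRESSION = 'ZSTD'
)
SINGLE = FALSE
FROM test_tbl1;
```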
Examples
Export data to a local file
1. Set the export file path.

   To export data, you must set the system variable `secure_file_priv` to specify the path from which export files are accessible.

   Notice: For security reasons, you can modify the global variable `secure_file_priv` only when you connect to the database through a local socket. For more information, see secure_file_priv.

   1. Log in to the OBServer node to be connected to the OceanBase database.

      ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the `oracle001` tenant through a local Unix socket.

      obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******

   3. Set the export path to `/home/admin/test_data`.

      obclient> SET GLOBAL secure_file_priv = "/home/admin/test_data";

   4. Log out.

2. After reconnecting to the database, use the `SELECT INTO OUTFILE` statement to export data. In the following statements, a comma separates field values, string values are enclosed in double quotation marks ("), and a newline character ends each line.

   - Write data to a single file in serial mode and specify the file name `test_tbl1.csv`.

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_tbl1;

     The execution result is as follows:

     Query OK, 9 rows affected

   - Write data to multiple files in parallel without specifying file names, with each file no larger than 4 MB.

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;

     The execution result is as follows:

     Query OK, 9 rows affected

   - Write data to multiple files in parallel, specify the file name prefix `dd2024`, and ensure that each file is no larger than 4 MB.

     obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;

     The execution result is as follows:

     Query OK, 9 rows affected
Note

- When multiple export tasks write to the same path at the same time, errors or partial exports may occur. You can avoid these issues by using distinct export paths. For example, if the statements `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;` and `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t2;` are executed at the same time, an error may occur because the export file names are the same. In this case, we recommend that you set the export paths to `test/data1` and `test/data2`, respectively.
- When `SINGLE = FALSE` and the export fails with an error such as "file already exists", delete all files in the export directory that share the export target's prefix, or delete and recreate the export directory, and then run the export statement again. For example, if `SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;` fails, delete all files with the `data` prefix in the `test` directory, or delete the `test` directory and recreate it, and then run the export statement again.
Log in to the server and view the exported files in the `/home/admin/test_data` directory of the OBServer node.
[xxx@xxx /home/admin/test_data]# ls
The execution result is as follows:
data_0_0_0 data_0_1_0 dd2024_0_0_0 dd2024_0_1_0 test_tbl1.csv
In the result, test_tbl1.csv is the file name of the exported data in the serial single-file export mode. data_0_0_0 and data_0_1_0 are the file names of the exported data in the parallel multi-file export mode without specifying file names. dd2024_0_0_0 and dd2024_0_1_0 are the file names of the exported data in the parallel multi-file export mode with the file name prefix specified as dd2024.
Export data to an OSS bucket
Use the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to a specified OSS bucket by partition. The partitioning is based on the combination of the col1 and col2 columns. Rows with the same values in these columns are stored in the same partition and exported to the same directory.
obclient> SELECT /*+parallel(3)*/ *
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB'
FROM test_tbl2;
The storage path is specified by the $DATA_FOLDER_NAME variable. You also need to provide the OSS host address, Access ID, and Access Key.
Export data to HDFS
Notice
To use the HDFS external table write feature, OceanBase Database must be deployed in an environment with the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
Export data with SIMPLE authentication
Export data from the lineitem_external table of OceanBase Database to HDFS and save it in CSV format.
SELECT *
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
SINGLE = TRUE
FROM lineitem_external;
Export data with Kerberos authentication
Unlike SIMPLE authentication, Kerberos authentication requires additional configuration and key files. Export data from the lineitem_external table of OceanBase Database to HDFS and save it in CSV format.
SELECT *
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/path/to/hdfs.keytab&krb5conf=/path/to/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
SINGLE = TRUE
FROM lineitem_external;
