Purpose
You can use this statement to store query results in a variable, a file, or a table. Here are the details:
SELECT ... INTO OUTFILEstores the result set in an external file, and you can specify the output format.SELECT ... INTO DUMPFILEwrites an unformatted single line to an external file.SELECT ... INTO var_liststores the result set in variables.
Privilege requirements
To execute the SELECT INTO statement, you must have the SELECT privilege on the table. For more information about OceanBase Database privileges, see Privilege types in the Oracle-compatible mode.
Syntax
SELECT [/*+parallel(N)*/] column_list_option
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list}
FROM table_name_list
[WHERE where_conditions]
[GROUP BY group_by_list [HAVING having_search_conditions]]
[ORDER BY order_expression_list];
column_list_option:
column_name [, column_name ...]
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [line_term]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Optional. Specifies the degree of parallelism for executing the statement. |
| column_list_option | Specifies the column options for export. To select all data, use *.column_name: Column name. For more information on column options in query statements, refer to SIMPLE SELECT. |
| file_name | Specifies the path and name of the export file. For detailed information, refer to file_name below. |
| PARTITION BY part_expr |
NoteFor OceanBase Database V4.3.2, the partitioning method for exported data is controllable since V4.3.2 BP1. part_expr serves as part of the export path. For each row of data, the system calculates the value of part_expr. Rows with the same value of part_expr are considered to be in the same partition and exported to the same directory.
Note
|
| FORMAT = (external_file_format_list) | Optional. The FORMAT clause specifies attributes related to the export file format. Use TYPE to specify the exported file format, which can be CSV, PARQUET, or ORC. TYPE cannot be left empty. For more information, see external_file_format_list. |
| CHARSET | CHARACTER SET charset_name | Optional. Specifies the character set to export data to an external file. charset_name indicates the name of the character set. |
| field_opt | Optional. Specifies the format of fields. You can specify it by using the FIELDS or COLUMNS clause. For more information, see field_term. |
| line_opt | Optional. Specifies the start and end characters of data rows. You can specify it by using the LINES clause. For more information, see line_term. |
| file_opt | Optional. Specifies whether to export to multiple files and the size of each file when exporting to multiple files. For detailed information, refer to file_option below. |
| FROM table_name_list | Specifies the object from which to select data. |
| WHERE where_conditions | Optional. Specifies filtering conditions so that the query result only includes data that meets the conditions. For more information on filtering in query statements, refer to SIMPLE SELECT. |
| GROUP BY group_by_list | Optional. Specifies the fields for grouping, typically used in conjunction with aggregate functions.
ExplanationAll columns following the |
| HAVING having_search_conditions | Optional. Filters grouped data. The HAVING clause is similar to the WHERE clause, but the HAVING clause can use aggregate functions (such as SUM and AVG.). |
| ORDER BY order_expression_list | Optional. Specifies how the result set is ordered by one or more columns, using ASC or DESC to display the query results. If neither ASC nor DESC is specified, the default is ASC.
|
file_name
The file_name parameter is in the following format:
When you export a file to an OBServer node, use the format of
\$PATH/\$FILENAME. The parameters are described as follows:\$PATH: specifies the path to save the exported file, which is the path of the exported file on the OBServer node.\$FILENAME: the name of the file to be exported. WhenSINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the\$FILENAMEparameter, the system sets the prefix of the exported file to the default valuedata. The system automatically generates the file extension based on the export type.
When you export a file to OSS, the save path is in the format of
oss://\$PATH/\$FILENAME/?host=\$HOST&access_id=\$ACCESS_ID&access_key=\$ACCESSKEY. The parameters are described as follows:\$PATH: the path to save the exported file, which specifies the file path in the bucket.\$FILENAME: the name of the file to be exported. WhenSINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the name, the system uses the default prefixdata. The system automatically generates the suffix.\$HOST: specifies the endpoint of the OSS service or the CDN domain name for accelerated access to the OSS service.\$ACCESS_ID: specifies the Access Key ID required for accessing OSS service, which is used for authentication.\$ACCESSKEY: specifies the AccessKey Secret required for accessing the OSS service.
Note
- For OceanBase Database V4.3.5,
SELECT INTOcan be used to export data to objects stored on S3 and objects stored on services that comply with the S3 protocol, starting from V4.3.5 BP2. - Alibaba Cloud's Object Storage Service (OSS) has a file size limit. If you want to export a file larger than 5 GB to OSS, the file will be split into multiple files, each smaller than 5 GB.
field_term
[OPTIONALLY] ENCLOSED BY string: specifies the characters that enclose field values. By default, values are not enclosed. For example,ENCLOSED BY '"'indicates that character values are enclosed in double quotation marks. If theOPTIONALLYkeyword is used, the specified characters are used to enclose only string values.TERMINATED BY string: specifies the symbol that separates field values. For example,TERMINATED BY ','uses a comma to separate two field values.ESCAPED BY string: specifies the escape character to process special characters or parse data in a special format. The default escape character is the backslash (\).
line_term
STARTING BY string: specifies the starting character of each row.TERMINATED BY string: specifies the ending character of each line. By default, a line is terminated by a line break. For example,... LINES TERMINATED BY '\n' ...indicates that each line is terminated by a line break.
file_option
SINGLE [=] {TRUE | FALSE}: specifies whether to export data to a single file or multiple files.SINGLE [=] TRUE: specifies to export data to a single file. This is the default value.SINGLE [=] FALSE: specifies to export data to multiple files.Notice
When the degree of parallelism is greater than 1 and
SINGLE = FALSE, data can be exported to multiple files to achieve parallel read and write and improve the export speed.
MAX_FILE_SIZE [=] {int | string}: specifies the maximum size of a single file when data is exported to multiple files. This parameter is effective only whenSINGLE = FALSE.BUFFER_SIZE [=] {int | string}: specifies the size of memory that is separately applied for each partition by each thread when data is exported. The value of this parameter represents the size of memory applied for a single partition when data is exported without partitioning. The default value is 1 MB.Note
BUFFER_SIZEis used for export performance tuning. If sufficient physical memory is available and high export efficiency is desired, you can set this parameter to a large value (for example, 4 MB). If insufficient physical memory is available, you can set this parameter to a small value (for example, 4 KB). If you set this parameter to 0, all partitions of a single thread share a public memory.- Starting from OceanBase Database V4.3.2 BP1, the
BUFFER_SIZEparameter is supported.
external_file_format_list
When TYPE = 'CSV', the file contains the following columns:
LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is'\n'.FIELD_DELIMITER: specifies the field delimiter of the CSV file. The default value is'\t'.ESCAPE: specifies the escape character of the CSV file. It must be 1 byte in length. The default value is'\'.FIELD_OPTIONALLY_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string. This option allows you to add enclosing characters only to specific types of fields, such as CHAR, VARCHAR, TEXT, and JSON.FIELD_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string, which specifies not to add enclosing characters. This option adds enclosing characters to all types of fields.ENCODING: specifies the character set encoding format of the file. If this parameter is not specified, the default valueUTF8MB4takes effect.COMPRESSION: specifies the compression format of the file. This parameter is optional. The supported compression formats are NONE, GZIP, DEFLATE, and ZSTD.GZIP/DEFLATE: specifies that the file is a GZIP-compressed file.ZSTD: specifies that the file is a ZSTD-compressed file.NONE: specifies that the file is not compressed. This is the default value.
FILE_EXTENSION: specifies the user-defined file extension. This parameter is required only when multiple files are exported. It is supported only for CSV files. If this parameter is not specified, the file extension is determined by the file format. The default file extension for CSV files is.csv.
When TYPE = 'PARQUET', the following fields are included:
COMPRESSION: specifies the compression format of the PARQUET file. The default value is.parquet. The supported compression formats areUNCOMPRESSED(which specifies that the file is not compressed),SNAPPY,GZIP,BROTLI,ZSTD,LZ4, andLZ4_HADOOP. If this parameter is not specified, the default valueUNCOMPRESSEDtakes effect.ROW_GROUP_SIZE: specifies the size of the ROW GROUP in the PARQUET file. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is256MB. We recommend that you use the default value.
When TYPE = 'ORC', it contains the following fields:
COMPRESSION: specifies the compression format of the ORC file. The default value is.orc. The supported compression formats areUNCOMPRESSED(which specifies that the file is not compressed),SNAPPY,ZLIB,LZ4, andZSTD. If this parameter is not specified, the default valueUNCOMPRESSEDtakes effect.COMPRESSION_BLOCK_SIZE: specifies the size of the block into which the data is compressed. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is256KB. We recommend that you use the default value.STRIPE_SIZE: specifies the size of a Stripe in the ORC file. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is64MB. We recommend that you use the default value.ROW_INDEX_STRIDE: specifies the frequency of index records. It defines the number of rows between two consecutive index records. If this parameter is not specified, the default value10000takes effect. We recommend that you use the default value.
Examples
Export data to a local file
Set the export file path.
To export a file, you must set the system variable
secure_file_privto a path that allows access to the export file.Notice
For security reasons, when you set the system variable
secure_file_priv, you can only connect to the database using a local Unix socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.Log in to the OBServer node where you want to connect to OceanBase Database.
ssh admin@xxx.xxx.xxx.xxxConnect to the
oracle001tenant using the local Unix socket.obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******Set the export path to
/home/admin/test_data.obclient> SET GLOBAL secure_file_priv = "/home/admin/test_data";Log out.
Reconnect to the database, and then use the
SELECT INTO OUTFILEstatement to export data. Use commas to separate values in different fields; wrap string values with"characters; and use line breaks as the end marker.Export data to a single file in serial mode. Specify the file name as
test_tbl1.csv.obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_tbl1;The return result is as follows:
Query OK, 9 rows affectedWrite data to multiple files in parallel. Do not specify the file name, and ensure that the size of each file does not exceed 4 MB.
obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;The return result is as follows:
Query OK, 9 rows affectedWrite data to multiple files in parallel. Specify the file name prefix as
dd2024, and ensure that the size of each file does not exceed 4 MB.obclient> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM test_tbl1;The return result is as follows:
Query OK, 9 rows affected
Note
- When multiple export tasks export to the same path simultaneously, issues such as errors or partial data export may occur. You can avoid this by properly setting the export paths.
For example:SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;andSELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t2;executed simultaneously may fail due to identical export file names. It is recommended to set separate export paths, such astest/data1andtest/data2. - When
SINGLE = FALSE, and the export fails due to reasons such as "file already exists," you can clear all files in the export directory with the same prefix as the export target or delete and recreate the export directory before retrying the export operation.
For example:SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;fails, you can delete all files with thedataprefix in thetestdirectory or directly delete and recreate thetestdirectory, then retry the export operation.
Log in to the server and view the exported files in the
/home/admin/test_datadirectory on the OBServer node.
[xxx@xxx /home/admin/test_data]# lsThe return result is as follows:
data_0_0_0 data_0_1_0 dd2024_0_0_0 dd2024_0_1_0 test_tbl1.csvIn this example,
test_tbl1.csvis the file name when data is exported to a single file in serial mode;data_0_0_0anddata_0_1_0are the file names when data is written to multiple files in parallel without specifying the file name;dd2024_0_0_0anddd2024_0_1_0are the file names when data is written to multiple files in parallel with the file name prefix set todd2024.Export data to OSS
You can execute the
SELECT INTO OUTFILEstatement to export data from thetest_tbl2table to a specified OSS storage location based on the partitions. The partitions are determined by the combination of thecol1andcol2columns. Rows with the same values in these two columns are grouped into the same partition and exported to the same directory.obclient> SELECT /*+parallel(3)*/ * INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY' PARTITION BY CONCAT(col1,'/',col2) SINGLE = FALSE BUFFER_SIZE = '2MB' FROM test_tbl2;The storage location is specified by the
$DATA_FOLDER_NAMEvariable. You also need to provide the host address, access ID, and access key of the OSS.References