Purpose
You can use this statement to store query results in a variable, a file, or a table. Here are the details:
SELECT ... INTO OUTFILEstores the result set in an external file, and you can specify the output format.Note
When you use
SELECT ... INTO OUTFILEto output data, you can use\Nto representNULL.SELECT ... INTO DUMPFILEwrites an unformatted single line to an external file.SELECT ... INTO var_liststores the result set in variables.
Privilege requirements
To execute the SELECT INTO statement, you must have the FILE privilege and the SELECT privilege on the corresponding table. For more information about OceanBase Database privileges, see Privilege types in MySQL-compatible mode.
Here is an example:
To grant the FILE privilege to a user, use the following command format:
GRANT FILE ON *.* TO user_name;
In this command, user_name specifies the user who needs to execute the SELECT INTO statement.
Syntax
select_stmt
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list};
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [line_term]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| select_stmt | The query statement to be executed. select_stmt must return a result set. For more information about the structure and options of the query statement, see SELECT.
NoteYou can also specify the |
| file_name | The path and name of the exported file. For more information, see file_name. |
| PARTITION BY part_expr |
NoteFor OceanBase Database V4.3.2, the partitioning method for exported data is controllable since V4.3.2 BP1. part_expr serves as part of the export path. For each row of data, the system calculates the value of part_expr. Rows with the same value of part_expr are considered to be in the same partition and exported to the same directory.
Note
|
| FORMAT = (external_file_format_list) | Optional. The FORMAT clause specifies attributes related to the export file format. Use TYPE to specify the exported file format, which can be CSV, PARQUET, or ORC. TYPE cannot be left empty. For more information, see external_file_format_list. |
| CHARSET | CHARACTER SET charset_name | Optional. Specifies the character set to export data to an external file. charset_name indicates the name of the character set. |
| field_opt | Optional. Specifies the format of fields. You can specify it by using the FIELDS or COLUMNS clause. For more information, see field_term. |
| line_opt | Optional. Specifies the start and end characters of data rows. You can specify it by using the LINES clause. For more information, see line_term. |
| file_opt | Optional. Specifies whether to export data to multiple files and the size of each file when data is exported to multiple files. For more information, see file_option. |
file_name
The file_name parameter is in the following format:
When you export a file to an OBServer node, use the format of
\$PATH/\$FILENAME. The parameters are described as follows:\$PATH: specifies the path to save the exported file, which is the path of the exported file on the OBServer node.\$FILENAME: the name of the file to be exported. WhenSINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the\$FILENAMEparameter, the system sets the prefix of the exported file to the default valuedata. The system automatically generates the file extension based on the export type.
When you export a file to OSS, the save path is in the format of
oss://\$PATH/\$FILENAME/?host=\$HOST&access_id=\$ACCESS_ID&access_key=\$ACCESSKEY. The parameters are described as follows:\$PATH: the path to save the exported file, which specifies the file path in the bucket.\$FILENAME: the name of the file to be exported. WhenSINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the name, the system uses the default prefixdata. The system automatically generates the suffix.\$HOST: specifies the endpoint of the OSS service or the CDN domain name for accelerated access to the OSS service.\$ACCESS_ID: specifies the Access Key ID required for accessing OSS service, which is used for authentication.\$ACCESSKEY: specifies the AccessKey Secret required for accessing the OSS service.
Note
- For OceanBase Database V4.3.5,
SELECT INTOcan be used to export data to objects stored on S3 and objects stored on services that comply with the S3 protocol, starting from V4.3.5 BP2. - Alibaba Cloud's Object Storage Service (OSS) has a file size limit. If you want to export a file larger than 5 GB to OSS, the file will be split into multiple files, each smaller than 5 GB.
field_term
[OPTIONALLY] ENCLOSED BY string: specifies the characters that enclose field values. By default, values are not enclosed. For example,ENCLOSED BY '"'indicates that character values are enclosed in double quotation marks. If theOPTIONALLYkeyword is used, the specified characters are used to enclose only string values.TERMINATED BY string: specifies the symbol that separates field values. For example,TERMINATED BY ','uses a comma to separate two field values.ESCAPED BY string: specifies the escape character to process special characters or parse data in a special format. The default escape character is the backslash (\).
line_term
STARTING BY string: specifies the starting character of each row.TERMINATED BY string: specifies the ending character of each line. By default, a line is terminated by a line break. For example,... LINES TERMINATED BY '\n' ...indicates that each line is terminated by a line break.
file_option
SINGLE [=] {TRUE | FALSE}: specifies whether to export data to a single file or multiple files.SINGLE [=] TRUE: specifies to export data to a single file. This is the default value.SINGLE [=] FALSE: specifies to export data to multiple files.Notice
When the degree of parallelism is greater than 1 and
SINGLE = FALSE, data can be exported to multiple files to achieve parallel read and write and improve the export speed.
MAX_FILE_SIZE [=] {int | string}: specifies the maximum size of a single file when data is exported to multiple files. This parameter is effective only whenSINGLE = FALSE.BUFFER_SIZE [=] {int | string}: specifies the size of memory that is separately applied for each partition by each thread when data is exported. The value of this parameter represents the size of memory applied for a single partition when data is exported without partitioning. The default value is 1 MB.Note
BUFFER_SIZEis used for export performance tuning. If sufficient physical memory is available and high export efficiency is desired, you can set this parameter to a large value (for example, 4 MB). If insufficient physical memory is available, you can set this parameter to a small value (for example, 4 KB). If you set this parameter to 0, all partitions of a single thread share a public memory.- Starting from OceanBase Database V4.3.2 BP1, the
BUFFER_SIZEparameter is supported.
external_file_format_list
When TYPE = 'CSV', the file contains the following columns:
LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is'\n'.FIELD_DELIMITER: specifies the field delimiter of the CSV file. The default value is'\t'.ESCAPE: specifies the escape character of the CSV file. It must be 1 byte in length. The default value is'\'.FIELD_OPTIONALLY_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string. This option allows you to add enclosing characters only to specific types of fields, such as CHAR, VARCHAR, TEXT, and JSON.FIELD_ENCLOSED_BY: specifies the characters that enclose the field values of the CSV file. This parameter is optional. The default value is an empty string, which specifies not to add enclosing characters. This option adds enclosing characters to all types of fields.ENCODING: specifies the character set encoding format of the file. If this parameter is not specified, the default valueUTF8MB4takes effect.COMPRESSION: specifies the compression format of the file. This parameter is optional. The supported compression formats are NONE, GZIP, DEFLATE, and ZSTD.GZIP/DEFLATE: specifies that the file is a GZIP-compressed file.ZSTD: specifies that the file is a ZSTD-compressed file.NONE: specifies that the file is not compressed. This is the default value.
FILE_EXTENSION: specifies the user-defined file extension. This parameter is required only when multiple files are exported. It is supported only for CSV files. If this parameter is not specified, the file extension is determined by the file format. The default file extension for CSV files is.csv.
When TYPE = 'PARQUET', the following fields are included:
COMPRESSION: specifies the compression format of the PARQUET file. The default value is.parquet. The supported compression formats areUNCOMPRESSED(which specifies that the file is not compressed),SNAPPY,GZIP,BROTLI,ZSTD,LZ4, andLZ4_HADOOP. If this parameter is not specified, the default valueUNCOMPRESSEDtakes effect.ROW_GROUP_SIZE: specifies the size of the ROW GROUP in the PARQUET file. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is256MB. We recommend that you use the default value.
When TYPE = 'ORC', it contains the following fields:
COMPRESSION: specifies the compression format of the ORC file. The default value is.orc. The supported compression formats areUNCOMPRESSED(which specifies that the file is not compressed),SNAPPY,ZLIB,LZ4, andZSTD. If this parameter is not specified, the default valueUNCOMPRESSEDtakes effect.COMPRESSION_BLOCK_SIZE: specifies the size of the block into which the data is compressed. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is256KB. We recommend that you use the default value.STRIPE_SIZE: specifies the size of a Stripe in the ORC file. The unit is bytes. You can specify a numeric value or a string with a suffix ofB,K,M, orGto indicate bytes, kilobytes, megabytes, or gigabytes respectively. The default value is64MB. We recommend that you use the default value.ROW_INDEX_STRIDE: specifies the frequency of index records. It defines the number of rows between two consecutive index records. If this parameter is not specified, the default value10000takes effect. We recommend that you use the default value.
Examples
Export data to a local file
Set the export file path.
To export a file, you must set the system variable
secure_file_privto a path that allows access to the export file.Notice
For security reasons, when you set the system variable
secure_file_priv, you can only connect to the database using a local Unix socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.Log in to the OBServer node where you want to connect to OceanBase Database.
ssh admin@xxx.xxx.xxx.xxxConnect to the
mysql001tenant using the local Unix socket.obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Set the export path to
/home/admin/test_data.SET GLOBAL secure_file_priv = "/home/admin/test_data";Log out.
Reconnect to the database, and then use the
SELECT INTO OUTFILEstatement to export data. Use commas to separate values in different fields; wrap string values with"characters; and use line breaks as the end marker.Export data to a single file in serial mode. Specify the file name as
test_tbl1.csv.SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';The return result is as follows:
Query OK, 9 rows affectedWrite data to multiple files in parallel. Do not specify the file name (that is, use the default prefix
data), and ensure that the size of each file does not exceed 4 MB.SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB';The return result is as follows:
Query OK, 9 rows affectedWrite data to multiple files in parallel. Specify the file name prefix as
dd2024, and ensure that the size of each file does not exceed 4 MB.SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/dd2024' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB';The return result is as follows:
Query OK, 9 rows affected
Note
- When multiple export tasks are exporting data to the same path at the same time, errors may occur or only a part of data may be exported. You can avoid such issues by properly setting the export path. For example,
SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE;andSELECT /*+parallel(2)*/ * FROM t2 INTO OUTFILE 'test/data' SINGLE = FALSE;may both be executed at the same time, but they may fail because they have the same file name. In this case, you can set the export path totest/data1andtest/data2respectively to avoid such issues. - When
SINGLE = FALSE, if the export fails for reasons such as file already exist, you can delete all files with the same prefix as the export target in the export directory, or delete the export directory and recreate it, and then perform the export operation again. For example, ifSELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE;fails, you can delete all files with thedataprefix in thetestdirectory, or directly delete thetestdirectory and recreate it, and then try to perform the export operation again.
Log in to the server and view the exported files in the
/home/admin/test_datadirectory on the OBServer node.
[xxx@xxx /home/admin/test_data]# lsThe return result is as follows:
data_0_0_0 data_0_1_0 dd2024_0_0_0 dd2024_0_1_0 test_tbl1.csvIn this example,
test_tbl1.csvis the file name when data is exported to a single file in serial mode;data_0_0_0anddata_0_1_0are the file names when data is written to multiple files in parallel without specifying the file name;dd2024_0_0_0anddd2024_0_1_0are the file names when data is written to multiple files in parallel with the file name prefix set todd2024.Export data to OSS
You can execute the
SELECT INTO OUTFILEstatement to export data from thetest_tbl2table to a specified OSS storage location based on the partitions. The partitions are determined by the combination of thecol1andcol2columns. Rows with the same values in these two columns are grouped into the same partition and exported to the same directory.SELECT /*+parallel(3)*/ * FROM test_tbl2 INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY' PARTITION BY CONCAT(col1,'/',col2) SINGLE = FALSE BUFFER_SIZE = '2MB';The storage location is specified by the
$DATA_FOLDER_NAMEvariable. You also need to provide the host address, access ID, and access key of the OSS.References