Purpose
This statement is used to store query results in a variable or file. Specifically:
- `SELECT ... INTO OUTFILE` stores the result set in an external file and allows you to specify the output format.

  Note

  When using `SELECT ... INTO OUTFILE`, you can use `\N` to represent `NULL`.

- `SELECT ... INTO DUMPFILE` writes an unformatted single line to an external file.
- `SELECT ... INTO var_list` stores the result set in variables.
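For example, the variable and dump-file variants might be used as follows. This is a sketch only; the table `t1` and column names `c1` and `c2` are hypothetical:

```sql
-- Store a single-row result in user-defined variables.
SELECT c1, c2 INTO @v1, @v2 FROM t1 LIMIT 1;

-- Write one unformatted row to a file, e.g., to dump a binary value verbatim.
SELECT c1 FROM t1 LIMIT 1 INTO DUMPFILE '/tmp/t1_c1.bin';
```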
Privilege requirements
To execute the SELECT INTO statement, you must have the FILE privilege and the SELECT privilege on the corresponding table. For more information about the privileges of OceanBase Database, see Privilege types in MySQL mode.
Here is an example:
To grant the FILE privilege to a user, you can use the following command:
GRANT FILE ON *.* TO user_name;
Here, user_name specifies the user to execute the SELECT INTO statement.
Syntax
select_stmt
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list}
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [line_term]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| select_stmt | The query statement to be executed. select_stmt must return a result set. For more information about the syntax and options of the query statement, see SELECT. |
| file_name | The path and name of the exported file. For more information, see file_name. |
| PARTITION BY part_expr | An optional clause that specifies how the exported data is partitioned. part_expr is used as part of the export path. For each row of data, the value of part_expr is calculated; rows with the same part_expr value are placed in the same partition and exported to the same directory. |
| FORMAT = (external_file_format_list) | An optional clause that specifies the export file format. Use the TYPE option to specify the export file format. The TYPE option cannot be empty. For more information, see external_file_format_list. |
| {CHARSET \| CHARACTER SET} charset_name | An optional clause that specifies the character set for the exported file. charset_name specifies the name of the character set. |
| field_opt | An optional clause that specifies the export field format. You can use the FIELDS or COLUMNS clause to specify the format of each field in the output file. For more information, see field_term. |
| line_opt | An optional clause that specifies the start and end delimiters for data rows. You can use the LINES clause to specify the start and end characters for each row in the output file. For more information, see line_term. |
| file_opt | An optional clause that specifies whether to export to multiple files and the size of each file when exporting to multiple files. For more information, see file_option. |
file_name
file_name has the following format:
When you save the exported file to an OBServer node, the format is:
`/$PATH/$FILENAME`, where the parameters are described as follows:

- `$PATH`: the path in which the exported file is saved on the OBServer node.
- `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it specifies the prefix of the exported file names. If you do not specify a prefix, the system uses the default prefix `data`. The system automatically generates the suffix.
When you save the exported file to Alibaba Cloud OSS, the format is:
`oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`, where the parameters are described as follows:

- `$PATH`: the path in which the exported file is saved, that is, the path of the file in the bucket.
- `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it specifies the prefix of the exported file names. If you do not specify a prefix, the system uses the default prefix `data`. The system automatically generates the suffix.
- `$HOST`: the host name or the CDN-accelerated domain name of the OSS service, that is, the address of the OSS service.
- `$ACCESS_ID`: the AccessKey ID required to access the OSS service for identity verification.
- `$ACCESSKEY`: the AccessKey Secret required to access the OSS service for identity verification.
Note
- OceanBase Database supports exporting data to S3 and object storage services based on the S3 protocol.
- Alibaba Cloud OSS has a file size limit. When you export files larger than 5 GB to OSS, the system splits the files into multiple files, each of which is smaller than 5 GB.
When you save the exported file to HDFS, the format is:
Notice
When you use the write capability of HDFS external tables, OceanBase Database must be deployed in an environment with the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
The export format for basic (SIMPLE) authentication, without Kerberos:

hdfs://$NAMENODE:$PORT/$PATH/$FILENAME[?username=$USERNAME]

Description:

- When you use basic (SIMPLE) authentication, you can specify the HDFS username by using the `username` parameter.
- If you do not specify the `username` parameter, the system uses the HDFS username defined in the OceanBase Database configuration.

The export format for Kerberos authentication:

hdfs://$NAMENODE:$PORT/$PATH/$FILENAME?principal=$PRINCIPAL&keytab=$KEYTAB&krb5conf=$KRB5CONF&configs=$CONFIGS

Description:

- For Kerberos authentication, you must provide the Kerberos principal (`principal`), the keytab file (`keytab`), and the Kerberos configuration file (`krb5conf`).
- In this scenario, the `username` parameter is not applicable because Kerberos authentication identifies the user by the `principal`.
The parameters are described as follows:
Common parameters:
- `$NAMENODE`: the IP address of the HDFS NameNode.
- `$PORT`: the port of the HDFS NameNode.
- `$PATH`: the storage path of the file in HDFS.
- `$FILENAME`: the name of the exported file.
Parameters for SIMPLE authentication:
- `?username=$USERNAME`: an optional parameter that specifies the HDFS username. If you do not specify the `username` parameter, the system uses the HDFS username defined in the OceanBase Database configuration.
Parameters for Kerberos authentication:
- `$PRINCIPAL`: the Kerberos principal name. For example, `principal=hdfs/hadoop@EXAMPLE.COM`.
- `$KEYTAB`: the path of the Kerberos keytab file. For example, `keytab=/path/to/hdfs.keytab`.
- `$KRB5CONF`: the path of the Kerberos configuration file. For example, `krb5conf=/path/to/krb5.conf`.
- `$CONFIGS`: additional HDFS configuration parameters in key-value format. For example, `configs=dfs.data.transfer.protection=authentication,privacy`.
The following examples show the value of `file_name` when you save the exported file to HDFS:

- SIMPLE (basic) authentication, which applies to HDFS clusters without Kerberos authentication.

  Example 1 (default username):

  hdfs://10.10.10.1:9000/data/export/sales_data.csv

  Example 2 (specified username):

  hdfs://10.10.10.1:9000/data/export/sales_data.csv?username=hiveuser

- Kerberos authentication, which applies to HDFS clusters that require Kerberos authentication. You can specify the query parameters as needed; multiple configuration items are supported, such as `configs`.

  Example:

  hdfs://10.10.10.1:9000/data/export/sales_data.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/etc/security/keytabs/hdfs.keytab&krb5conf=/etc/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy
field_term
- `[OPTIONALLY] ENCLOSED BY string`: specifies the character that wraps field values. By default, no quotation marks are used. For example, `ENCLOSED BY '"'` indicates that field values are enclosed in double quotation marks. If you use the `OPTIONALLY` keyword, the specified character is used only for string values.
- `TERMINATED BY string`: specifies the character that separates field values. For example, `TERMINATED BY ','` indicates that a comma separates two field values.
- `ESCAPED BY string`: specifies the escape character for handling special characters or parsing data in special formats. The default escape character is the backslash (`\`).
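Putting these field options together, a hedged sketch (the table name `test_tbl1` and output path are illustrative):

```sql
-- Comma-separated fields; string values optionally wrapped in double quotes;
-- special characters escaped with a backslash.
SELECT * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/fields_demo.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\';
```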
line_term
- `STARTING BY string`: specifies the starting character of each line.
- `TERMINATED BY string`: specifies the ending character of each line. By default, a newline character is used. For example, `... LINES TERMINATED BY '\n' ...` indicates that a newline character ends each line.
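For example, both line options can be combined in one statement. This is a sketch; the table and path names are illustrative:

```sql
-- Prefix each exported row with ">" and terminate it with a newline.
SELECT * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/lines_demo.txt'
LINES STARTING BY '>' TERMINATED BY '\n';
```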
file_option
- `SINGLE [=] {TRUE | FALSE}`: specifies whether to export data to a single file or multiple files.

  - `SINGLE [=] TRUE`: the default value, which exports data to a single file.
  - `SINGLE [=] FALSE`: exports data to multiple files.

  Notice

  When the degree of parallelism is greater than 1 and `SINGLE = FALSE`, data can be exported to multiple files in parallel, which enables parallel reads and writes and improves the export speed.

- `MAX_FILE_SIZE [=] {int | string}`: specifies the maximum size of a single file during export. This option takes effect only when `SINGLE = FALSE`.
- `BUFFER_SIZE [=] {int | string}`: specifies the size of memory allocated to each thread for each partition during export (a table without partitions is treated as a single partition). The default value is 1 MB.

  Note

  `BUFFER_SIZE` is used for export performance tuning. If the machine has sufficient memory and you want to improve export efficiency, set it to a larger value (for example, 4 MB). If memory is insufficient, set it to a smaller value (for example, 4 KB). If set to 0, all partitions in a single thread share one memory block.
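Combining these file options, a hedged sketch of a tuned multi-file export (the table name, path, parallelism, and size values are illustrative, not recommendations):

```sql
-- Parallel export to multiple files: each file is capped at 256 MB,
-- and each thread gets a 4 MB buffer per partition.
SELECT /*+parallel(4)*/ * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/'
SINGLE = FALSE MAX_FILE_SIZE = '256MB' BUFFER_SIZE = '4MB';
```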
external_file_format_list
`TYPE = 'CSV'`: specifies the following fields:

- `LINE_DELIMITER`: the line delimiter of the CSV file. The default value is `LINE_DELIMITER = '\n'`.
- `FIELD_DELIMITER`: the column delimiter of the CSV file. The default value is `FIELD_DELIMITER = '\t'`.
- `ESCAPE`: the escape character of the CSV file. It must be one byte in length. The default value is `ESCAPE = '\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: the character that encloses field values in the CSV file. The default value is an empty string. This option applies only to specific field types (for example, CHAR, VARCHAR, TEXT, and JSON).
- `FIELD_ENCLOSED_BY`: the character that encloses field values in the CSV file. The default value is an empty string, which indicates that no enclosing character is applied. This option applies to all field types.
- `ENCODING`: the character set encoding of the file. If not specified, the default value is `UTF8MB4`.
- `COMPRESSION`: the compression format of the file. Valid values: `NONE`, `GZIP`, `DEFLATE`, and `ZSTD`.
  - `GZIP`/`DEFLATE`: GZIP compression.
  - `ZSTD`: ZSTD compression.
  - `NONE`: no compression. This is the default value.
- `FILE_EXTENSION`: a user-defined file extension. This option is used only for multi-file exports and is supported only for CSV files. If not specified, the file extension is determined by the format type. The default file extension for CSV files is `.csv`.
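As a sketch of the CSV options, the following statement exports compressed CSV files in parallel. The table name, path, and extension value are illustrative:

```sql
-- Multi-file export of GZIP-compressed CSV with a custom file extension.
SELECT /*+parallel(2)*/ * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/csv_demo'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  COMPRESSION = GZIP
  FILE_EXTENSION = '.csv.gz'
)
SINGLE = FALSE;
```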
`TYPE = 'PARQUET'`: specifies the following fields:

- `COMPRESSION`: the compression format of the Parquet file. Valid values: `UNCOMPRESSED` (no compression), `SNAPPY`, `GZIP`, `BROTLI`, `ZSTD`, `LZ4`, and `LZ4_HADOOP`. If not specified, the default value is `UNCOMPRESSED`. The default file extension is `.parquet`.
- `ROW_GROUP_SIZE`: the size of a row group in the Parquet file, in bytes. You can specify a number or a string such as `'64MB'`. If not specified, the default value is `256MB`. We recommend that you use the default value.
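For example, a minimal Parquet export might look like the following sketch (table and path names are illustrative; the comma after `TYPE` follows the syntax above):

```sql
-- Export to a single Parquet file with ZSTD compression,
-- keeping the default 256 MB row-group size.
SELECT * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/test_tbl1.parquet'
FORMAT = (
  TYPE = 'PARQUET',
  COMPRESSION = 'ZSTD'
);
```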
`TYPE = 'ORC'`: specifies the following fields:

- `COMPRESSION`: the compression format of the ORC file. Valid values: `UNCOMPRESSED` (no compression), `SNAPPY`, `ZLIB`, `LZ4`, and `ZSTD`. If not specified, the default value is `UNCOMPRESSED`. The default file extension is `.orc`.
- `COMPRESSION_BLOCK_SIZE`: the size of the blocks into which data is divided during compression, in bytes. You can specify a number or a string such as `'64KB'`. If not specified, the default value is `256KB`. We recommend that you use the default value.
- `STRIPE_SIZE`: the size of a stripe in the ORC file, in bytes. You can specify a number or a string such as `'64MB'`. If not specified, the default value is `64MB`. We recommend that you use the default value.
- `ROW_INDEX_STRIDE`: the frequency of index entries, that is, the number of rows between index entries. If not specified, the default value is `10000`. We recommend that you use the default value.
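Similarly, a minimal ORC export might look like the following sketch (table and path names are illustrative):

```sql
-- Export to a single ORC file with ZLIB compression,
-- keeping the default stripe size and row index stride.
SELECT * FROM test_tbl1
INTO OUTFILE '/home/admin/test_data/test_tbl1.orc'
FORMAT = (
  TYPE = 'ORC'
  COMPRESSION = 'ZLIB'
);
```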
Examples
Export data to a local file
1. Set the export file path.

   To export a file, you must set the system variable `secure_file_priv` to specify the path to which files can be exported.

   Notice

   For security reasons, you can set the system variable `secure_file_priv` only over a local socket connection. For more information, see secure_file_priv.

   1. Log on to the OBServer node used by the OceanBase database connection.

      ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the `mysql001` tenant through a local Unix socket.

      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******

   3. Set the export path to `/home/admin/test_data`.

      SET GLOBAL secure_file_priv = "/home/admin/test_data";

   4. Log out.

2. After you reconnect to the database, use the `SELECT INTO OUTFILE` statement to export data. Use a comma as the delimiter between field values, enclose string values in the `"` character, and use a line break as the row terminator.

   - Export data to a single file named `test_tbl1.csv`:

     SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

   - Export data to multiple files in parallel. Do not specify a file name (the default prefix is `data`), and limit each file to 4 MB:

     SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB';

   - Export data to multiple files in parallel. Specify `dd2024` as the file name prefix, and limit each file to 4 MB:

     SELECT /*+parallel(2)*/ * FROM test_tbl1 INTO OUTFILE '/home/admin/test_data/dd2024' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' SINGLE = FALSE MAX_FILE_SIZE = '4MB';
Note

- When multiple export tasks write to the same path, errors or partial data exports may occur. You can avoid these issues by using distinct export paths. For example, `SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE;` and `SELECT /*+parallel(2)*/ * FROM t2 INTO OUTFILE 'test/data' SINGLE = FALSE;` may fail when executed simultaneously because they generate files with the same names. We recommend that you set the export paths to `test/data1` and `test/data2`.
- When `SINGLE = FALSE` and the export fails, for example because a target file already exists, delete all files in the export directory that share the export target's prefix, or delete and recreate the export directory, and then retry the export. For example, if `SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE;` fails, delete all files with the `data` prefix in the `test` directory, or delete the `test` directory and recreate it, and then retry the export.
Log on to the server and check the exported files in the `/home/admin/test_data` directory of the OBServer node.
[xxx@xxx /home/admin/test_data]# ls
The return result is as follows:
data_0_0_0 data_0_1_0 dd2024_0_0_0 dd2024_0_1_0 test_tbl1.csv
In the result, test_tbl1.csv is the file name of the data exported to a single file in serial mode. data_0_0_0 and data_0_1_0 are the file names of the data exported to multiple files in parallel without specifying the file name. dd2024_0_0_0 and dd2024_0_1_0 are the file names of the data exported to multiple files in parallel with the prefix dd2024.
Export data to an OSS bucket
Use the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to the specified OSS storage location by partition. The partitioning is based on the combination of the col1 and col2 columns. Rows with the same values in these columns are stored in the same partition and exported to the same directory.
SELECT /*+parallel(3)*/ * FROM test_tbl2
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB';
The storage location is specified by the $DATA_FOLDER_NAME variable. You also need to provide the host address, Access ID, and Access Key of the OSS bucket.
Export data to HDFS
Notice
To write data to HDFS, OceanBase Database must be deployed in an environment with the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy OceanBase Database in a Java SDK environment.
Export data with SIMPLE authentication
Export data from the lineitem_external table of OceanBase Database to HDFS and store it in CSV file format.
SELECT * FROM lineitem_external
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
SINGLE = TRUE;
Export data with KERBEROS authentication
Unlike SIMPLE authentication, KERBEROS authentication requires additional configuration files and key files. Export data from the lineitem_external table of OceanBase Database to HDFS and store it in CSV file format.
SELECT * FROM lineitem_external
INTO OUTFILE 'hdfs://${namenode}:${port}/path/to/lineitem.csv?principal=hdfs/hadoop@EXAMPLE.COM&keytab=/path/to/hdfs.keytab&krb5conf=/path/to/krb5.conf&configs=dfs.data.transfer.protection=authentication,privacy'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '\n'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
SINGLE = TRUE;
