The SELECT INTO OUTFILE statement is a common method for exporting data. It allows you to specify the fields to be exported, which is useful in scenarios where you don't want to export the primary key fields. When used in conjunction with the LOAD DATA INFILE statement for importing data, it provides a convenient way to import and export data.
Background information
OceanBase Database is compatible with this syntax.
| Mode | Recommended OceanBase Database version | Recommended client |
|---|---|---|
| MySQL mode | V2.2.40 and later | MySQL Client or OBClient |
| Oracle mode | V2.2.40 and later | OBClient |
Notice
The client must be directly connected to the OceanBase Database instance for import and export operations.
Privilege requirements
- To execute the `SELECT INTO` statement in a MySQL tenant, you must have the `FILE` privilege and the `SELECT` privilege on the table. You can grant the `FILE` privilege to a user with the following command: `GRANT FILE ON *.* TO user_name;`. In the statement, `user_name` is the username of the account that executes the `SELECT INTO` statement. For more information about privileges in MySQL mode of OceanBase Database, see Privilege types in MySQL-compatible mode.
- To execute the `SELECT INTO` statement in an Oracle tenant, you must have the `SELECT` privilege on the table. For more information about OceanBase Database privileges, see Privilege types in Oracle-compatible mode.
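For example, in a MySQL tenant, the required privileges could be granted as follows. This is a minimal sketch: the user name `usr1` and the table `test.tbl1` are illustrative and not part of the original example.

```sql
-- Illustrative user and table names; run as an account that can grant privileges.
GRANT FILE ON *.* TO usr1;           -- required for SELECT ... INTO OUTFILE
GRANT SELECT ON test.tbl1 TO usr1;   -- required to read the source table
```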
Syntax
SELECT [/*+parallel(N)*/] column_list_option
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list}
FROM table_name_list
[WHERE where_conditions]
[GROUP BY group_by_list [HAVING having_search_conditions]]
[ORDER BY order_expression_list];
column_list_option:
column_name [, column_name ...]
field_opt:
{COLUMNS | FIELDS} field_term_list
field_term_list:
field_term [, field_term ...]
field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string
line_opt:
LINES line_term_list
line_term_list:
line_term [, line_term ...]
line_term:
{STARTING | TERMINATED} BY string
file_opt:
file_option [, file_option ...]
file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}
external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']
type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>
type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | Optional. Specifies the degree of parallelism for executing the statement. |
| column_list_option | Specifies the column options. You can use * to select all columns. column_name: the name of the column. For more information about column options in queries, see SIMPLE SELECT. |
| file_name | The path and name of the exported file. For more information, see file_name. |
| PARTITION BY part_expr | Optional. Controls how the exported data is partitioned. part_expr serves as a part of the export path: the system calculates the value of part_expr for each row, and rows with the same part_expr value belong to the same partition and are exported to the same directory. Note: In OceanBase Database V4.3.2, the PARTITION BY clause is supported starting from V4.3.2 BP1. |
| FORMAT = (external_file_format_list) | Optional. Specifies the file format for the statement in the FORMAT clause. Use TYPE to specify the file format. TYPE cannot be left empty. For more information, see external_file_format_list. |
| {CHARSET \| CHARACTER SET} charset_name | Optional. Specifies the character set for exporting data to an external file. charset_name indicates the name of the character set. |
| field_opt | Optional. Specifies the field options for exporting data. You can use the FIELDS or COLUMNS clause to specify field options. For more information, see field_term. |
| line_opt | Optional. Specifies the start and end characters of each line in the exported file. You can set the start and end characters of each line in the output file by using the LINES clause. For more information, see line_term. |
| file_opt | Optional. Specifies whether to export data to multiple files and the size of each file when data is exported to multiple files. For more information, see file_option. |
| FROM table_name_list | Specifies the objects from which to select data. |
| WHERE where_conditions | Optional. Specifies the filter conditions. Only data that meets the conditions is returned in the query result. For more information about filters in queries, see SIMPLE SELECT. |
| GROUP BY group_by_list | Optional. Specifies the grouping columns. It is generally used in combination with aggregate functions. Note: Columns in the select list that are not wrapped in aggregate functions must generally appear in the GROUP BY list. |
| HAVING having_search_conditions | Optional. Specifies the filter conditions for grouped data. The HAVING clause is similar to the WHERE clause, but the HAVING clause can use aggregate functions such as SUM and AVG. |
| ORDER BY order_expression_list | Optional. Specifies one or more columns by which to sort the result set in ascending (ASC) or descending (DESC) order. The default order is ascending. |
file_name
The file_name parameter has the following format:
- When the exported file is stored on an OBServer node, specify the path and file name in the format `$PATH/$FILENAME`, where:
  - `$PATH`: the path in which the exported file is saved on the OBServer node.
  - `$FILENAME`: the name of the exported file. When `SINGLE = FALSE`, it specifies the prefix of the exported files. If you do not specify this parameter, the system uses the default prefix `data` and automatically generates the suffix.
- When the exported file is saved to OSS, the file path is in the format `oss://$PATH/$FILENAME/?host=$HOST&access_id=$ACCESS_ID&access_key=$ACCESSKEY`, where:
  - `$PATH`: the path of the bucket to which the exported file is saved.
  - `$FILENAME`: the name of the exported file.
  - `$HOST`: the address of the OSS endpoint.
  - `$ACCESS_ID`: the AccessKey ID of the OSS account.
  - `$ACCESSKEY`: the AccessKey secret of the OSS account.

  Note
  - In OceanBase Database V4.3.5, `SELECT INTO` supports S3 and S3 protocol-compatible object storage as the export destination starting from V4.3.5 BP2.
  - Due to the file size limit of Alibaba Cloud OSS, a file larger than 5 GB is split into multiple files, each smaller than 5 GB, when data is exported to OSS.
field_term
- `[OPTIONALLY] ENCLOSED BY string`: specifies the symbol that encloses field values. By default, no enclosing symbol is used. For example, `ENCLOSED BY '"'` encloses character values in double quotation marks. If the `OPTIONALLY` keyword is specified, the enclosing character is applied only to string-type values.
- `TERMINATED BY string`: specifies the delimiter between field values. For example, `TERMINATED BY ','` uses a comma as the delimiter between two field values.
- `ESCAPED BY string`: specifies the escape character used to handle special characters or parse specially formatted data. The default escape character is a backslash (`\`).
line_term
- `TERMINATED BY string`: specifies the terminator of each line. The default is a newline character. For example, `... LINES TERMINATED BY '\n' ...` indicates that each line ends with a newline character.
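The following statement is a minimal sketch of how the field and line options can be combined. The table `tbl1` and the output path are illustrative, and the path must be covered by the `secure_file_priv` setting.

```sql
-- Export tbl1 as comma-separated values: string fields optionally enclosed in
-- double quotation marks, and each line terminated by a newline character.
SELECT * INTO OUTFILE '/home/admin/test_data/tbl1_fields.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM tbl1;
```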
file_option
- `SINGLE [=] {TRUE | FALSE}`: specifies whether to export data to a single file or to multiple files.
  - `SINGLE [=] TRUE`: exports data to a single file. This is the default value.
  - `SINGLE [=] FALSE`: exports data to multiple files.

  Notice: When the degree of parallelism is greater than 1 and `SINGLE = FALSE`, data can be exported to multiple files in parallel, which enables parallel reads and writes and improves export efficiency.
- `MAX_FILE_SIZE [=] {int | string}`: specifies the maximum size of each file when data is exported to multiple files. This parameter takes effect only when `SINGLE = FALSE`.
- `BUFFER_SIZE [=] {int | string}`: specifies the size of the memory buffer allocated for each partition of each thread during export. This parameter does not take effect for non-partitioned tables. The default value is 1 MB.

  Note
  - `BUFFER_SIZE` is used for export performance tuning. If sufficient memory is available on the server and you want to improve export efficiency, set a larger value (for example, 4 MB). If memory is insufficient, set a smaller value (for example, 4 KB). If you set it to 0, all partitions of a thread share one common buffer.
  - The `BUFFER_SIZE` parameter is supported starting from OceanBase Database V4.3.2 BP1.
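The following statement is a minimal sketch that combines these file options. The table `tbl1`, the output prefix, and the sizes are illustrative: it exports data in parallel to multiple files, each at most 256 MB, with a 4 MB buffer per partition.

```sql
-- Parallel multi-file export with a size cap per file and a tuned buffer size.
SELECT /*+parallel(4)*/ * INTO OUTFILE '/home/admin/test_data/tbl1_part'
  FIELDS TERMINATED BY ','
  SINGLE = FALSE MAX_FILE_SIZE = '256MB' BUFFER_SIZE = '4MB'
FROM tbl1;
```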
external_file_format_list
When TYPE = 'CSV', the following fields are supported:

- `LINE_DELIMITER`: specifies the line delimiter of the CSV file. The default value is `'\n'`.
- `FIELD_DELIMITER`: specifies the field delimiter of the CSV file. The default value is `'\t'`.
- `ESCAPE`: specifies the escape character of the CSV file. The value must be 1 byte in length. The default value is `'\'`.
- `FIELD_OPTIONALLY_ENCLOSED_BY`: specifies the characters that enclose field values in the CSV file. Optional. The default value is an empty string. This option adds enclosing characters only to fields of specific types, such as CHAR, VARCHAR, TEXT, and JSON.
- `FIELD_ENCLOSED_BY`: specifies the characters that enclose field values in the CSV file. Optional. The default value is an empty string, which means that no enclosing characters are added. This option adds enclosing characters to fields of all types.
- `ENCODING`: specifies the character set encoding of the file. If this parameter is not specified, the default value `UTF8MB4` takes effect.
- `COMPRESSION`: specifies the compression format of the file. Optional. The supported compression formats are NONE, GZIP, DEFLATE, and ZSTD.
  - `GZIP`/`DEFLATE`: the GZIP or DEFLATE compression format.
  - `ZSTD`: the ZSTD compression format.
  - `NONE`: the file is not compressed. This is the default value.
- `FILE_EXTENSION`: specifies a user-defined file name extension. Optional. It is supported only for multi-file export and applies only to CSV files. If this parameter is not specified, the file extension is determined by the file format. The default extension for CSV files is `.csv`.

When TYPE = 'PARQUET', the following fields are supported (the default file name extension for PARQUET files is `.parquet`):

- `COMPRESSION`: specifies the compression format of the PARQUET file. The supported compression formats are `UNCOMPRESSED` (the file is not compressed), `SNAPPY`, `GZIP`, `BROTLI`, `ZSTD`, `LZ4`, and `LZ4_HADOOP`. If this parameter is not specified, the default value `UNCOMPRESSED` takes effect.
- `ROW_GROUP_SIZE`: specifies the size of a row group in the PARQUET file. You can specify a numeric value or a string such as `'128MB'`.

When TYPE = 'ORC', the following fields are supported (the default file name extension for ORC files is `.orc`):

- `COMPRESSION`: specifies the compression format of the ORC file. The supported compression formats are `UNCOMPRESSED` (the file is not compressed), `SNAPPY`, `ZLIB`, `LZ4`, and `ZSTD`. If this parameter is not specified, the default value `UNCOMPRESSED` takes effect.
- `COMPRESSION_BLOCK_SIZE`: specifies the size of a compression block, in bytes. You can specify a numeric value or a string such as `'64KB'`. If this parameter is not specified, the default value `256KB` takes effect. We recommend that you use the default value.
- `STRIPE_SIZE`: specifies the size of a stripe in the ORC file, in bytes. You can specify a numeric value or a string such as `'64MB'`. If this parameter is not specified, the default value `64MB` takes effect. We recommend that you use the default value.
- `ROW_INDEX_STRIDE`: specifies the row index stride of the ORC file, that is, the number of rows covered by each index entry (for example, 10000).
Considerations
- If multiple export tasks export data to the same path at the same time, errors may occur and only part of the data may be exported. You can avoid this issue by setting the export paths properly. For example:

      SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;
      SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t2;

  Because the exported files have the same name, errors may occur. We recommend that you set the export paths to `test/data1` and `test/data2` instead.

- If `SINGLE = FALSE` and the export fails because a file already exists, you can delete all files in the export directory that have the same prefix as the export target, or delete the export directory and recreate it, and then export the data again. For example, for the following statement:

      SELECT /*+parallel(2)*/ * INTO OUTFILE 'test/data' SINGLE = FALSE FROM t1;

  After the failure, you can delete all files with the `data` prefix in the `test` directory, or delete the `test` directory and recreate it, and then try the export again.

- Compression applies to data blocks in Parquet and ORC files, not to the entire file. Therefore, only the CSV format supports compression suffixes.

- When you export a compressed CSV file, the file naming differs between single-file and multi-file export:
  - If you export multiple files (`SINGLE = FALSE`) and a compression algorithm is specified, the suffix that corresponds to the compression algorithm is appended to the end of the file name. For example, if the exported file name is `data_1_0_1` and the specified compression algorithm is `gzip`, the final file name is `data_1_0_1.gz`.
  - If you export a single file (`SINGLE = TRUE`), the output file name is the same as the specified file name. That is, if the specified name contains a compression suffix, the output name contains it; if not, no compression suffix is appended.

- The logic for the `FILE_EXTENSION` field when you export a CSV file is as follows:
  - If you specify both the `COMPRESSION` and `FILE_EXTENSION` parameters, the suffix order is `.file_extension.compression`. For example, if the exported file name is `data_0_0_1`, `FILE_EXTENSION` is set to `xls`, and `COMPRESSION` is set to `gzip`, the file name is `data_0_0_1.xls.gz` when multiple files are exported.
  - Regardless of whether the value of `FILE_EXTENSION` starts with a period (`.`), the output file name contains only one period. For example, if the exported file name is `data` and `FILE_EXTENSION` is set to `xls` or `.xls`, the output file name is `data.xls`.
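As a minimal sketch of the suffix logic above, the following multi-file CSV export combines `FILE_EXTENSION` and `COMPRESSION`. The table `tbl1` and the output path are illustrative; the generated files are expected to carry the `.xls.gz` suffix.

```sql
-- Multi-file CSV export with a custom extension (xls) and gzip compression.
SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/'
  FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', COMPRESSION = gzip, FILE_EXTENSION = 'xls')
  SINGLE = FALSE MAX_FILE_SIZE = '4MB'
FROM tbl1;
```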
Example
Export data to a local file
Example 1: Use the SELECT INTO OUTFILE statement to export a CSV file
MySQL mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the mysql001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.csv, the character set utf8mb4, and the compression format gzip.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the character set utf8mb4 and the compression format gzip.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, specify the character set utf8mb4, and set the compression format to gzip.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.csv.gz  dd2024_0.csv.gz  test_tbl1.csv

   In this example, test_tbl1.csv is the file exported by the serial single-file example; data_0.csv.gz is the file exported by the parallel multi-file example without a specified file name, with the compression algorithm gzip; and dd2024_0.csv.gz is the file exported by the parallel multi-file example with the file name prefix dd2024, with the compression algorithm gzip.
Oracle mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the oracle001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.csv, the character set utf8mb4, and the compression format gzip.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the character set utf8mb4 and the compression format gzip.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, specify the character set utf8mb4, and set the compression format to gzip.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ',', ENCODING = 'utf8mb4', COMPRESSION = gzip) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.csv.gz  dd2024_0.csv.gz  test_tbl1.csv

   In this example, test_tbl1.csv is the file exported by the serial single-file example; data_0.csv.gz is the file exported by the parallel multi-file example without a specified file name, with the compression algorithm gzip; and dd2024_0.csv.gz is the file exported by the parallel multi-file example with the file name prefix dd2024, with the compression algorithm gzip.
Example 2: Use the SELECT INTO OUTFILE statement to export a PARQUET file
MySQL mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the mysql001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data. You can use a mapping table to convert the data types in MySQL mode to the data types supported by PARQUET. For more information about the data types supported by the PARQUET format, see Data type mapping table.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.parquet, the compression format SNAPPY, and a row group size of 128 MB.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.parquet' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the compression format SNAPPY and a row group size of 128 MB.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, specify the compression format SNAPPY, and set the row group size to 128 MB.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.parquet  dd2024_0.parquet  test_tbl1.parquet

   In this example, test_tbl1.parquet is the file exported by the serial single-file example; data_0.parquet is the file exported by the parallel multi-file example without a specified file name; and dd2024_0.parquet is the file exported by the parallel multi-file example with the file name prefix dd2024.
Oracle mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the oracle001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data. You can use a mapping table to convert the data types in Oracle mode to the data types supported by PARQUET. For more information about the data types supported by the PARQUET format, see Data type mapping table.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.parquet, the compression format SNAPPY, and a row group size of 128 MB.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.parquet' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the compression format SNAPPY and a row group size of 128 MB.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, specify the compression format SNAPPY, and set the row group size to 128 MB.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'SNAPPY' ROW_GROUP_SIZE = '128MB') SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.parquet  dd2024_0.parquet  test_tbl1.parquet

   In this example, test_tbl1.parquet is the file exported by the serial single-file example; data_0.parquet is the file exported by the parallel multi-file example without a specified file name; and dd2024_0.parquet is the file exported by the parallel multi-file example with the file name prefix dd2024.
Example 3: Use the SELECT INTO OUTFILE statement to export an ORC file
MySQL mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the mysql001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [test]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data. You can use a mapping table to convert the data types in MySQL mode to the data types supported by ORC. For more information about the data types supported by the ORC format, see Data type mapping table.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.orc, the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.orc' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, and specify the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [test]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.orc  dd2024_0.orc  test_tbl1.orc

   In this example, test_tbl1.orc is the file exported by the serial single-file example; data_0.orc is the file exported by the parallel multi-file example without a specified file name; and dd2024_0.orc is the file exported by the parallel multi-file example with the file name prefix dd2024.
Oracle mode

1. Log in to the server where the OBServer node to be connected resides.

       [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

2. Set the file path for importing and exporting data.

   Set the system variable secure_file_priv to specify the path from which files can be accessed for import and export.

   Notice: For security reasons, when you set the system variable secure_file_priv, you can connect to the database only through a local Unix socket to execute the SQL statement that modifies this global variable. For more information, see secure_file_priv.

   1. Log in to the server where the OBServer node to be connected resides.

          [xxx@xxx /home/admin/test_data]# ssh admin@xxx.xxx.xxx.xxx

   2. Connect to the oracle001 tenant through a local Unix socket.

          obclient -S /home/admin/oceanbase/run/sql.sock -uroot@oracle001 -p******

   3. Set the import and export path to /home/admin/test_data.

          obclient [SYS]> SET GLOBAL secure_file_priv = "/home/admin/test_data";
          Query OK, 0 rows affected

3. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data. You can use a mapping table to convert the data types in Oracle mode to the data types supported by ORC. For more information about the data types supported by the ORC format, see Data type mapping table.

   - Export data to a single file in serial mode. Specify the file name test_tbl1.orc, the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/test_tbl1.orc' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Do not specify a file name, and set the maximum size of each file to 4 MB. Specify the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

   - Write data to multiple files in parallel. Specify the file name prefix dd2024, set the maximum size of each file to 4 MB, and specify the compression format SNAPPY, a compression block size of 256 KB, a stripe size of 64 MB, and a row index stride of 10,000 rows.

         obclient [SYS]> SELECT /*+parallel(2)*/ * INTO OUTFILE '/home/admin/test_data/dd2024' FORMAT = (TYPE = 'ORC' COMPRESSION = 'SNAPPY' COMPRESSION_BLOCK_SIZE = 262144 STRIPE_SIZE = 67108864 ROW_INDEX_STRIDE = 10000) SINGLE = FALSE MAX_FILE_SIZE = '4MB' FROM tbl1;

     The return result is as follows:

         Query OK, 9 rows affected

4. Log in to the server and check the exported files in the /home/admin/test_data directory on the OBServer node.

       [xxx@xxx /home/admin/test_data]# ls

   The return result is as follows:

       data_0.orc  dd2024_0.orc  test_tbl1.orc

   In this example, test_tbl1.orc is the file exported by the serial single-file example; data_0.orc is the file exported by the parallel multi-file example without a specified file name; and dd2024_0.orc is the file exported by the parallel multi-file example with the file name prefix dd2024.
Export data to OSS
You can execute the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to a specified OSS storage location based on partitions. The partitions are determined by the combination of the col1 and col2 columns. Rows with the same values in these two columns are considered to be in the same partition and exported to the same directory.
SELECT /*+parallel(3)*/ * FROM test_tbl2
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB';
The storage location is specified by the $DATA_FOLDER_NAME variable. You also need to provide the host address, access ID, and access key of the OSS service.
More information
A file exported by using the SELECT INTO OUTFILE method can be imported by using the LOAD DATA statement. For more information, see Import data by using the LOAD DATA statement.
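For reference, a CSV file exported with explicit field and line options (such as the sketch shown earlier in this topic) could be loaded back roughly as follows. This is a minimal sketch: the file path, the target table tbl1_copy, and the delimiters are illustrative and must match the options used during export, and the file must reside in a directory allowed by secure_file_priv.

```sql
-- Import a previously exported CSV file back into a table.
LOAD DATA INFILE '/home/admin/test_data/tbl1_fields.csv'
INTO TABLE tbl1_copy
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```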