You can create an external table by using the CREATE EXTERNAL TABLE statement. When you create an external table, you must specify the path of the data file and the format of the data file to read data from the external file.
Privileges required
To create an external table, the current user must have the CREATE privilege. For more information about how to view the privileges of the current user, see View user privileges.
Create an external table
The following example shows the syntax for creating an external table:
CREATE EXTERNAL TABLE table_name
( col_name col_type [AS (metadata$filecol{N})]
[ , col_name col_type [AS (metadata$filecol{N})] ]
[ , ... ] )
LOCATION = '<string>'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = { TRUE | FALSE }
TRIM_SPACE = { TRUE | FALSE }
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
[ PATTERN = '<regex_pattern>' ]
The following table describes the parameters in the statement.
col_name col_type [AS (metadata$filecol{N})]: specifies the columns of the external table. You can useAS (metadata$filecol{N})to manually define column mapping.The column types of a external table are the same as those of a normal table. For more information about the data types supported for external tables in MySQL mode of OceanBase Database, see Data types.
By default, the columns of data in the external file are automatically mapped to the columns of the external table in order. That is, the first column of the external table corresponds to the first column of data in the external file.
For example, the
C1column of the external tableext_t1is automatically mapped to the first column of data in the external file; theC2column is automatically mapped to the second column of data.CREATE EXTERNAL TABLE ext_t1 ( C1 int, C2 int ) LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );If the order of columns in the external file is different from that in the external table, you can use a pseudocolumn in the format of
metadata$filecol{N}to specify that a column of the external table corresponds to the Nth column of data in the external file. The columns in the file are numbered starting from 1.For example,
C1 int AS (metadata$filecol2)indicates that theC1column of the external tableext_t2corresponds to the second column in the file;C2 int AS (metadata$filecol4)indicates that theC2column of the external tableext_t2corresponds to the fourth column in the external file.CREATE EXTERNAL TABLE ext_t2 ( C1 int AS (metadata$filecol2), C2 int AS (metadata$filecol4) ) LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );Notice
If you want to manually define column mapping, automatic column mapping will fail and you must manually define the mapping for all columns.
LOCATION = '<string>': specifies the path where the external file is stored. Generally, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the system automatically collects all files in the directory.The following two formats are supported:
Local location format:
LOCATION = '[file://] local_file_path'local_file_path: can be a relative path or an absolute path. If you enter a relative path, the current directory must be the installation directory of OceanBase Database.Notice
In the
LOCATIONparameter using the local location format, the value of the system variablesecure_file_privmust be the parent directory oflocal_file_path, that is,local_file_pathmust be a subdirectory of the path specified bysecure_file_priv.In scenarios that use the local location format, when you use the system variable
secure_file_privto specify a path that OceanBase Database can access, you must set the value ofsecure_file_privto the parent directory oflocal_file_path, that is,local_file_pathmust be a subdirectory of the path specified bysecure_file_priv.The system variable
secure_file_privspecifies the path that OceanBase Database can access when you import data to or export data from a file. For more information aboutsecure_file_priv, see secure_file_priv.
Remote location format:
LOCATION = '{oss|cos|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'$ACCESS_ID,$ACCESS_KEY, and$HOSTare required access information for accessing Alibaba Cloud OSS, Tencent Cloud COS, and Amazon S3 respectively. These sensitive access information is stored in the system tables of the database in encrypted form.Notice
When you use an object storage path, separate the parameters in the path with the
&character. Make sure that the values of the parameters you enter contain only uppercase and lowercase English letters, digits, and the characters/-_$+=and wildcard. If you enter characters other than the preceding ones, the setting may fail.
FORMAT = ( TYPE = 'CSV'...): specifies the CSV format for the external file.TYPE: specifies the type of the external file.LINE_DELIMITER: specifies the line delimiter for the file. If you do not specify this parameter, the default valueLINE_DELIMITER='\n'takes effect.FIELD_DELIMITER: specifies the field delimiter for the file. If you do not specify this parameter, the default valueFIELD_DELIMITER='\t'takes effect.ESCAPE: specifies the escape character for the file. For example,ESCAPE ='*'specifies to use an asterisk (*) as the escape character instead of the default escape character (). If you do not specify this parameter, the default valueESCAPE ='\'takes effect.FIELD_OPTIONALLY_ENCLOSED_BY: specifies the characters that enclose the field values in the file. For example,ESCAPE = '"'specifies to enclose the values in double quotation marks. If you do not specify this parameter, the default value takes effect.ENCODING: specifies the character set encoding used by the file. For more information about the character sets supported in MySQL mode of OceanBase Database, see Character sets. If you do not specify this parameter, the default value UTF8MB4 takes effect.NULL_IF: specifies the strings that are treated asNULLvalues. If you do not specify this parameter, the default value takes effect.SKIP_HEADER: specifies to skip the file header. This parameter specifies the number of lines to skip. If you do not specify this parameter, the system does not skip the file header.SKIP_BLANK_LINES: specifies whether to skip blank lines. If you do not specify this parameter, the default valueFALSEtakes effect.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. If you do not specify this parameter, the default valueFALSEtakes effect.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULLvalues. If you do not specify this parameter, the default valueFALSEtakes effect.FORMAT = ( TYPE = 'PARQUET'...): specifies the Parquet format for the external file.PATTERN: specifies the regular pattern string for filtering files in the directory specified byLOCATION. For each file in the directory specified byLOCATION, if the file name matches the pattern string, the external table can access the file. Otherwise, the external table skips the file. If you do not specify this parameter, the external table can access all files in the directory specified byLOCATIONby default.
Assume that a file named data.csv exists in the /home/admin/oceanbase/ directory on your local machine, and the file contains the following data.
1,"lin",98
2,"hei",90
3,"ali",95
On the OBServer node, the tenant administrator connects to the MySQL tenant of the tenant to which the external table belongs through the local Unix socket.
Here is an example of a connection:
obclient -S /home/admin/oceanbase/run/sql.sock -uroot@sys -p********For more information about how to connect to OceanBase Database through a local Unix socket, see secure_file_priv.
Configure the path
/home/admin/oceanbase/that OceanBase Database can access.SET GLOBAL secure_file_priv = "/home/admin/oceanbase/";After the command is executed, you need to restart the session for the modification to take effect.
Reconnect to the database and create an external table named
ext_t3.CREATE EXTERNAL TABLE ext_t3(id int, name char(10),score int) LOCATION = '/home/admin/oceanbase/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY ='"' ) PATTERN = 'data.csv';
After the external table is created, you can use the SHOW CREATE TABLE statement to view the table definition, just like that for a normal table.
SHOW CREATE TABLE ext_t3;
The execution result is as follows:
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ext_t3 | CREATE EXTERNAL TABLE `ext_t3` (
`id` int(11) GENERATED ALWAYS AS (metadata$filecol1),
`name` char(10) GENERATED ALWAYS AS (metadata$filecol2),
`score` int(11) GENERATED ALWAYS AS (metadata$filecol3)
)
LOCATION='file:///home/admin/oceanbase/'
PATTERN='data.csv'
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
FIELD_OPTIONALLY_ENCLOSED_BY = '"',
ENCODING = 'utf8mb4'
)DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
You can also access the external table just like a normal table. When you query an external table, the system reads the external file through the driver of the external table and parses the file according to the file format. Then, the system converts the data into internal data types of OceanBase Database and returns the data rows. Here is an example of querying the external table lineitem that is created just now.
SELECT * FROM ext_t3;
The execution result is as follows:
+----+------+-------+
| id | name | score |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set
You can also combine an external table with a normal table for query. Assume that a table named info exists in the current database, and the data in the table is as follows:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| hei | male | 9 |
| li | female | 8 |
+------+--------+------+
3 rows in set
Here is an example of combining the external table ext_t3 and the normal table info for query.
SELECT info.* FROM info, ext_t3 WHERE info.name = ext_t3.name AND ext_t3.score > 90;
The execution result is as follows:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| li | female | 8 |
+------+--------+------+
2 rows in set
For more information about queries, see Read data.
Considerations
An external table can only be queried, and does not support DML operations.
When you query an external table, if the external file accessed by the table is deleted, the system does not return an error, but returns an empty result.
The external file system manages the external file. If the external storage system is unavailable, an error is returned when you query the external table.
What to do next
When you create an external table, the system saves the file list that matches the PATTERN in the specified LOCATION by using the LOCATION parameter in the system table of OceanBase Database. During a scan, the system accesses the external file based on the file list. If other files are added to the external directory, you must perform an operation to add the new files to the file list of the external table. For more information, see Manage external files.
You can also drop an external table. The statement for dropping an external table is the same as that for dropping a normal table. To drop an external table, you need to specify the DROP TABLE statement. For more information, see Drop a table.