Use the CREATE EXTERNAL TABLE statement to create an external table. When you create an external table, you must specify the path and format of the data files so that data can be read from the external files.
Privilege requirements
To create an external table, the current user must have the CREATE TABLE privilege. For information about how to query the privileges of the current user, see Query user privileges. If you do not have the CREATE TABLE privilege, contact the administrator to grant you the privilege. For information about how to grant privileges to a user, see Grant privileges directly.
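For reference, an administrator typically grants this privilege with a statement along the following lines. This is a hedged sketch: the user name scott is a placeholder, not a user from this document.

```sql
-- Hypothetical example: grant the CREATE TABLE privilege to the user 'scott'
-- (standard Oracle-mode GRANT syntax; replace 'scott' with the actual user).
GRANT CREATE TABLE TO scott;
```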
Create an external table
The following example shows how to create an external table:
CREATE EXTERNAL TABLE table_name
( col_name col_type [AS (metadata$filecol{N})]
[ , col_name col_type [AS (metadata$filecol{N})] ]
[ , ... ] )
LOCATION = '<string>'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = { TRUE | FALSE }
TRIM_SPACE = { TRUE | FALSE }
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
IGNORE_LAST_EMPTY_COLUMN = { TRUE | FALSE }
)
[ PATTERN = '<regex_pattern>' ]
The following list describes the parameters in the preceding statement.
col_name col_type [AS (metadata$filecol{N})]: defines a column. AS (metadata$filecol{N}) specifies the mapping of the column. External tables support the same column types as regular tables. For more information about the data types supported by OceanBase Database in Oracle mode, see Data types.
By default, the data columns in the external file are automatically mapped to the columns defined in the external table in sequence. That is, the first column of the external table is automatically mapped to the first data column in the external file.
For example, in the following statement, the C1 column of the ext_t1 table is automatically mapped to the first data column in the external file, and the C2 column is automatically mapped to the second data column.
CREATE EXTERNAL TABLE ext_t1 ( C1 int, C2 int )
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );
If the order of columns in the external file differs from the order of columns defined in the external table, you can use a pseudocolumn in the format of metadata$filecol{N} to map a column of the external table to the Nth column in the external file. Note that the columns in the file are numbered starting from 1.
For example, in the following statement, C1 int AS (metadata$filecol2) maps the C1 column of the ext_t2 table to the second column in the file, and C2 int AS (metadata$filecol4) maps the C2 column to the fourth column in the external file.
CREATE EXTERNAL TABLE ext_t2 ( C1 int AS (metadata$filecol2), C2 int AS (metadata$filecol4) )
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );
Notice: If you manually define column mappings, automatic column mapping is disabled, and all columns must be manually mapped.
LOCATION = '<string>': specifies the path where the external files are stored. Usually, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the table automatically collects all files in the directory. The following two formats are supported:
Local location format:
LOCATION = '[file://] local_file_path'
local_file_path can be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database.
Notice: local_file_path must be a directory, not a file. To specify a single file, set LOCATION to the parent directory of the file and use the PATTERN attribute to specify the file.
When you use the local location format and configure the system variable secure_file_priv to specify the file path that OceanBase Database is permitted to access, secure_file_priv must be the parent directory of local_file_path. In other words, local_file_path can only be a subdirectory of secure_file_priv.
The tenant-level system variable secure_file_priv specifies the path that OceanBase Database can access when importing data from or exporting data to a file. For more information about secure_file_priv, see secure_file_priv.
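To illustrate the relationship between secure_file_priv and the local location, here is a hedged sketch. The paths and table name are hypothetical, not taken from this document.

```sql
-- Hypothetical example: secure_file_priv is assumed to be '/home/admin',
-- so LOCATION must point to a subdirectory of it, such as '/home/admin/ext_data/'.
SET GLOBAL secure_file_priv = "/home/admin";
-- (Reconnect the session, then:)
CREATE EXTERNAL TABLE ext_local (
  C1 INT,
  C2 INT
)
LOCATION = 'file:///home/admin/ext_data/'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' );
```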
Remote location format:
Notice: When you specify an object storage path, the parameters in the path are separated by the & symbol. Make sure that the parameter values you enter contain only uppercase and lowercase letters, digits, the characters /-_$+=, and wildcards. If you enter any other characters, the configuration may fail.
If the file is stored on Alibaba Cloud OSS or AWS S3, the format is:
LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'
where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required to access Alibaba Cloud OSS, AWS S3, or an object storage service compatible with the S3 protocol, and s3_region is the region selected when using S3. These sensitive access credentials are stored in the database's system tables in encrypted form.
If the file is stored on HDFS, the following formats are supported:
To access the cluster based on the address of a single-node NameNode (NN), use the following format:
LOCATION = 'hdfs://localhost:port/PATH'
where localhost is the address of HDFS, port is the port number of HDFS, and PATH is the file path in HDFS.
To access the cluster with Kerberos authentication, use the following format:
LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'
where:
principal: specifies the user for login authentication.
keytab: specifies the path of the user authentication key file.
krb5conf: specifies the path of the Kerberos environment configuration file.
configs: specifies additional HDFS configuration items. By default, this parameter is empty. However, in a Kerberos environment this parameter usually needs to be set. For example, dfs.data.transfer.protection=authentication,privacy sets the data transfer protection levels to authentication and privacy.
To access the cluster based on Hadoop High Availability (HA), use the following format:
LOCATION = 'hdfs://nameserviceID/PATH'
where nameserviceID is the logical nameservice ID of the Hadoop HA cluster, and PATH is the file path.
Note: Make sure that the client configuration on the OBServer node includes the nameservice definition and failover strategy for the HA cluster.
For Kerberos authentication, the format is as follows:
LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
where:
principal: specifies the login user, which must be the principal of the standby NameNode.
keytab and krb5conf: are set in the same way as for a single-node NameNode.
configs: specifies additional HDFS configurations. You need to set multiple configuration items, which relate only to HA and security:
dfs.data.transfer.protection=${string}: specifies the dfs.data.transfer.protection configuration of the cluster.
dfs.nameservices=${nameservice id}: specifies the nameservice (alias) of the current HA cluster.
dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}: specifies the list of NameNode IDs of the HA cluster.
dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}: specifies the address of namenode1 for client routing.
dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}: specifies the address of namenode2 for client routing.
dfs.ha.automatic-failover.enabled.${nameservice id}=true: specifies that the HA cluster automatically selects a valid NameNode to respond to requests.
dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: specifies the logic class for automatic failover in the HA cluster. You can also customize and upload the logic classes required by the HA cluster.
Notice: HA configurations are bound to the nameservice. For example, if mycluster is the nameservice ID, make sure to configure the related configuration items accordingly.
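Pulling the remote location format together, a minimal OSS-backed external table might look like the following. This is a sketch: $ACCESS_ID, $ACCESS_KEY, and $HOST stand for real credentials and endpoint, and the bucket path and table definition are illustrative.

```sql
-- Hypothetical example: the bucket path 'my_bucket/data/' and the column
-- definitions are placeholders; substitute real credentials for the variables.
CREATE EXTERNAL TABLE ext_oss (
  C1 INT,
  C2 VARCHAR2(30)
)
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/my_bucket/data/'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );
```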
FORMAT = ( TYPE = 'CSV' ... ): specifies that the external file is in the CSV format.
TYPE: specifies the type of the external file.
LINE_DELIMITER: specifies the line delimiter of the file.
FIELD_DELIMITER: specifies the field delimiter of the file.
ESCAPE: specifies the escape character of the file.
FIELD_OPTIONALLY_ENCLOSED_BY: specifies the character that encloses field values. For example, FIELD_OPTIONALLY_ENCLOSED_BY = '"' indicates that values are enclosed in double quotation marks. If this parameter is not specified, the default value is empty.
Notice: If the external data file contains NULL values (actual NULL values, not the string "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.
ENCODING: specifies the character set encoding of the file. For information about all character sets supported in Oracle mode, see Character sets. If this parameter is not specified, the default value is UTF8MB4.
NULL_IF: specifies the strings that are treated as NULL. If this parameter is not specified, the default value is empty.
SKIP_HEADER: specifies the number of header rows to skip. If this parameter is not specified, the default value is 0, indicating that no header rows are skipped.
SKIP_BLANK_LINES: specifies whether to skip blank lines. If this parameter is not specified, the default value is FALSE.
TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. If this parameter is not specified, the default value is FALSE.
EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. If this parameter is not specified, the default value is FALSE.
IGNORE_LAST_EMPTY_COLUMN: specifies whether to ignore a trailing empty field at the end of a line in the file. The default value is TRUE, indicating that the trailing empty field is ignored.
Note: The IGNORE_LAST_EMPTY_COLUMN parameter is supported starting from V4.3.5 BP2.
FORMAT = ( TYPE = 'PARQUET' ... ): specifies that the external file is in the PARQUET format.
PATTERN: specifies a regular expression pattern used to filter files in the LOCATION directory. For each file in the LOCATION directory, the external table can access the file only if the file name matches the pattern; otherwise, the external table skips the file. If this parameter is not specified, all files in the LOCATION directory are accessible by default.
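To sketch how TYPE = 'PARQUET' and PATTERN combine, consider the following hypothetical statement. The directory, columns, and regular expression are placeholders chosen for illustration; they are not from this document.

```sql
-- Hypothetical example: only files whose names end in '.parquet' under the
-- LOCATION directory are visible to the external table.
CREATE EXTERNAL TABLE ext_parquet (
  C1 INT,
  C2 VARCHAR2(30)
)
LOCATION = '/home/admin/parquet_data/'
FORMAT = ( TYPE = 'PARQUET' )
PATTERN = '.*\.parquet';
```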
Assume that a data.csv file is stored in the /home/admin/oceanbase/ directory on the local machine, and the content of the file is as follows.
1,"lin",98
2,"hei",90
3,"ali",95
Log in to the OBServer node as the tenant administrator and connect to the Oracle tenant of the cluster through the local Unix socket.
The following code shows an example of how to connect to the database:
obclient -S /home/admin/oceanbase/run/sql.sock -uroot@sys -p********
For specific operations and descriptions about connecting to OceanBase Database by using a local Unix socket, see secure_file_priv.
Configure the path /home/admin/oceanbase/ that the database is allowed to access:
SET GLOBAL secure_file_priv = "/home/admin/oceanbase/";
After the command is executed, you must restart the session for the modification to take effect.
Connect to the database again and create the external table ext_t3:
CREATE EXTERNAL TABLE ext_t3 (
  ID NUMBER(32),
  NAME VARCHAR2(30),
  SCORE NUMBER(32)
)
LOCATION = '/home/admin/oceanbase/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
PATTERN = 'data.csv';
After the external table is created, you can use the SHOW CREATE TABLE statement to view the table definition, just like for a regular table.
SHOW CREATE TABLE ext_t3;
The execution result is as follows:
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXT_T3 | CREATE EXTERNAL TABLE "EXT_T3" (
"ID" NUMBER(32) GENERATED ALWAYS AS (METADATA$FILECOL1),
"NAME" VARCHAR2(30) GENERATED ALWAYS AS (METADATA$FILECOL2),
"SCORE" NUMBER(32) GENERATED ALWAYS AS (METADATA$FILECOL3)
)
LOCATION='file:///home/admin/oceanbase/'
PATTERN='data.csv'
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
FIELD_OPTIONALLY_ENCLOSED_BY = '"',
ENCODING = 'utf8mb4'
)COMPRESS FOR ARCHIVE REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
You can also access the external table like a regular table. When querying an external table, the system reads the external file directly through the driver layer, parses it according to the file format, converts it into internal data types of OceanBase Database, and returns the data rows. Here is an example of querying the external table ext_t3:
SELECT * FROM ext_t3;
The execution result is as follows:
+----+------+-------+
| ID | NAME | SCORE |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set
In addition, you can perform combined queries between an external table and a regular table. Suppose there is a regular table named info in the current database, with the following data:
+------+--------+------+
| NAME | SEX | AGE |
+------+--------+------+
| lin | male | 8 |
| hei | male | 9 |
| li | female | 8 |
+------+--------+------+
3 rows in set
Here is an example of a combined query between the external table ext_t3 and the regular table info.
SELECT info.* FROM info, ext_t3 WHERE info.name = ext_t3.name AND ext_t3.score > 90;
The execution result is as follows:
+------+--------+------+
| NAME | SEX | AGE |
+------+--------+------+
| lin | male | 8 |
| li | female | 8 |
+------+--------+------+
2 rows in set
For more information about query operations, see Read data.
Considerations
You can query an external table, but cannot perform DML operations on it.
If the external file that an external table accesses is deleted, no error is returned when you query the external table. Instead, an empty result set is returned.
If the external storage system is unavailable, an error is returned when you query an external table.
What to do next
When you create an external table, OceanBase Database saves the list of files that match the PATTERN in the specified path under the LOCATION parameter in the system table. When you scan the external table, it accesses the external files based on the list. If other files are added to the external directory, you must update the external table file list to add the new files. For more information, see Manage external files.
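The refresh described above is typically done with a statement along the following lines. This is an assumed form; see Manage external files for the authoritative syntax in your version.

```sql
-- Assumed syntax for manually refreshing the file list of an external table.
ALTER EXTERNAL TABLE ext_t3 REFRESH;
```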
After you create an external table, you can drop it. The statement for dropping an external table is the same as that for dropping a regular table. You can execute the DROP TABLE statement to drop an external table. For more information, see Drop a table.
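For example:

```sql
-- Dropping an external table uses the same statement as a regular table.
DROP TABLE ext_t3;
```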
