This topic describes how to create an external table by using SQL statements. It also describes the prerequisites, overview, and considerations for creating an external table, and provides examples.
Overview
An external table is a logical table object. Its data is stored in an external storage system instead of the database.
For more information about external tables, see Overview of external tables.
Prerequisites
Before you create an external table, make sure that:
You have deployed an OceanBase cluster and created an Oracle-compatible tenant. For more information about how to deploy an OceanBase cluster, see Deployment overview.
You have connected to an Oracle-compatible tenant of OceanBase Database. For more information about how to connect to the database, see Overview of connection methods.
The current user has the CREATE TABLE privilege. For more information about how to view the privileges of the current user, see View user privileges. If you do not have this privilege, contact the administrator to grant it to you. For more information about how to grant privileges to users, see Grant privileges directly.
Considerations
An external table supports only queries; DML operations on it are not supported.
When you query an external table whose external file has been deleted, the system does not return an error but returns an empty result set.
If the external storage system that manages the file accessed by the external table becomes unavailable, an error is returned when you query the external table.
External table data is stored in an external data source. Therefore, factors such as network latency and file system performance affect the query performance. When you create an external table, select an appropriate data source and optimization strategy to improve query efficiency.
Create an external table by using the command line
You can execute the CREATE EXTERNAL TABLE statement to create an external table.
Define the name
When you create an external table, you must name it. To avoid confusion and ambiguity, we recommend that you use specific naming rules, such as a prefix or suffix, to distinguish external tables from regular tables. For example, you can add a suffix such as _csv to the name of an external table.
Here is an example:
Create an external table named students_csv to store student information.
CREATE EXTERNAL TABLE students_csv external_options
Notice
Since no other attributes are specified for the external table in the preceding SQL statement, the statement cannot be executed.
Define columns
Unlike regular tables, external tables cannot have constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY.
The column types supported for external tables are the same as those supported for regular tables. For more information about the data types supported in the Oracle-compatible mode of OceanBase Database, see Overview.
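To illustrate, the following sketch defines typed columns for a hypothetical external table. Note that the column list carries data types only, with no constraint clauses. The table name, column names, and path are illustrative assumptions:

```sql
-- Hypothetical sketch: columns carry data types only;
-- no DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY clauses.
CREATE EXTERNAL TABLE orders_csv (
  order_id   NUMBER(10),
  customer   VARCHAR2(100),
  order_date DATE,
  amount     NUMBER(12, 2)
)
LOCATION = '/home/admin/external_csv'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' );
```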
Define LOCATION
The LOCATION option specifies the path for external tables. Generally, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the table automatically collects all files in the directory.
OceanBase Database supports the following two path formats:
Local location format:
LOCATION = '[file://]local_file_path'
Notice
For scenarios that use the local location format, you must set the system variable secure_file_priv to specify an accessible path. For more information, see secure_file_priv.
Remote location format:
- LOCATION = '{oss|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'. In this format, $ACCESS_ID, $ACCESS_KEY, and $HOST are required for accessing Alibaba Cloud OSS, AWS S3, and object storage services that support the S3 protocol. s3_region indicates the region selected when you use S3. This sensitive access information is encrypted and stored in system tables of the database.
- LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH'. In this format, port indicates the port number of HDFS, and PATH indicates the directory path in HDFS.
- With Kerberos authentication:
  LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'. Where:
  - principal: the authentication user.
  - keytab: the key file for user authentication.
  - krb5conf: the description file of the Kerberos environment for the user.
  - configs: additional HDFS configuration items. This parameter is empty by default, but is usually configured in a Kerberos environment. For example, dfs.data.transfer.protection=authentication,privacy specifies the data transfer protection levels as authentication and privacy.
Notice
When you use an object storage path, make sure that the parameter values in the path consist only of uppercase and lowercase letters, digits, wildcard characters, and the following characters: /-_$+=. If the path contains other characters, the setting may fail.
Define FORMAT
FORMAT = ( TYPE = 'CSV' ... ) specifies the CSV format for external files. The parameters are as follows:
- TYPE: the type of the external file.
- LINE_DELIMITER: the line delimiter for the CSV file. The default value is '\n'.
- FIELD_DELIMITER: the field delimiter for the CSV file. The default value is '\t'.
- ESCAPE: the escape character for the CSV file, which can be only 1 byte in length. The default value is '\'.
- FIELD_OPTIONALLY_ENCLOSED_BY: the characters that enclose field values in the CSV file. The default value is an empty string.
Notice
When the external data file contains NULL values (not the string NULL and not "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.
- ENCODING: the character set encoding format of the file. For the character sets supported in Oracle-compatible mode, see Character set and collation. If this parameter is not specified, UTF8MB4 is used by default.
- NULL_IF: the strings to be treated as NULL values. The default value is an empty string.
- SKIP_HEADER: specifies to skip the file header and the number of lines to skip.
- SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, which specifies not to skip blank lines.
- TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, which specifies not to remove them.
- EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL values. The default value is FALSE, which specifies not to treat empty strings as NULL values.
FORMAT = ( TYPE = 'PARQUET' ... ) specifies the Parquet format for external files.
FORMAT = ( TYPE = 'ORC' ... ) specifies the ORC format for external files.
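As an illustrative sketch, the CSV parameters above can be combined into a single FORMAT clause. The specific values below (comma delimiter, double-quote enclosure, one header line, trimmed spaces) are assumptions for demonstration, not required settings:

```sql
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','               -- fields separated by commas
  LINE_DELIMITER = '\n'               -- one record per line (the default)
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'  -- field values may be wrapped in double quotes
  SKIP_HEADER = 1                     -- skip one header line
  TRIM_SPACE = TRUE                   -- strip leading and trailing spaces from fields
)
```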
(Optional) Define PATTERN
The PATTERN option specifies a regular expression string to filter files in the LOCATION directory. For each file in the LOCATION directory, the external table accesses the file if its path matches the pattern string, and skips it otherwise. If this parameter is not specified, the external table accesses all files in the LOCATION directory by default. The database stores the list of files under the LOCATION directory that match PATTERN in a system table. During a scan, the external table accesses external files based on this list.
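For example, assuming a directory that mixes .csv data files with other files, a pattern such as the following restricts the external table to the CSV files only (the table definition itself is a hypothetical sketch):

```sql
CREATE EXTERNAL TABLE logs_csv (
  id  NUMBER(10),
  msg VARCHAR2(200)
)
LOCATION = '/home/admin/external_csv'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' )
PATTERN = '.*\.csv';  -- access only files whose paths end with .csv
```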
(Optional) Define partitions of the external table
Automatically define partitions of the external table
The external table automatically defines partitions based on the partitioning key expression, and automatically calculates and adds the partitions. When you specify the value or range of the partitioning key in a query, the partitions are pruned and the external table reads only the files in the specified partitions.
Manually define partitions of the external table
If you want to manually add and drop partitions without letting the external table automatically manage partitions, you must set PARTITION_TYPE = USER_SPECIFIED.
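A rough sketch of the manual mode follows. The table, column, and directory names are hypothetical, and the exact partition syntax may differ by version, so consult the CREATE EXTERNAL TABLE reference before use:

```sql
-- Hypothetical sketch: an external table whose partitions are managed manually.
CREATE EXTERNAL TABLE sales_csv (
  region VARCHAR2(20),
  amount NUMBER(12, 2)
)
PARTITION BY (region)
PARTITION_TYPE = USER_SPECIFIED
LOCATION = '/home/admin/external_csv'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' );

-- Partitions are then added and dropped manually, for example:
ALTER TABLE sales_csv ADD PARTITION (region = 'east') LOCATION 'east';
ALTER TABLE sales_csv DROP PARTITION (region = 'east');
```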
Example 1
Notice
The commands in the example use anonymized values. Replace the IP address with the actual IP address of your server during verification.
The following example describes how to create an external table in the Oracle-compatible mode of OceanBase Database. The steps are as follows:
Create an external file.
Execute the following command to create a file named test_tbl1.csv in the /home/admin/external_csv directory on the server where the OBServer node is located.
[admin@xxx /home/admin/external_csv]# vi test_tbl1.csv
The content of the file is as follows:
1,'Emma'
2,'William'
3,'Olivia'
Set the path of the imported file.
Notice
For security reasons, you can only connect to the database through a local socket to execute the SQL statement for modifying the global variable secure_file_priv. For more information, see secure_file_priv.
Execute the following command to log in to the server where the OBServer node is located.
ssh admin@10.10.10.1
Execute the following command to connect to the oracle001 tenant through a local Unix socket.
obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******
Execute the following SQL command to set the import path to /home/admin/external_csv.
SET GLOBAL secure_file_priv = "/home/admin/external_csv";
Reconnect to the oracle001 tenant. Here is the sample code:
obclient -h10.10.10.1 -P2881 -usys@oracle001 -p****** -A
Execute the following SQL command to create an external table named test_tbl1_csv.
CREATE EXTERNAL TABLE test_tbl1_csv (
  id INT,
  name VARCHAR(50)
)
LOCATION = '/home/admin/external_csv'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = ''''
)
PATTERN = 'test_tbl1.csv';
Execute the following SQL command to view the data of the external table test_tbl1_csv.
SELECT * FROM test_tbl1_csv;
The return result is as follows:
+------+---------+
| ID   | NAME    |
+------+---------+
|    1 | Emma    |
|    2 | William |
|    3 | Olivia  |
+------+---------+
3 rows in set
Example 2
To create an HDFS external table, make sure that OceanBase Database is running in an environment where the Java SDK is deployed. For more information about how to deploy the Java SDK environment, see Deploy the Java SDK environment for OceanBase Database.
Below are examples of creating an HDFS external table:
With Kerberos:
CREATE EXTERNAL TABLE ext_data(ID NUMBER(32), NAME VARCHAR2(30),SCORE NUMBER(32))
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
LOCATION = 'hdfs://localhost:8020/user?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
PATTERN = 'data.csv';
where:
- principal: specifies the login authentication user.
- keytab: specifies the user authentication key file.
- krb5conf: specifies the description file for the Kerberos environment.
- configs: specifies additional HDFS configurations. This parameter is empty by default, but is usually configured in a Kerberos environment. For example, dfs.data.transfer.protection=authentication,privacy specifies the data transfer protection levels as authentication and privacy.
Without Kerberos:
CREATE EXTERNAL TABLE ext_data(ID NUMBER(32), NAME VARCHAR2(30),SCORE NUMBER(32))
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
LOCATION = 'hdfs://localhost:8020/user'
PATTERN = 'test_tbl1.csv';
References
For more information about how to view and modify the properties of external files, see Manage external files.