This topic describes how to use an SQL statement to create an external table.
Overview
An external table is a logical table object. Its data is stored in an external storage system instead of the database.
For more information, see Overview of external tables.
Prerequisites
Before you create an external table, make sure that:
You have logged on to a MySQL tenant of OceanBase Database. For more information about how to connect to a database, see Connection methods.
Note
You can query the
oceanbase.DBA_OB_TENANTSview in thesystenant to confirm the mode of the tenant to which you have logged on.You have the
CREATEprivilege. For more information about how to view your privileges, see View user privileges.
Considerations
External tables support only query operations, but not DML operations.
When you query an external table, if the external file accessed by the table is deleted, the system does not generate an error but returns empty rows.
Files accessed by an external table are managed by an external storage system. When the external storage system is unavailable, an error occurs when you query the external table.
Because data of an external table is stored in an external data source, queries may need to be performed across networks or file systems, which may affect query performance. Therefore, you must select an appropriate data source and optimization strategy when you create an external table to improve query efficiency.
Syntax
The syntax of CREATE EXTERNAL TABLE is as follows:
CREATE EXTERNAL TABLE table_name (column_options [AS (metadata$filecol{N})])
LOCATION = '<string>'
FORMAT = (format_options)
[PATTERN = '<regex_pattern>'];
;
where
table_namespecifies the name of the external table.column_optionsspecifies the column definition of the external table, including the column name and data type.[AS (metadata$filecol{N})]is used to manually define column mappings, which is optional.Note
By default, data columns in the external file automatically map columns defined for the external table in sequence. That is, the first column of the external table is also the first column in the external file. When the column order in the external file is different from that in the external table, you can use a pseudo column like
metadata$filecol{N}to map a column in the external table to the Nth column in the external file. The columns in the file are numbered from 1.LOCATIONspecifies the path for storing the external file.FORMATspecifies the format of the external file.PATTERNspecifies a regular pattern string for filtering files in theLOCATIONdirectory.
For more information about the CREATE EXTERNAL TABLE statement, see CREATE EXTERNAL TABLE.
Define the external table name
When you create an external table, you must name it. To distinguish between external tables and regular tables, we recommend that you follow specific naming rules or use a prefix or suffix when you name an external table. For example, you can use the suffix _csv when you name an external table.
Here is an example:
Create an external table named students_csv for storing student information.
CREATE EXTERNAL TABLE students_csv external_options
Notice
The preceding SQL statement cannot be executed because other properties of the external table are not added.
Define columns
Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined for columns of external tables.
The column types supported by external tables are the same as those supported by regular tables. For more information about the data types supported in the MySQL mode of OceanBase Database, see Overview of data types.
Define LOCATION
The LOCATION option specifies the path for storing the files of an external table. Generally, the data files of an external table are stored in a separate directory. The folder can contain subdirectories. When you create an external table, the table automatically collects all files in the specified directory.
OceanBase Database supports the following two path formats:
Local location format:
LOCATION = '[file://] local_file_path'Notice
When you use the local location format, you must set the system variable
secure_file_privto specify an accessible path. For more information, see secure_file_priv.Remote location format:
LOCATION = '{oss|cos}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'In this format,
$ACCESS_ID,$ACCESS_KEY, and$HOSTare required for accessing Alibaba Cloud Object Storage Service (OSS) or Tencent Cloud Object Storage (COS). These sensitive access information is encrypted and stored in the system table of the database.
Define FORMAT
The FORMAT option specifies the format of an external file. The option contains the following parameters:
TYPE: the type of the external file. Only the CSV type is supported.LINE_DELIMITER: the line delimiter for the CSV file. The default value is'\n'.FIELD_DELIMITER: the column delimiter for the CSV file. The default value is'\t'.ESCAPE: the escape character for the CSV file, which can be only 1 byte in length. The default value is'\'.FIELD_OPTIONALLY_ENCLOSED_BY: the characters that enclose the field values in the CSV file. The default value is none.ENCODING: the character set encoding used by the file. For more information about all character sets supported in MySQL mode, see Character sets. If this parameter is not specified, the default valueUTF8MB4takes effect.NULL_IF: the strings to be treated asNULLvalues. If you do not set this parameter, all strings are treated as valid non-NULL values and loaded to the external table.SKIP_HEADER: specifies to skip the file header, and specifies the number of lines to skip.SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value isFALSE, which indicates not to skip blank lines.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value isFALSE, which indicates not to remove leading and trailing spaces from fields in the file.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULLvalues. The default value isFALSE, which indicates not to treat empty strings asNULLvalues.
(Optional) Define PATTERN
The PATTERN option specifies a regular pattern string for filtering files in the LOCATION directory. For each file in the LOCATION directory, if the file path matches the pattern string, the external table accesses the file. Otherwise, the external table skips the file. If this parameter is not specified, all files in the LOCATION directory are accessible by default. The external table stores the list of the files that match PATTERN in the system table of the database. During the scan, the external table accesses external files based on this list. The file list can be updated automatically or manually.
Examples
Notice
IP addresses used in these examples are desensitized. Enter your actual server IP addresses during verification.
The following example shows how to create an external table whose data files are stored in a local directory in the MySQL mode of OceanBase Database.
Create an external file.
Create a file named
test_tbl1.csvin the/home/admindirectory of the server where the OBServer node you want to log on to resides.[admin@xxx /home/admin]# vi test_tbl1.csvThe file content is as follows:
1,'Emma','2021-09-01' 2,'William','2021-09-02' 3,'Olivia','2021-09-03'Set the import path of the external file.
Notice
For security reasons, you can only connect to the database through a local socket to execute the SQL statement for setting
secure_file_priv. For more information, see secure_file_priv.Log on to the server where the OBServer node resides.
ssh admin@10.10.10.1Connect to the tenant
mysql001through a local Unix socket.obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Set the import path to
/home/admin.SET GLOBAL secure_file_priv = "/home/admin";
Reconnect to the tenant
mysql001.obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -DtestCreate an external table named
test_tbl1_csv.CREATE EXTERNAL TABLE test_tbl1_csv ( id INT, name VARCHAR(50), c_date DATE ) LOCATION = '/home/admin' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY ='\'' ) PATTERN = 'test_tbl1.csv';View data of the external table
test_tbl1_csv.SELECT * FROM test_tbl1_csv;The result is as follows:
+------+---------+------------+ | id | name | c_date | +------+---------+------------+ | 1 | Emma | 2021-09-01 | | 2 | William | 2021-09-02 | | 3 | Olivia | 2021-09-03 | +------+---------+------------+ 3 rows in set