This topic describes the definition of external tables, as well as the prerequisites, considerations, and examples for creating an external table by using an SQL statement.
About external tables
An external table is a logical table object. Its data is stored in an external storage system instead of the database.
Prerequisites
Before you create an external table, make sure that:
You have deployed an OceanBase cluster and created a MySQL-compatible tenant.
You have connected to the MySQL-compatible tenant of OceanBase Cloud.
You have created a database.
You have the
CREATEprivilege. If you do not have the required privileges, contact the administrator to obtain the privileges.
Considerations
External tables support only query operations, but not DML operations.
When you query an external table, if the external file accessed by the table is deleted, the system does not generate an error but returns empty rows.
Files accessed by an external table are managed by an external storage system. When the external storage system is unavailable, an error occurs when you query the external table.
Because data of an external table is stored in an external data source, queries may need to be performed across networks or file systems, which may affect query performance. Therefore, you must select an appropriate data source and optimization strategy when you create an external table to improve query efficiency.
Create an external table by using a statement
Use the CREATE EXTERNAL TABLE statement to create an external table.
Define the external table name
When you create an external table, you must define a name for it. To distinguish between external tables and regular tables, we recommend that you follow specific naming rules or use a prefix or suffix when you name an external table. For example, you can use the suffix _csv when you name an external table.
Here is an example:
Create an external table of student information and name it as students_csv.
CREATE EXTERNAL TABLE students_csv external_options
Notice
The preceding SQL statement cannot be executed because other attributes are not added for the external table.
Define columns
Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined for columns of external tables.
The column types supported by external tables are the same as those supported by regular tables.
Define LOCATION
The LOCATION option specifies the path for storing the files of an external table. Generally, the data files of an external table are stored in a separate directory. The folder can contain subdirectories. When you create an external table, the table automatically collects all files in the specified directory.
OceanBase Cloud supports the following two path formats:
Local location format:
LOCATION = '[file://] local_file_path'Notice
When you use the local location format, you must set the system variable
secure_file_privto specify an accessible path.Remote location format:
LOCATION = '{oss|cos|S3}://$ACCESS_ID: $ACCESS_KEY@$HOST/remote_file_path'In this format,
$ACCESS_ID,$ACCESS_KEY, and$HOSTare required for accessing Alibaba Cloud Object Storage Service (OSS), Tencent Cloud Object Storage (COS), or Amazon Simple Storage Service (S3). Such sensitive access information is encrypted and stored in the system table of the database.
Define FORMAT
- The
FORMAT = ( TYPE = 'CSV'... )option specifies the format of a CSV external file. The option has the following parameters:TYPE: the type of the external file.LINE_DELIMITER: the line delimiter for the CSV file. The default value is'\n'.FIELD_DELIMITER: the field delimiter for the CSV file. The default value is'\t'.ESCAPE: the escape character for the CSV file, which can be only 1 byte in length. The default value is'\'.FIELD_OPTIONALLY_ENCLOSED_BY: the characters that enclose the field values in the CSV file. The default value is none.ENCODING: the character set encoding used by the file. If this parameter is not specified, the default valueUTF8MB4takes effect.NULL_IF: the strings to be treated asNULLvalues. If you do not set this parameter, all strings are treated as valid non-NULL values and loaded to the external table.SKIP_HEADER: specifies to skip the file header, and specifies the number of lines to skip.SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value isFALSE, which specifies not to skip blank lines.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value isFALSE, which specifies not to remove leading and trailing spaces from fields in the file.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULLvalues. The default value isFALSE, which specifies not to treat empty strings asNULLvalues.
- The
FORMAT = ( TYPE = 'PARQUET'... )option specifies the format of a PARQUET external file.
(Optional) Define PATTERN
The PATTERN option specifies a regular pattern string for filtering files in the LOCATION directory. For each file in the LOCATION directory, if the file path matches the pattern string, the external table accesses the file. Otherwise, the external table skips the file. By default, if this parameter is not specified, all files in the LOCATION directory are accessible. The external table stores the list of the files that match PATTERN in the system table of the database. During the scan, the external table accesses external files based on this list.
(Optional) Define partitions of the external table
Automatically define partitions
The external table calculates and adds partitions based on the expression defined by the partitioning key. You can specify the value or range of the partitioning key in the query statement. In this case, partitions are pruned, and the external table reads only the files in the specified partition.
Manually define partitions
If you want to manually add and delete partitions, you can set PARTITION_TYPE to USER_SPECIFIED so that the external table does not automatically manage partitions.
Examples
Notice
IP addresses used in these examples are desensitized. Enter your actual server IP addresses during verification.
The following example shows how to create an external table whose data files are stored in a local directory in the MySQL compatible mode of OceanBase Cloud.
Create an external file.
Create a file named
test_tbl1.csvin the/home/admindirectory of the server where the OBServer node you want to log on to resides.[admin@xxx /home/admin]# vi test_tbl1.csvThe file content is as follows:
1,'Emma','2021-09-01' 2,'William','2021-09-02' 3,'Olivia','2021-09-03'Set the import path of the external file.
Notice
For security reasons, you can only connect to the database through a local socket to execute the SQL statement for setting
secure_file_priv.Log on to the server where the OBServer node resides.
ssh admin@10.10.10.1Connect to the
mysql001tenant through a local Unix socket.obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Set the import path to
/home/admin.SET GLOBAL secure_file_priv = "/home/admin";
Reconnect to the
mysql001tenant.Here is an example:
obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -DtestCreate an external table named
test_tbl1_csv.CREATE EXTERNAL TABLE test_tbl1_csv ( id INT, name VARCHAR(50), c_date DATE ) LOCATION = '/home/admin' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY ='\'' ) PATTERN = 'test_tbl1.csv';View data in the external table
test_tbl1_csv.SELECT * FROM test_tbl1_csv;The return result is as follows:
+------+---------+------------+ | id | name | c_date | +------+---------+------------+ | 1 | Emma | 2021-09-01 | | 2 | William | 2021-09-02 | | 3 | Olivia | 2021-09-03 | +------+---------+------------+ 3 rows in set