This topic describes how to create an external table by using SQL statements. It also describes the prerequisites, overview, and considerations for creating an external table, and provides examples.
Overview
An external table is a logical table object. Its data is stored in an external storage system instead of the database.
For more information, see Overview of external tables.
Prerequisites
Before you create an external table, make sure that the following conditions are met:
You have deployed an OceanBase cluster and created a MySQL tenant. For more information, see Deployment overview.
You have connected to the MySQL tenant of OceanBase Database. For more information, see Overview of connection methods.
You have created a database. For more information about how to create a database, see Create a database.
You have the
CREATEprivilege. For more information about how to view your privileges, see View user privileges. If you do not have the required privileges, contact the administrator to obtain the privileges. For more information, see Grant direct privileges.
Considerations
An external table can only be queried, and DML operations are not supported.
If the external file accessed by the external table is deleted, the system does not return an error but instead returns empty rows.
If the external storage system is unavailable, an error occurs when you query the external table.
The query performance may be affected by network factors and file system factors because the data of external tables is stored in external data sources. Therefore, when you create an external table, select an appropriate data source and optimize the query statement to improve the query efficiency.
Create an external table
Use the CREATE EXTERNAL TABLE statement to create an external table.
Name an external table
When you create an external table, you must name it. To distinguish between external tables and regular tables, we recommend that you follow specific naming rules or use a prefix or suffix when you name an external table. For example, you can use the suffix _csv when you name an external table.
Here is an example:
When you create an external table that stores student information, you can name the table students_csv.
CREATE EXTERNAL TABLE students_csv external_options
Notice
The preceding SQL statement cannot be executed because no other attributes are specified.
Define columns
Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined for columns of external tables.
The column types supported by external tables are the same as those supported by regular tables. For more information about the data types supported in the MySQL mode of OceanBase Database, see Overview of data types.
Define LOCATION
The LOCATION option specifies the path for storing the files of an external table. Generally, the data files of an external table are stored in a separate directory. The folder can contain subdirectories. When you create an external table, the table automatically collects all files in the specified directory.
OceanBase Database supports the following two path formats:
Local location format:
LOCATION = '[file://] local_file_path'Notice
If you use the local path format, set the system variable
secure_file_privto the path that can be accessed. For more information, see secure_file_priv.Remote path format:
LOCATION = '{oss|cos|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'In this format,
$ACCESS_ID,$ACCESS_KEY, and$HOSTare the access information required for accessing Alibaba Cloud Object Storage Service (OSS), Tencent Cloud Object Storage (COS), or Amazon Simple Storage Service (Amazon S3). Here,s3_regionis the region information selected when you use S3. These sensitive access information is encrypted and stored in the system tables of the database.
Define FORMAT
FORMAT = ( TYPE = 'CSV'... )specifies the CSV format for external files. The parameters are as follows:TYPE: the type of the external file.LINE_DELIMITER: the line delimiter for the CSV file. The default value isLINE_DELIMITER='\n'.FIELD_DELIMITER: the column delimiter for the CSV file. The default value isFIELD_DELIMITER='\t'.ESCAPE: the escape character for the CSV file. It can be only 1 byte. The default value isESCAPE ='\'.FIELD_OPTIONALLY_ENCLOSED_BY: the character that encloses field values in the CSV file. The default value is an empty string.ENCODING: the character set encoding format of the file. For more information, see Character sets. If not specified, the default value UTF8MB4 is used.NULL_IF: the strings that are treated asNULL. The default value is an empty string.SKIP_HEADER: the number of header lines to skip.SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value isFALSE, indicating that blank lines are not skipped.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, meaning that leading and trailing spaces in the fields are not removed.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULL. The default value isFALSE, indicating that empty strings are not treated asNULL.
FORMAT = ( TYPE = 'PARQUET'... ): specifies the external file format to be PARQUET.FORMAT = ( TYPE = 'ORC'... ): specifies the external file format to be ORC.
(Optional) Define PATTERN
The PATTERN option specifies a regular pattern string for filtering files in the LOCATION directory. For each file in the LOCATION directory, if the file path matches the pattern string, the external table accesses the file. Otherwise, the external table skips the file. By default, if this parameter is not specified, all files in the LOCATION directory are accessible. The external table stores the list of the files that match PATTERN in the system table of the database. During the scan, the external table accesses external files based on this list.
(Optional) Define partitions of the external table
Automatically define partitions of the external table
The external table calculates and adds partitions based on the expression defined by the partitioning key. You can specify the value or range of the partitioning key in the query statement. In this case, partitions are pruned, and the external table reads only the files in the specified partition.
Manually define partitions of the external table
If you want to manually add and delete partitions, you can set PARTITION_TYPE to USER_SPECIFIED so that the external table does not automatically manage partitions.
Example
Notice
IP addresses used in these examples are desensitized. Enter your actual server IP addresses during verification.
The following example demonstrates how to create external tables in the MySQL mode of OceanBase Database using local or remote files. The steps are as follows:
Create an external file.
Execute the following command to create a file named
test_tbl1.csvin the/home/admindirectory of the server where you want to log in to.code-placeholder-f1adaf1c-459b-b63e-fb4046e4fdfa
The content of the file is as follows:
1,'Emma','2021-09-01' 2,'William','2021-09-02' 3,'Olivia','2021-09-03'Set the path of the imported file.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.Execute the following command to log in to the server where the OBServer node is located.
ssh admin@10.10.10.1Connect to the
mysql001tenant through a local Unix socket.obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******Execute the following SQL command to set the import path to
/home/admin.SET GLOBAL secure_file_priv = "/home/admin";
Reconnect to the
mysql001tenant.Here is an example:
obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -DtestExecute the following SQL command to create an external table named
test_tbl1_csv.CREATE EXTERNAL TABLE test_tbl1_csv ( id INT, name VARCHAR(50), c_date DATE ) LOCATION = '/home/admin' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY ='\'' ) PATTERN = 'test_tbl1.csv';Execute the following SQL command to view the data of the external table named
test_tbl1_csv.SELECT * FROM test_tbl1_csv;The return result is as follows:
+------+---------+------------+ | id | name | c_date | +------+---------+------------+ | 1 | Emma | 2021-09-01 | | 2 | William | 2021-09-02 | | 3 | Olivia | 2021-09-03 | +------+---------+------------+ 3 rows in set
References
- You can drop an external table the same way you drop a normal table. For more information, see Drop a table.
- For more information about how to view and update files accessible to external tables, see Manage external files.
