Typically, table data in a database is stored in the database's own storage space. However, data in an external table is stored in an external storage service, such as a local file system, HDFS, OSS, S3, or an object storage service compatible with the S3 protocol. When you create an external table, you must specify the data file path and file format. After that, you can use standard SQL queries to access external data, just like querying a regular table.
Limitations
- External tables are read-only and support the
SELECTstatement, but do not support DML operations such asINSERT,UPDATE, andDELETE. - Constraints such as
DEFAULT,NOT NULL,UNIQUE,CHECK,PRIMARY KEY, andFOREIGN KEYcannot be defined. - Indexes cannot be created.
- Access to external tables is generally slower than access to regular tables due to remote I/O and parsing overhead.
Create a file external table
Syntax
CREATE EXTERNAL TABLE <table_name>
(
[ <col_name> <col_type> [AS <expr>] ]
[ , <col_name> <col_type> [AS <expr>] ]
[ , ... ]
)
[PARTITION_TYPE = USER_SPECIFIED]
LOCATION = '<string>'
[AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }]
FORMAT = (
TYPE = '<string>',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = AUTO | GZIP | ZSTD | DEFLATE | NONE
)
[ PATTERN = '<regex_pattern>' ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
Column definition
The columns of an external table must be defined with data types, but cannot be defined with constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY.
Default column mapping
By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence:
- Column 1 of the external table → Column 1 of the file
- Column 2 of the external table → Column 2 of the file
- And so on
Example:
CREATE EXTERNAL TABLE ext_t1 (
c1 INT,
c2 INT
)
...
For a CSV file, c1 is automatically mapped to the first column of the file, and c2 is mapped to the second column.
Manual column mapping
When the order of the file columns does not match the table definition, you can explicitly specify the mapping relationship by using the pseudo column metadata$filecol{N} (N starts from 1):
CREATE EXTERNAL TABLE ext_t1 (
c1 INT AS (metadata$filecol2), -- Column c1 of the external table ext_t1 corresponds to column 2 of the file.
c2 INT AS (metadata$filecol4) -- Column c2 of the external table ext_t1 corresponds to column 4 of the file.
)
...
Notice
Once manual mapping is used, all columns must be explicitly defined with mappings. The default sequential mapping will no longer be applied.
LOCATION
LOCATION specifies the storage path of the external data file. The system automatically recursively scans all files in the directory and its subdirectories.
Local LOCATION format
LOCATION = '[file://] local_file_path'
local_file_pathcan be a relative path or an absolute path.- The base directory for a relative path is the installation directory of the OBServer node.
You must specify a directory, not a single file. If you need to read a single file, specify its parent directory and use
PATTERNto filter it.The
secure_file_privparameter restricts the use oflocal_file_path.local_file_pathmust be a subdirectory of thesecure_file_privconfiguration path.
Remote LOCATION format
LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'
$ACCESS_ID,$ACCESS_KEY, and$HOSTare the credentials required to access cloud storage.- Sensitive information such as AccessKey is encrypted and stored in system tables by OceanBase Database, so it does not need to be exposed in plaintext.
FORMAT
CSV format
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = AUTO | GZIP | ZSTD | DEFLATE | NONE
)
Description of CSV format: All columns in the file are considered to be of the STRING type. During queries, the system converts the data to the column types defined in the external table.
Parquet format/ORC format
-- Specify the Parquet format.
FORMAT = ( TYPE = 'PARQUET' )
-- Specify the ORC format.
FORMAT = ( TYPE = 'ORC' )
- You only need to specify
TYPE = 'PARQUET'orTYPE = 'ORC'. - The schema is automatically inferred from the metadata of the Parquet file.
Note
In MySQL mode of OceanBase Database, external tables do not support the ZEROFILL column attribute.
Type mapping (Parquet/ORC → OceanBase Database)
Special notes on Hive Parquet:
- The BOOL type is not supported.
- The BINARY type is not supported in Oracle mode.
For more information, see Data type mapping in MySQL mode and Data type mapping in Oracle mode.
PATTERN
[ PATTERN = '<regex_pattern>' ]
- Specify a regular expression to filter files in the
LOCATIONdirectory. - Only files that match the regular expression will be read; unmatched files will be skipped.
- If
PATTERNis not specified, all files in the directory will be read by default.
Partitioning
Automatic partitioning
Automatic partitioning is implemented by using the PARTITION BY clause in combination with metadata$fileurl.
metadata$fileurl: The relative path of the file to which the current row belongs, withLOCATIONas the root.The partitioning expression must include
metadata$fileurl. The system dynamically calculates the partition based on the value of this expression.
Example: Partition by date string in the log path
CREATE EXTERNAL TABLE t0 (
c0 VARCHAR(500) AS (NVL(METADATA$FILECOL1, '')),
c1 FLOAT(5,4) AS (METADATA$FILECOL2),
date_part VARCHAR(100) AS (SUBSTR(METADATA$fileurl, INSTR(METADATA$fileurl, '2024'), 10)),
path VARCHAR(100) AS (METADATA$fileurl)
)
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
- When a new file is added, the system automatically places it into the corresponding partition based on the
date_partexpression during queries. - If the expression calculation fails (such as due to a type conversion error), an error will be returned.
Manual partitioning
Manual partitioning is enabled by using PARTITION_TYPE = USER_SPECIFIED. In this mode:
- The external table does not automatically discover partitions.
- Users must explicitly manage partitions by using
ALTER EXTERNAL TABLE ... ADD/DROP PARTITION LOCATION.
Example of creating a table:
CREATE EXTERNAL TABLE table_name (
date_part VARCHAR(100) AS (METADATA$PARTITION_LIST_COL1),
col2 INT AS (METADATA$filecol2)
)
PARTITION_TYPE = USER_SPECIFIED
AUTO_REFRESH = OFF
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
METADATA$PARTITION_LIST_COL1is a placeholder pseudo column for the partitioning column.- Initially, the external table does not have any file metadata, so partitions must be added manually.
Adding a partition:
ALTER EXTERNAL TABLE table_name
ADD PARTITION (date_part = '2024-06')
LOCATION '2024/06';
- All files under
~/log/2024/06are included in thedate_part = '2024-06'partition. - A partition can only be bound to one
LOCATION; however, a singleLOCATIONcan be bound to multiple partitions, which may result in duplicate data.
Dropping a partition:
ALTER EXTERNAL TABLE table_name DROP PARTITION LOCATION '2022/02';
- Drops all associated partitions and their file metadata under this path.
Pseudo columns
External tables support the following three pseudo columns:
Pseudo column |
Description |
|---|---|
METADATA$FILECOL{N} |
Maps to the Nth column (N ≥ 1) in the file. |
METADATA$FILEURL |
The relative path of the file to which the current row belongs. The relative path is based on the location specified in the CREATE TABLE statement. |
METADATA$PARTITION_LIST_COL{N} |
A placeholder that indicates the Nth partition key in manual partitioning mode. |
External file management
External tables store the list of files that match the PATTERN under the LOCATION in the system table. When queries are executed, they access the external files based on this list. The file list can be automatically or manually updated.
View external table files
- MySQL tenant:
SELECT * FROM oceanbase.DBA_OB_EXTERNAL_TABLE_FILES
WHERE table_schema = 'DATABASE4' AND table_name = 'T0';
- Oracle tenant:
SELECT * FROM DBA_OB_EXTERNAL_TABLE_FILES
WHERE owner = 'DATABASE4' AND table_name = 'T0';
- SYS tenant:
SELECT * FROM oceanbase.CDB_OB_EXTERNAL_TABLE_FILES;
Manually update the file list
ALTER EXTERNAL TABLE <table_name> REFRESH;
Automatic update strategy (AUTO_REFRESH)
Strategy |
Description |
|---|---|
IMMEDIATE |
Automatically refreshes the file list before each query. |
OFF |
Never automatically refreshes. Only manual refresh is supported. |
INTERVAL |
Sets a scheduled task to refresh the file list at specified intervals (unit: seconds) by using DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x). |
