Typically, table data in a database is stored in the database's own storage space. However, for external tables, the data is stored in external storage services such as a local file system, HDFS, OSS, S3, or an object storage service compatible with the S3 protocol. When creating an external table, you need to explicitly define the data file path and file format. After that, you can access external data using standard SQL query statements, just like querying a regular table.
Limitations
- External tables are read-only and support
SELECT, but do not support DML operations (INSERT,UPDATE, orDELETE). - Constraints (
DEFAULT,NOT NULL,UNIQUE,CHECK,PRIMARY KEY, orFOREIGN KEY) cannot be defined. - Indexes cannot be created.
- Access to external tables is generally slower than access to regular tables due to remote I/O and parsing overhead.
Create a file external table
Syntax
CREATE EXTERNAL TABLE <table_name>
(
[ <col_name> <col_type> [AS <expr>] ]
[ , <col_name> <col_type> [AS <expr>] ]
[ , ... ]
)
[PARTITION_TYPE = USER_SPECIFIED]
LOCATION = '<string>'
[AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }]
FORMAT = (
TYPE = '<string>',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
)
[ PATTERN = '<regex_pattern>' ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
Column definition
The columns of an external table must be defined with data types, but cannot be defined with any constraints (DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY).
Default column mapping
By default, the data columns in the file are automatically mapped to the columns defined in the external table in order:
- The first column in the external table corresponds to the first column in the file.
- The second column in the external table corresponds to the second column in the file.
- And so on.
Example:
CREATE EXTERNAL TABLE ext_t1 (
c1 INT,
c2 INT
)
...
For a CSV file, c1 is automatically mapped to the first column, and c2 is mapped to the second column.
Manual column mapping
When the order of the file columns does not match the table definition, you can explicitly specify the mapping relationship by using the pseudo column metadata$filecol{N} (N starts at 1):
CREATE EXTERNAL TABLE ext_t1 (
c1 INT AS (metadata$filecol2), -- The c1 column in the external table ext_t1 corresponds to the second column in the file.
c2 INT AS (metadata$filecol4) -- The c2 column in the external table ext_t1 corresponds to the fourth column in the file.
)
...
Notice
If you use manual mapping, you must explicitly define the mapping for all columns. Default sequential mapping will no longer apply.
LOCATION
LOCATION specifies the storage path of the external data file. The system automatically recursively scans all files in the directory and its subdirectories.
Local LOCATION format
LOCATION = '[file://] local_file_path'
local_file_pathcan be a relative or absolute path.- The base directory for a relative path is the installation directory of the OBServer node.
You must specify a directory, not a single file. If you want to read a single file, specify its parent directory and use
PATTERNto filter.secure_file_privlimits the use oflocal_file_path.local_file_pathmust be a subdirectory of thesecure_file_privconfiguration path.
Remote LOCATION format
LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'
$ACCESS_ID,$ACCESS_KEY, and$HOSTare the credentials required to access cloud storage.- Sensitive information such as AccessKey is encrypted and stored in system tables by OceanBase Database. You do not need to expose it in plaintext.
FORMAT
CSV format
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
)
Description of CSV type: All columns in the file are considered to be of the STRING type. During queries, the system converts the data to the column types defined in the external table.
Parquet format/ORC format
-- Parquet format
FORMAT = ( TYPE = 'PARQUET' )
-- ORC format
FORMAT = ( TYPE = 'ORC' )
- Specify only
TYPE = 'PARQUET'orTYPE = 'ORC'. - The schema is automatically inferred from the metadata of the Parquet file.
Note
In MySQL mode of OceanBase Database, external tables do not support the ZEROFILL column attribute.
Type mapping (Parquet/ORC → OceanBase Database)
Special notes for Hive Parquet:
- The BOOL type is not supported.
- The BINARY type is not supported in Oracle mode.
For more information, see Data type mapping in MySQL mode and Data type mapping in Oracle mode.
PATTERN
[ PATTERN = '<regex_pattern>' ]
- Specify a regular expression to filter files in the
LOCATIONdirectory. - Only files that match the regular expression will be read. Files that do not match will be skipped.
- If you do not specify
PATTERN, all files in the directory will be read by default.
Partitioning
Automatic partitioning
Automatic partitioning is implemented by using the PARTITION BY clause in combination with metadata$fileurl.
metadata$fileurl: the relative path of the file to which the current row belongs, withLOCATIONas the root.The partitioning expression must include
metadata$fileurl. The system dynamically calculates the partition based on the value of this expression.
Example: Partition by the date string in the log path
CREATE EXTERNAL TABLE t0 (
c0 VARCHAR(500) AS (NVL(METADATA$FILECOL1, '')),
c1 FLOAT(5,4) AS (METADATA$FILECOL2),
date_part VARCHAR(100) AS (SUBSTR(METADATA$fileurl, INSTR(METADATA$fileurl, '2024'), 10)),
path VARCHAR(100) AS (METADATA$fileurl)
)
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
- When new files are added, queries will automatically assign them to the corresponding partition based on the
date_partexpression. - If the expression calculation fails (such as due to a type conversion error), an error will be returned.
Manual partitioning
Manual partitioning is enabled by using PARTITION_TYPE = USER_SPECIFIED. In this mode:
- The external table does not automatically discover partitions.
- You must explicitly manage partitions by using
ALTER EXTERNAL TABLE ... ADD/DROP PARTITION LOCATION.
Example of creating a table:
CREATE EXTERNAL TABLE table_name (
date_part VARCHAR(100) AS (METADATA$PARTITION_LIST_COL1),
col2 INT AS (METADATA$filecol2)
)
PARTITION_TYPE = USER_SPECIFIED
AUTO_REFRESH = OFF
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
METADATA$PARTITION_LIST_COL1is a placeholder pseudo column for the partitioning column.- Initially, the external table does not have file metadata, so you need to manually add partitions.
Adding a partition:
ALTER EXTERNAL TABLE table_name
ADD PARTITION (date_part = '2024-06')
LOCATION '2024/06';
- Add all files under
~/log/2024/06to thedate_part = '2024-06'partition. - A partition can only be bound to one
LOCATION, but oneLOCATIONcan be bound to multiple partitions, resulting in data duplication.
Dropping a partition:
ALTER EXTERNAL TABLE table_name DROP PARTITION LOCATION '2022/02';
- Drop all associated partitions and their file metadata under this path.
Pseudo columns
External tables support the following three pseudo columns:
| Pseudo column | Description |
|---|---|
METADATA$FILECOL{N} |
Maps to the Nth column (N ≥ 1) in the file |
METADATA$FILEURL |
The relative path of the file to which the current row belongs. The relative path is based on the root specified in the LOCATION clause of the CREATE TABLE statement. |
METADATA$PARTITION_LIST_COL{N} |
A placeholder for the Nth partition key in manual partitioning mode |
External file management
External tables store a list of files in the LOCATION directory that match the PATTERN in the system table. During queries, external files are accessed based on this list. The file list supports automatic or manual updates.
View external table files
- MySQL tenant:
SELECT * FROM oceanbase.DBA_OB_EXTERNAL_TABLE_FILES
WHERE table_schema = 'DATABASE4' AND table_name = 'T0';
- Oracle mode tenant:
SELECT * FROM DBA_OB_EXTERNAL_TABLE_FILES
WHERE owner = 'DATABASE4' AND table_name = 'T0';
- System tenant:
SELECT * FROM oceanbase.CDB_OB_EXTERNAL_TABLE_FILES;
Manually update the file list
ALTER EXTERNAL TABLE <table_name> REFRESH;
Automatic update strategy (AUTO_REFRESH)
| Strategy | Description |
|---|---|
IMMEDIATE |
Automatically refreshes the file list before each query |
OFF |
Never automatically refreshes; only manual refresh is supported |
INTERVAL |
Sets a scheduled task using DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x) (unit: seconds) |
