Typically, table data in a database is stored within the database's own storage space. In contrast, the data for an external table is stored in external storage services, such as a local file system, HDFS, OSS, S3, or other object storage compatible with the S3 protocol.
When creating an external table, you need to explicitly define the path to the data files and specify the file format. After that, you can access the external data using standard SQL queries, just like you would with a regular table.
Limitations
- External tables are read-only and support
SELECTqueries, but do not support DML operations likeINSERT,UPDATE, orDELETE. - You cannot define constraints such as
DEFAULT,NOT NULL,UNIQUE,CHECK,PRIMARY KEY, orFOREIGN KEYon external tables. - Creating indexes is not supported.
- Because external tables involve remote I/O and parsing overhead, accessing them is usually slower than querying regular tables.
Create a file external table
Syntax
CREATE EXTERNAL TABLE <table_name>
(
[ <col_name> <col_type> [AS <expr>] ]
[ , <col_name> <col_type> [AS <expr>] ]
[ , ... ]
)
[PARTITION_TYPE = USER_SPECIFIED]
LOCATION = '<string>'
[AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }]
FORMAT = (
TYPE = '<string>',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = AUTO | GZIP | ZSTD | DEFLATE | NONE
)
[ PATTERN = '<regex_pattern>' ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
Column definition
The columns of an external table must be defined with data types, but no constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY can be defined.
Default column mapping
By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence:
- The first column of the external table → the first column of the file
- The second column of the external table → the second column of the file
- And so on
Example:
CREATE EXTERNAL TABLE ext_t1 (
c1 INT,
c2 INT
)
...
For a CSV file, c1 is automatically mapped to the first column of the file, and c2 is mapped to the second column.
Manual column mapping
When the order of the columns in the file does not match the table definition, you can explicitly specify the mapping relationship by using the pseudo column metadata$filecol{N} (N starts from 1):
CREATE EXTERNAL TABLE ext_t1 (
c1 INT AS (metadata$filecol2), -- The c1 column of the external table ext_t1 corresponds to the second column of the file.
c2 INT AS (metadata$filecol4) -- The c2 column of the external table ext_t1 corresponds to the fourth column of the file.
)
...
Notice
If manual mapping is used, all columns must be explicitly defined with mapping. Default sequential mapping will no longer be applied.
LOCATION
LOCATION specifies the storage path of the external data files. The system automatically recursively scans all files in the directory and its subdirectories.
Local LOCATION format
LOCATION = '[file://] local_file_path'
local_file_pathcan be a relative or absolute path.- The base directory for a relative path is the installation directory of the OBServer.
You must specify a directory, not a single file. If you need to read a single file, specify its parent directory and use
PATTERNto filter it.The value of
local_file_pathmust be a subdirectory of the value of thesecure_file_privparameter.
Remote LOCATION format
LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'
$ACCESS_ID,$ACCESS_KEY, and$HOSTare the credentials required to access the cloud storage.- Sensitive information such as AccessKey is encrypted and stored in the system table by OceanBase, so you do not need to expose it in plain text.
FORMAT
CSV format
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = AUTO | GZIP | ZSTD | DEFLATE | NONE
)
Description of CSV format: All columns in the file are considered STRING type. During queries, the system converts the data to the column types defined in the external table.
Parquet format/ORC format
-- Specify the Parquet format.
FORMAT = ( TYPE = 'PARQUET' )
-- Specify the ORC format.
FORMAT = ( TYPE = 'ORC' )
- You only need to specify
TYPE = 'PARQUET'orTYPE = 'ORC'. - The schema is automatically inferred from the metadata of the Parquet file.
Note
In MySQL-compatible mode of OceanBase Database, the ZEROFILL column attribute is not supported for external tables.
Type mapping (Parquet/ORC → OceanBase)
Special notes for Hive Parquet:
- The BOOL type is not supported.
- The BINARY type is not supported in Oracle-compatible mode.
For more information, see Data type mapping in MySQL-compatible mode and Data type mapping in Oracle-compatible mode.
PATTERN
[ PATTERN = '<regex_pattern>' ]
- Specify a regular expression to filter files in the
LOCATIONdirectory. - Only files that match the regular expression are read; unmatched files are skipped.
- If you do not specify
PATTERN, all files in the directory are read by default.
Partitioning
Automatic partitioning
Automatic partitioning is implemented by using the PARTITION BY clause in combination with metadata$fileurl.
metadata$fileurl: the relative path of the file to which the current row belongs, withLOCATIONas the root.The partitioning expression must include
metadata$fileurl. The system dynamically calculates the partition based on the value of this expression.
Example: Partition by the date string in the log path
CREATE EXTERNAL TABLE t0 (
c0 VARCHAR(500) AS (NVL(METADATA$FILECOL1, '')),
c1 FLOAT(5,4) AS (METADATA$FILECOL2),
date_part VARCHAR(100) AS (SUBSTR(METADATA$fileurl, INSTR(METADATA$fileurl, '2024'), 10)),
path VARCHAR(100) AS (METADATA$fileurl)
)
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
- When a new file is added, the system automatically assigns it to the corresponding partition based on the
date_partexpression during queries. - If the expression calculation fails (for example, due to a type conversion error), an error is returned.
Manual partitioning
Manual partitioning is enabled by using PARTITION_TYPE = USER_SPECIFIED. In this mode:
- The system does not automatically discover partitions.
- You must explicitly manage partitions by using
ALTER EXTERNAL TABLE ... ADD/DROP PARTITION LOCATION.
Example of creating a table:
CREATE EXTERNAL TABLE table_name (
date_part VARCHAR(100) AS (METADATA$PARTITION_LIST_COL1),
col2 INT AS (METADATA$filecol2)
)
PARTITION_TYPE = USER_SPECIFIED
AUTO_REFRESH = OFF
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
METADATA$PARTITION_LIST_COL1is a placeholder pseudo column for the partitioning column.- Initially, the external table does not have any file metadata, so you need to manually add partitions.
Adding a partition:
ALTER EXTERNAL TABLE table_name
ADD PARTITION (date_part = '2024-06')
LOCATION '2024/06';
- All files under
~/log/2024/06are included in thedate_part = '2024-06'partition. - A partition can be bound to only one
LOCATION, but a singleLOCATIONcan be bound to multiple partitions, which may result in data duplication.
Dropping a partition:
ALTER EXTERNAL TABLE table_name DROP PARTITION LOCATION '2022/02';
- All associated partitions and their file metadata under this path are deleted.
Pseudo columns
External tables support the following three pseudo columns:
| Pseudo column | Description |
|---|---|
METADATA$FILECOL{N} |
Maps to the Nth column (N ≥ 1) of the file. |
METADATA$FILEURL |
The relative path of the file to which the current row belongs. The relative path is based on the root specified by the location clause in the CREATE TABLE statement. |
METADATA$PARTITION_LIST_COL{N} |
A placeholder that represents the Nth partitioning key in manual partitioning mode. |
External file management
External tables store a list of files that meet the PATTERN condition under the LOCATION directory in the system table. When queries are executed, the external files are accessed based on this list. The file list can be automatically or manually updated.
View external table files
- MySQL-compatible tenant:
SELECT * FROM oceanbase.DBA_OB_EXTERNAL_TABLE_FILES
WHERE table_schema = 'DATABASE4' AND table_name = 'T0';
- Oracle-compatible tenant:
SELECT * FROM DBA_OB_EXTERNAL_TABLE_FILES
WHERE owner = 'DATABASE4' AND table_name = 'T0';
- SYS tenant:
SELECT * FROM oceanbase.CDB_OB_EXTERNAL_TABLE_FILES;
Manually update the file list
ALTER EXTERNAL TABLE <table_name> REFRESH;
Automatic update strategy (AUTO_REFRESH)
| Strategy | Description |
|---|---|
IMMEDIATE |
Automatically refreshes the file list before each query. |
OFF |
Never automatically refreshes the file list. Manual refresh is supported. |
INTERVAL |
Sets a scheduled task to periodically refresh the file list using DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x), where x is the interval in seconds. |