Typically, data in a database table is stored in the database's own storage space, while data in an external table is stored in an external storage service (such as a local file system, HDFS, OSS, S3, or an object storage service compatible with the S3 protocol).
When you create an external table, you must explicitly define the data file path and file format. After that, you can access the external data using standard SQL query statements, just like querying a regular table.
Limitations
- External tables are read-only and support
SELECToperations only. DML operations such asINSERT,UPDATE, andDELETEare not supported. - Constraints such as
DEFAULT,NOT NULL,UNIQUE,CHECK,PRIMARY KEY, andFOREIGN KEYcannot be defined. - Indexes cannot be created.
- Access to external tables is generally slower than access to regular tables due to the overhead of remote I/O and parsing.
Create an external table based on a file
Syntax
CREATE EXTERNAL TABLE <table_name>
(
[ <col_name> <col_type> [AS <expr>] ]
[ , <col_name> <col_type> [AS <expr>] ]
[ , ... ]
)
[PARTITION_TYPE = USER_SPECIFIED]
LOCATION = '<string>'
[AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }]
FORMAT = (
TYPE = '<string>',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
)
[ PATTERN = '<regex_pattern>' ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
Column definitions
For an external table, you must define data types for columns, but you cannot define constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY.
Default column mapping
By default, the data columns in the file are automatically mapped to the columns defined in the external table in the same order:
- The first column in the external table corresponds to the first column in the file.
- The second column in the external table corresponds to the second column in the file.
- And so on.
Example:
CREATE EXTERNAL TABLE ext_t1 (
c1 INT,
c2 INT
)
...
For a CSV file, c1 is automatically mapped to the first column in the file, and c2 is mapped to the second column.
Manual column mapping
If the order of columns in the file is different from the order of columns in the external table, you can explicitly specify the mapping by using the pseudo column metadata$filecol{N} (N starts at 1):
CREATE EXTERNAL TABLE ext_t1 (
c1 INT AS (metadata$filecol2), -- The c1 column in the ext_t1 table corresponds to the second column in the file.
c2 INT AS (metadata$filecol4) -- The c2 column in the ext_t1 table corresponds to the fourth column in the file.
)
...
Notice
Once you use manual mapping, you must explicitly define mappings for all columns. The default sequential mapping no longer applies.
LOCATION
LOCATION specifies the storage path of the external data file. The system automatically recursively scans all files in the directory and its subdirectories.
Local LOCATION format
LOCATION = '[file://] local_file_path'
local_file_pathcan be a relative or absolute path. For a relative path, the root directory is the installation directory of the OBServer node.You must specify a directory, not a single file. If you want to read a single file, specify its parent directory and use
PATTERNto filter the file.The
secure_file_privparameter restricts the use oflocal_file_path.local_file_pathmust be a subdirectory of the directory specified bysecure_file_priv.
Remote LOCATION format
LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'
$ACCESS_ID,$ACCESS_KEY, and$HOSTare the credentials required to access cloud storage.- Sensitive information such as AccessKey is encrypted and stored in system tables by OceanBase Database. You do not need to expose it in plain text.
FORMAT
CSV format
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
)
Description of CSV format: All columns in the file are considered to be of the STRING type, and they are converted to the column types defined in the external table during queries.
Parquet format/ORC format
-- Specify the Parquet format.
FORMAT = ( TYPE = 'PARQUET' )
-- Specify the ORC format.
FORMAT = ( TYPE = 'ORC' )
- You must specify
TYPE = 'PARQUET'orTYPE = 'ORC'. - The schema is automatically inferred from the metadata of the Parquet file.
Note
In MySQL mode of OceanBase Database, the ZEROFILL column attribute is not supported for external tables.
Type mapping (Parquet/ORC → OceanBase Database)
For Hive Parquet files:
- The BOOL type is not supported.
- The BINARY type is not supported in Oracle mode.
For more information, see Data type mapping in MySQL mode and Data type mapping in Oracle mode.
PATTERN
[ PATTERN = '<regex_pattern>' ]
- Specifies a regular expression to filter files in the
LOCATIONdirectory. - Only files that match the regular expression are read. Files that do not match the regular expression are skipped.
- If you do not specify
PATTERN, all files in the directory are read by default.
Partitions
Automatic partitioning
Automatic partitioning is implemented by using the PARTITION BY clause in combination with metadata$fileurl.
metadata$fileurl: the relative path of the file to which the current row belongs, withLOCATIONas the root.The partitioning expression must include
metadata$fileurl. The system dynamically calculates the partition based on the value of this expression.
Example: Partition by date string in the log path
CREATE EXTERNAL TABLE t0 (
c0 VARCHAR(500) AS (NVL(METADATA$FILECOL1, '')),
c1 FLOAT(5,4) AS (METADATA$FILECOL2),
date_part VARCHAR(100) AS (SUBSTR(METADATA$fileurl, INSTR(METADATA$fileurl, '2024'), 10)),
path VARCHAR(100) AS (METADATA$fileurl)
)
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
- When a new file is added, queries automatically assign the file to the corresponding partition based on the
date_partexpression. - If the expression calculation fails (such as due to a type conversion error), an error is returned.
Manual partitioning
Manual partitioning is enabled by using PARTITION_TYPE = USER_SPECIFIED. In this mode:
- The external table does not automatically discover partitions.
- You must explicitly manage partitions by using
ALTER EXTERNAL TABLE ... ADD/DROP PARTITION LOCATION.
Example of creating an external table:
CREATE EXTERNAL TABLE table_name (
date_part VARCHAR(100) AS (METADATA$PARTITION_LIST_COL1),
col2 INT AS (METADATA$filecol2)
)
PARTITION_TYPE = USER_SPECIFIED
AUTO_REFRESH = OFF
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);
METADATA$PARTITION_LIST_COL1is a placeholder pseudo column for the partitioning column.- Initially, the external table has no file metadata, so you must manually add partitions.
Adding partitions:
ALTER EXTERNAL TABLE table_name
ADD PARTITION (date_part = '2024-06')
LOCATION '2024/06';
- All files under
~/log/2024/06are added to thedate_part = '2024-06'partition. - A partition can be bound to only one
LOCATION, but a singleLOCATIONcan be bound to multiple partitions, which may result in data duplication.
Dropping partitions:
ALTER EXTERNAL TABLE table_name DROP PARTITION LOCATION '2022/02';
- All associated partitions and their file metadata under this path are deleted.
Pseudocolumns
External tables support the following pseudocolumns:
Pseudocolumn |
Description |
|---|---|
METADATA$FILECOL{N} |
Maps to the Nth column of the file (N ≥ 1) |
METADATA$FILEURL |
The relative path of the file to which the current row belongs (relative to the root specified in the location clause of the CREATE TABLE statement) |
METADATA$PARTITION_LIST_COL{N} |
A placeholder indicating the Nth partition key in manual partitioning mode |
External file management
External tables store the list of files that match the PATTERN under the LOCATION in the system table. During queries, these files are accessed based on this list. The file list supports both automatic and manual updates.
View external table files
- MySQL tenant:
SELECT * FROM oceanbase.DBA_OB_EXTERNAL_TABLE_FILES
WHERE table_schema = 'DATABASE4' AND table_name = 'T0';
- Oracle mode tenant:
SELECT * FROM DBA_OB_EXTERNAL_TABLE_FILES
WHERE owner = 'DATABASE4' AND table_name = 'T0';
- System tenant:
SELECT * FROM oceanbase.CDB_OB_EXTERNAL_TABLE_FILES;
Manually update the file list
ALTER EXTERNAL TABLE <table_name> REFRESH;
Automatic update strategy (AUTO_REFRESH)
Strategy |
Description |
|---|---|
IMMEDIATE |
Automatically refreshes the file list before each query |
OFF |
Never automatically refreshes; only manual refresh is supported |
INTERVAL |
Sets a scheduled task using DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x) (unit: seconds) |
