External tables|V4.6.0|OceanBase Database| docs|Distributed Database

External tables

Last Updated：2026-05-07 11:26:24 Updated

Typically, data in a database table is stored in the database's own storage space, while data in an external table is stored in an external storage service (such as a local file system, HDFS, OSS, S3, or an object storage service compatible with the S3 protocol).

When you create an external table, you must explicitly define the data file path and file format. After that, you can access the external data using standard SQL query statements, just like querying a regular table.

Limitations

External tables are read-only and support SELECT operations only. DML operations such as INSERT, UPDATE, and DELETE are not supported.
Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined.
Indexes cannot be created.
Access to external tables is generally slower than access to regular tables due to the overhead of remote I/O and parsing.

Create an external table based on a file

Syntax

CREATE EXTERNAL TABLE <table_name>
(
    [ <col_name> <col_type> [AS <expr>] ]
    [ , <col_name> <col_type> [AS <expr>] ]
    [ , ... ]
)
[PARTITION_TYPE = USER_SPECIFIED]
LOCATION = '<string>'
[AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }]
FORMAT = (
    TYPE = '<string>',
    LINE_DELIMITER = '<string>' | <expr>,
    FIELD_DELIMITER = '<string>' | <expr>,
    ESCAPE = '<character>' | <expr>,
    FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
    ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
    NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
    SKIP_HEADER = <int>,
    SKIP_BLANK_LINES = { TRUE | FALSE },
    TRIM_SPACE = { TRUE | FALSE },
    EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
    COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
    PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
    PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
    MAX_ROW_LENGTH = <int>
)
[ PATTERN = '<regex_pattern>' ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]

Column definitions

For an external table, you must define data types for columns, but you cannot define constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY.

Default column mapping

By default, the data columns in the file are automatically mapped to the columns defined in the external table in the same order:

The first column in the external table corresponds to the first column in the file.
The second column in the external table corresponds to the second column in the file.
And so on.

Example:

CREATE EXTERNAL TABLE ext_t1 (
    c1 INT,
    c2 INT
)
...

For a CSV file, c1 is automatically mapped to the first column in the file, and c2 is mapped to the second column.

Manual column mapping

If the order of columns in the file is different from the order of columns in the external table, you can explicitly specify the mapping by using the pseudo column metadata$filecol{N} (N starts at 1):

CREATE EXTERNAL TABLE ext_t1 (
    c1 INT AS (metadata$filecol2),  -- The c1 column in the ext_t1 table corresponds to the second column in the file.
    c2 INT AS (metadata$filecol4)   -- The c2 column in the ext_t1 table corresponds to the fourth column in the file.
)
...

Notice

Once you use manual mapping, you must explicitly define mappings for all columns. The default sequential mapping no longer applies.

LOCATION

LOCATION specifies the storage path of the external data file. The system automatically recursively scans all files in the directory and its subdirectories.

Local LOCATION format

LOCATION = '[file://] local_file_path'

local_file_path can be a relative or absolute path. For a relative path, the root directory is the installation directory of the OBServer node.
You must specify a directory, not a single file. If you want to read a single file, specify its parent directory and use PATTERN to filter the file.
The secure_file_priv parameter restricts the use of local_file_path. local_file_path must be a subdirectory of the directory specified by secure_file_priv.

Remote LOCATION format

LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'

$ACCESS_ID, $ACCESS_KEY, and $HOST are the credentials required to access cloud storage.
Sensitive information such as AccessKey is encrypted and stored in system tables by OceanBase Database. You do not need to expose it in plain text.

FORMAT

CSV format

FORMAT = (
    TYPE = 'CSV',
    LINE_DELIMITER = '<string>' | <expr>,
    FIELD_DELIMITER = '<string>' | <expr>,
    ESCAPE = '<character>' | <expr>,
    FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
    ENCODING = 'UTF8MB4|GBK|GB18030|BINARY',
    NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
    SKIP_HEADER = <int>,
    SKIP_BLANK_LINES = { TRUE | FALSE },
    TRIM_SPACE = { TRUE | FALSE },
    EMPTY_FIELD_AS_NULL = { TRUE | FALSE },
    COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
    PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
    PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
    MAX_ROW_LENGTH = <int>
)

Description of CSV format: All columns in the file are considered to be of the STRING type, and they are converted to the column types defined in the external table during queries.

Parquet format/ORC format

-- Specify the Parquet format.
FORMAT = ( TYPE = 'PARQUET' )

-- Specify the ORC format.
FORMAT = ( TYPE = 'ORC' )

You must specify TYPE = 'PARQUET' or TYPE = 'ORC'.
The schema is automatically inferred from the metadata of the Parquet file.

Note

In MySQL mode of OceanBase Database, the ZEROFILL column attribute is not supported for external tables.

Type mapping (Parquet/ORC → OceanBase Database)

For Hive Parquet files:

The BOOL type is not supported.
The BINARY type is not supported in Oracle mode.

For more information, see Data type mapping in MySQL mode and Data type mapping in Oracle mode.

PATTERN

[ PATTERN = '<regex_pattern>' ]

Specifies a regular expression to filter files in the LOCATION directory.
Only files that match the regular expression are read. Files that do not match the regular expression are skipped.
If you do not specify PATTERN, all files in the directory are read by default.

Partitions

Automatic partitioning

Automatic partitioning is implemented by using the PARTITION BY clause in combination with metadata$fileurl.

metadata$fileurl: the relative path of the file to which the current row belongs, with LOCATION as the root.
The partitioning expression must include metadata$fileurl. The system dynamically calculates the partition based on the value of this expression.

Example: Partition by date string in the log path

CREATE EXTERNAL TABLE t0 (
    c0 VARCHAR(500) AS (NVL(METADATA$FILECOL1, '')),
    c1 FLOAT(5,4) AS (METADATA$FILECOL2),
    date_part VARCHAR(100) AS (SUBSTR(METADATA$fileurl, INSTR(METADATA$fileurl, '2024'), 10)),
    path VARCHAR(100) AS (METADATA$fileurl)
)
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);

When a new file is added, queries automatically assign the file to the corresponding partition based on the date_part expression.
If the expression calculation fails (such as due to a type conversion error), an error is returned.

Manual partitioning

Manual partitioning is enabled by using PARTITION_TYPE = USER_SPECIFIED. In this mode:

The external table does not automatically discover partitions.
You must explicitly manage partitions by using ALTER EXTERNAL TABLE ... ADD/DROP PARTITION LOCATION.

Example of creating an external table:

CREATE EXTERNAL TABLE table_name (
    date_part VARCHAR(100) AS (METADATA$PARTITION_LIST_COL1),
    col2 INT AS (METADATA$filecol2)
)
PARTITION_TYPE = USER_SPECIFIED
AUTO_REFRESH = OFF
LOCATION = '~/log'
FORMAT = (TYPE = 'CSV')
PARTITION BY (date_part);

METADATA$PARTITION_LIST_COL1 is a placeholder pseudo column for the partitioning column.
Initially, the external table has no file metadata, so you must manually add partitions.

Adding partitions:

ALTER EXTERNAL TABLE table_name
ADD PARTITION (date_part = '2024-06')
LOCATION '2024/06';

All files under ~/log/2024/06 are added to the date_part = '2024-06' partition.
A partition can be bound to only one LOCATION, but a single LOCATION can be bound to multiple partitions, which may result in data duplication.

Dropping partitions:

ALTER EXTERNAL TABLE table_name DROP PARTITION LOCATION '2022/02';

All associated partitions and their file metadata under this path are deleted.

Pseudocolumns

External tables support the following pseudocolumns:

Pseudocolumn	Description
`METADATA$FILECOL{N}`	Maps to the Nth column of the file (N ≥ 1)
`METADATA$FILEURL`	The relative path of the file to which the current row belongs (relative to the root specified in the location clause of the CREATE TABLE statement)
`METADATA$PARTITION_LIST_COL{N}`	A placeholder indicating the Nth partition key in manual partitioning mode

External file management

External tables store the list of files that match the PATTERN under the LOCATION in the system table. During queries, these files are accessed based on this list. The file list supports both automatic and manual updates.

View external table files

MySQL tenant:

SELECT * FROM oceanbase.DBA_OB_EXTERNAL_TABLE_FILES
WHERE table_schema = 'DATABASE4' AND table_name = 'T0';

Oracle mode tenant:

SELECT * FROM DBA_OB_EXTERNAL_TABLE_FILES
WHERE owner = 'DATABASE4' AND table_name = 'T0';

System tenant:

SELECT * FROM oceanbase.CDB_OB_EXTERNAL_TABLE_FILES;

Manually update the file list

ALTER EXTERNAL TABLE <table_name> REFRESH;

Automatic update strategy (AUTO_REFRESH)

Strategy	Description
`IMMEDIATE`	Automatically refreshes the file list before each query
`OFF`	Never automatically refreshes; only manual refresh is supported
`INTERVAL`	Sets a scheduled task using `DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x)` (unit: seconds)

OceanBase

Customer Stories

Documentation