Purpose
This statement is used to create a new external table in the database.
External tables are a key feature of many database management systems. Typically, data in a database is stored in the database's own storage space, while the data of an external table resides in an external storage service.
When creating an external table, you need to define the file path and file format for the data. Once created, users can read data from the external storage service through the external table. External tables are read-only and can be used in query statements, but they do not support DML operations. External tables also do not support defining constraints or creating indexes.
Syntax
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = {'file_path' | @location_name['/path']}
FORMAT = (format_type_options)
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = '<regex_pattern>']
[AUTO_REFRESH = 'xxx'];
column_definition_list:
column_definition [, column_definition ...]
column_definition:
column_name column_type [AS expr]
format_type_options:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = {TRUE | FALSE}
TRIM_SPACE = {TRUE | FALSE}
EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
type_parquet_option:
TYPE = 'PARQUET'
type_orc_option:
TYPE = 'ORC'
Parameters
| Parameter | Description |
|---|---|
| table_name | The name of the external table to be created. |
| column_name | The name of the column in the external table. By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence. |
| column_type | The data type of the column in the external table. Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined. |
| AS expr | Used to manually specify column mapping. When the order of columns in the file does not match the order of columns defined in the external table, you can use the pseudo column metadata$filecol{N} to specify the mapping between the external table column and the Nth column in the file. For example, c2 INT AS (metadata$filecol4) indicates that the c2 column in the external table corresponds to the 4th column in the file. Note that if manual column mapping is specified, automatic mapping will be disabled, and all columns must be manually defined. |
| LOCATION | Specifies the path where the external table files are stored. Typically, the data files of an external table are stored in a separate directory, which may contain subdirectories. When the table is created, the external table automatically collects all files in this directory. For the valid path formats, see the file_path section below. |
| FORMAT = (format_type_options) | Specifies the properties of the external file format. Use TYPE to specify the file format as CSV, PARQUET, or ORC. TYPE cannot be empty. For more information, see format_type_options below. |
| PATTERN | Specifies a regular expression pattern to filter files in the LOCATION directory. For each file path in the LOCATION directory, if it matches the pattern, the external table will access the file; otherwise, it will skip the file. If this parameter is not specified, all files in the LOCATION directory are accessible by default. The external table will save the list of files that match the PATTERN in the database system table. When scanning the external table, it will access the external files based on this list. |
| PARTITION_TYPE = USER_SPECIFIED | Specify this option if you want to manually add and remove partitions instead of letting the external table manage partitions automatically. |
| AUTO_REFRESH = 'xxx' | Specifies whether to enable automatic refresh for the external table. |
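To make the `AS expr` column-mapping rule concrete, here is a minimal hedged sketch. The table name, file path, and file layout are hypothetical; the `metadata$filecol{N}` pseudo column is taken from the parameter description above. Remember that once any mapping is specified manually, automatic sequential mapping is disabled and every column must be mapped.

```sql
-- Hypothetical file layout: the table's id comes from the 1st file column,
-- and score from the 4th. All columns are mapped explicitly because
-- manual mapping disables automatic sequential mapping.
CREATE EXTERNAL TABLE ext_scores (
  id    INT    AS (metadata$filecol1),  -- 1st column in the file
  score NUMBER AS (metadata$filecol4)   -- 4th column in the file
)
LOCATION = '/home/admin/ext_scores/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
)
PATTERN = 'scores.csv';
```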
file_path
The file path for external tables can be in the following formats:
- If the file is stored locally, the `LOCATION` format is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative or absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file paths that OBServer nodes can access, so `local_file_path` must be a subpath of the `secure_file_priv` path.
- If the file is stored remotely, the `LOCATION` format depends on the storage service.

  Notice

  When using an object storage path, the parameters are separated by the `&` symbol. Ensure that your input parameters contain only uppercase and lowercase letters, digits, `/-_$+=`, and wildcards. If you input any other characters, the settings may fail.

  - If the file is stored on Alibaba Cloud OSS or AWS S3, the format is `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access credentials required to access Alibaba Cloud OSS, AWS S3, or an object storage service compatible with the S3 protocol, and `s3_region` specifies the region selected for S3. These sensitive access credentials are stored in encrypted form in the system tables of the database.
  - If the file is stored on HDFS, the format is as follows:
- To access the cluster using the IP address of a single NameNode (NN), the format is `LOCATION = 'hdfs://localhost:port/PATH'`, where `localhost` is the IP address of HDFS, `port` is the port number of HDFS, and `PATH` is the file path in HDFS.

  For Kerberos authentication, the format is `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`, where:

  - `principal`: the user for login and authentication.
  - `keytab`: the path to the user's authentication key file.
  - `krb5conf`: the path to the Kerberos environment description file.
  - `configs`: additional HDFS configuration items. By default, this parameter is empty. In a Kerberos environment, however, it typically needs to be set, for example `dfs.data.transfer.protection=authentication,privacy`, which sets the data transfer protection levels to `authentication` and `privacy`.
- To access the cluster using a Hadoop High Availability (HA) logical naming service, the format is `LOCATION = 'hdfs://nameserviceID/PATH'`, where `nameserviceID` is the ID of the Hadoop HA logical naming service, and `PATH` is the file path.

  Note

  Ensure that the client-side configuration of the OBServer node includes the `nameservice` definition and failover strategy for the HA cluster.

  For Kerberos authentication, the format is:

  `LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'`

  where:

  - `principal`: the user for login and authentication, set to the `principal` of a non-primary NameNode.
  - `keytab` and `krb5conf`: same as for a single NameNode.
  - `configs`: additional HDFS configuration items. When multiple items are needed, they are typically HA and security configurations:
    - `dfs.data.transfer.protection=${string}`: aligns with the cluster's `dfs.data.transfer.protection` configuration.
    - `dfs.nameservices=${nameservice id}`: specifies the `nameservice` (alias) of the current HA cluster.
    - `dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}`: lists the IDs of the HA cluster's NameNodes.
    - `dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}`: specifies the RPC address of `namenode1` so that clients can route requests to it.
    - `dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}`: specifies the RPC address of `namenode2` so that clients can route requests to it.
    - `dfs.ha.automatic-failover.enabled.${nameservice id}=true`: enables the HA cluster to automatically obtain an available NameNode to respond to requests.
    - `dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`: specifies the proxy class that handles primary/standby switching in the HA cluster. You can also customize and upload the failover logic required for your HA cluster.
Notice

HA-related configuration items are bound to the nameservice. For example, Example 4 below uses `mycluster`. Ensure that the related configuration items are set correctly.
format_type_options
- `TYPE = 'CSV'`: specifies the external file format as CSV. It also includes the following fields:
  - `LINE_DELIMITER`: specifies the line delimiter for the CSV file.
  - `FIELD_DELIMITER`: specifies the column delimiter for the CSV file.
  - `ESCAPE`: specifies the escape character for the CSV file. It must be a single byte.
  - `FIELD_OPTIONALLY_ENCLOSED_BY`: specifies the symbol used to enclose field values in the CSV file. The default value is empty.

    Notice

    If the external table data file contains `NULL` values (actual NULL values, not the string "NULL"), you must explicitly configure the `FIELD_OPTIONALLY_ENCLOSED_BY` parameter, and its value cannot be empty.

  - `ENCODING`: specifies the character set encoding of the file. For information about all character sets supported in Oracle-compatible mode, see Character set. If not specified, the default value is UTF8MB4.
  - `NULL_IF`: specifies the strings to be treated as `NULL`. The default value is empty.
  - `SKIP_HEADER`: skips the file header; specifies the number of lines to skip.
  - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
  - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
  - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.
  - `IGNORE_LAST_EMPTY_COLUMN`: specifies whether to ignore a trailing empty field at the end of a line (that is, when a column delimiter immediately precedes the line delimiter). The default value is `TRUE`, indicating that the trailing empty field is ignored.
- `TYPE = 'PARQUET'`: specifies the external file format as `PARQUET`.
- `TYPE = 'ORC'`: specifies the external file format as `ORC`.
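To illustrate how several of the CSV options above combine, here is a minimal hedged sketch. The table name, path, and file name are hypothetical, and the option values are chosen purely for illustration:

```sql
-- Hypothetical CSV table: skip a 1-line header, treat '\N' and
-- empty fields as NULL, and trim surrounding spaces from each field.
CREATE EXTERNAL TABLE ext_csv_demo (
  c1 INT,
  c2 VARCHAR(50)
)
LOCATION = '/home/admin/csv_demo/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('\N')
  SKIP_HEADER = 1
  TRIM_SPACE = TRUE
  EMPTY_FIELD_AS_NULL = TRUE
)
PATTERN = 'demo.csv';
```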
Considerations
- When an external file is deleted, the external table ignores the missing file in the file list.
- When an external file is modified, the external table accesses the latest data of the external file. If the modification of the external file and the query of the external table are performed concurrently, unexpected results may occur. Therefore, avoid modifying the external file while querying the external table.
- When a new file is added to the external directory, the external table only accesses the files already recorded in its file list. To make the new file visible, you need to refresh the file list of the external table.
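As a hedged sketch of the file-list refresh mentioned in the last point (the exact statement form may vary by database version, and `ext_tbl` is a hypothetical table name):

```sql
-- Rescan LOCATION and update the external table's file list
-- so that newly added files matching PATTERN become visible.
ALTER EXTERNAL TABLE ext_tbl REFRESH;
```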
Examples
Example 1
Prepare the data. First, set the `secure_file_priv` parameter to `/home/admin/`, and place the CSV file `extdata.csv` containing the external table data in the `/home/admin/test` directory of the current OBServer node.

Here is an example of setting the global secure file path:

```shell
obclient> SET GLOBAL secure_file_priv = "/home/admin/";
Query OK, 0 rows affected

obclient> \q
Bye
```
Note

Since `secure_file_priv` is a `GLOBAL` variable, you need to exit the current session with `\q` and reconnect for the new value to take effect.

The content of the CSV file is as follows:
```
1,'Dave','Smith','dsmith@outlook.com','friend',32
2,'Xena','Johnson','xjonson@outlook.com','contact',45
3,'Fred','Jackon','fjackson@outlook.com','co-worker',19
4,'Alma','Tyler','atyler@outlook.com','friend',53
```

After logging in to the database as a user of a user tenant, create the external table `contacts`.

```sql
obclient> CREATE EXTERNAL TABLE contacts (
    id        INT,
    firstname VARCHAR(100),
    lastname  VARCHAR(100),
    email     VARCHAR(255),
    category  CHAR(30),
    age       NUMBER
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = ''''
  )
  PATTERN = 'extdata.csv';
```

Query the data in the external table `contacts`.

```sql
obclient> SELECT * FROM contacts;
+------+-----------+----------+----------------------+-----------+------+
| ID   | FIRSTNAME | LASTNAME | EMAIL                | CATEGORY  | AGE  |
+------+-----------+----------+----------------------+-----------+------+
|    1 | Dave      | Smith    | dsmith@outlook.com   | friend    |   32 |
|    2 | Xena      | Johnson  | xjonson@outlook.com  | contact   |   45 |
|    3 | Fred      | Jackon   | fjackson@outlook.com | co-worker |   19 |
|    4 | Alma      | Tyler    | atyler@outlook.com   | friend    |   53 |
+------+-----------+----------+----------------------+-----------+------+
4 rows in set
```
Example 2
Assume that there is a CSV file in a certain file path on HDFS. The content of the file is as follows:
```shell
$ hdfs dfs -cat /user/test_tbl1.csv
1,'Emma','2021-09-01'
2,'William','2021-09-02'
3,'Olivia','2021-09-03'
```

Create an external table on HDFS.
If Kerberos authentication is not enabled in the target HDFS environment:

```sql
CREATE EXTERNAL TABLE test_tbl1_csv_oracle (
  id     INT,
  name   VARCHAR(50),
  c_date DATE
)
LOCATION = 'hdfs://${hadoop_namenode_hostname}:${hadoop_namenode_port}/user'
FORMAT = (
  TYPE = 'CSV',
  FIELD_DELIMITER = ',',
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
PATTERN = 'test_tbl1.csv';
```

Notice

Here, `hadoop_namenode_hostname` and `hadoop_namenode_port` refer to the hostname and port of the HDFS NameNode and must be replaced with actual values.

If Kerberos authentication is enabled in the target HDFS environment:
Notice

If Kerberos authentication is enabled, you need to deploy the keytab file (the user's authentication key file) and the krb5conf file on the corresponding OBServer node.

```sql
CREATE EXTERNAL TABLE partsupp (
  PS_PARTKEY    INTEGER,
  PS_SUPPKEY    INTEGER,
  PS_AVAILQTY   INTEGER,
  PS_SUPPLYCOST DECIMAL(15,2),
  PS_COMMENT    VARCHAR(199)
)
LOCATION = 'hdfs://localhost:8020/tpch_csv?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
PATTERN = 'partsupp.tbl';
```

Query the data in the external table.
```sql
SELECT * FROM test_tbl1_csv_oracle;
```

The query result is as follows:

```
+----+----------+------------+
| id | name     | c_date     |
+----+----------+------------+
|  1 | Emma     | 2021-09-01 |
|  2 | William  | 2021-09-02 |
|  3 | Olivia   | 2021-09-03 |
+----+----------+------------+
3 rows in set
```
Example 3
Assume that an ORC file named `data.orc` is stored in the `/home/admin/orc_test/` directory.

Create an external table in ORC format.

```sql
obclient> CREATE EXTERNAL TABLE ext_table_orc (
    id   NUMBER,
    name VARCHAR2(50)
  )
  LOCATION = '/home/admin/orc_test/'
  FORMAT = ( TYPE = 'ORC' )
  PATTERN = 'data.orc';
```

Note
You can specify `FORMAT = ( TYPE = 'ORC' )` to let the external table directly read ORC files. Make sure that the actual format of the external data file matches the declared format; otherwise, parsing will fail.
Example 4
Assume that there is a CSV file in a certain file path on HDFS. The content of the file is as follows:
```shell
$ hdfs dfs -cat /hadoop_ha_test/test_simple.csv
1,lili,19
2,alic,20
3,solvi,21
```

Enable the JNI-related configuration items on the OBServer node.
For more information about the JNI configuration, see Deploy the OceanBase Database Java SDK environment.
Create an external table on HDFS.

```sql
obclient> CREATE EXTERNAL TABLE test_ha (
    id     INT,
    r_name VARCHAR(100),
    age    INT
  )
  LOCATION = 'hdfs://mycluster/hadoop_ha_test?principal=ha/xxx@xxx.com&keytab=/path/to/ha.keytab&krb5conf=/path/to/krb5conf_file&configs=dfs.data.transfer.protection=integrity#dfs.nameservices=mycluster#dfs.ha.namenodes.mycluster=nn1,nn2#dfs.namenode.rpc-address.mycluster.nn1=localhost1:port#dfs.namenode.rpc-address.mycluster.nn2=localhost2:port#dfs.ha.automatic-failover.enabled.mycluster=true#dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'test_simple.csv';
```

Query the data in the external table.
```sql
obclient> SELECT * FROM test_ha;
```

The query result is as follows:

```
+----+--------+-------+
| ID | R_NAME | AGE   |
+----+--------+-------+
|  1 | lili   |    19 |
|  2 | alic   |    20 |
|  3 | solvi  |    21 |
+----+--------+-------+
3 rows in set
```