This statement is used to create a new external table in the database.
External tables are a key feature of a database management system. Typically, tables in a database are stored in the database's storage space, while the data of external tables is stored in an external storage service.
When creating an external table, you define the file path and file format of the data. Users can then read the data from the external storage service through the external table. External tables are read-only: they can be used in query statements, but DML operations cannot be performed on them, and they do not support defining constraints or creating indexes.
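For instance, an external table can be queried and joined like an internal table, while any DML against it fails. A minimal sketch, assuming a hypothetical external table `ext_orders` and an internal table `customers`:

```sql
-- Read-only access: querying and joining work as usual.
SELECT c.name, o.order_total
  FROM customers c
  JOIN ext_orders o ON o.customer_id = c.id;

-- DML is rejected on external tables, e.g.:
-- INSERT INTO ext_orders VALUES (1, 42, 99.9);  -- fails with an error
```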
Syntax
```sql
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = {'file_path' | @location_name['/path']}
FORMAT = (format_type_options)
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = '<regex_pattern>']
[AUTO_REFRESH = 'xxx'];

column_definition_list:
    column_definition [, column_definition ...]

column_definition:
    column_name column_type [AS expr]

format_type_options:
    type_csv_option
  | type_parquet_option
  | type_orc_option

type_csv_option:
    TYPE = 'CSV'
    LINE_DELIMITER = '<string>' | <expr>
    FIELD_DELIMITER = '<string>' | <expr>
    ESCAPE = '<character>' | <expr>
    FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
    ENCODING = 'charset'
    NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
    SKIP_HEADER = <int>
    SKIP_BLANK_LINES = {TRUE | FALSE}
    TRIM_SPACE = {TRUE | FALSE}
    EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
    IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
    COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
    PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
    PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
    MAX_ROW_LENGTH = <int>

type_parquet_option:
    TYPE = 'PARQUET'

type_orc_option:
    TYPE = 'ORC'
```
Parameters
| Parameter | Description |
|---|---|
| table_name | The name of the external table to be created. |
| column_name | The name of the column in the external table. By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence. |
| column_type | Defines the column type of the external table, but cannot define constraints (e.g., DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY, etc.). |
| AS expr | Used to manually specify column mappings. When the order of columns in the file does not match the order of columns defined in the external table, you can use the pseudo column metadata$filecol{N} to specify the mapping between the external table's columns and the Nth column in the file. For example, c2 INT AS (metadata$filecol4) indicates that the c2 column in the external table corresponds to the 4th column in the file. Note that if any manual column mapping is specified, automatic mapping is disabled, and all columns must be manually defined with mappings (see the sketch after this table). |
| LOCATION | Specifies the path where the external table files are stored. Typically, the data files of an external table are stored in a dedicated directory, which may contain subdirectories. When the table is created, the external table automatically collects all files in this directory. For the valid path formats, see file_path below. |
| FORMAT = (format_type_options) | Specifies the properties of the external file format. Use TYPE to specify the file format as CSV, PARQUET, or ORC. The TYPE parameter cannot be empty. For more information, see format_type_options below. |
| PATTERN | Specifies a regular expression pattern to filter files in the LOCATION directory. For each file path in the LOCATION directory, if it matches the pattern, the external table will access the file; otherwise, it will skip the file. If this parameter is not specified, the external table will access all files in the LOCATION directory by default. The external table will save the list of files in the LOCATION directory that match the PATTERN in the database system table. When the external table scans, it will access the external files based on this list. |
| PARTITION_TYPE = USER_SPECIFIED | Specify this option when you need to manually add and delete partitions instead of letting the external table manage partitions automatically. |
| AUTO_REFRESH = 'xxx' | Specifies whether to enable automatic refresh for the external table. |
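The following is a minimal sketch of manual column mapping (the table name, file path, and file layout are hypothetical). Because both columns use `metadata$filecol{N}` pseudo columns, automatic mapping is disabled and every column carries an explicit mapping:

```sql
-- Hypothetical file layout: column 1 holds a name, column 2 an ID.
-- Map the table's c1 to the 2nd file column and c2 to the 1st.
CREATE EXTERNAL TABLE ext_mapped (
    c1 INT          AS (metadata$filecol2),
    c2 VARCHAR2(50) AS (metadata$filecol1)
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',')
  PATTERN = 'mapped.csv';
```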
file_path
The following describes the file path formats for external tables.

- If the file is stored locally, the `LOCATION` parameter is in the `LOCATION = '[file://] local_file_path'` format, where `local_file_path` can be a relative or absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file path that the OBServer node has access to, and `local_file_path` can only be a subpath of the `secure_file_priv` path.

- If the file is stored remotely, the `LOCATION` parameter is in one of the following formats.

  **Notice**

  When you use an object storage path, the parameters of the object storage path are separated by the `&` symbol. Make sure that the parameter values contain only uppercase and lowercase letters, digits, the characters `/-_$+=`, and wildcards. If you enter other characters, the configuration may fail.

  - If the file is stored on Alibaba Cloud OSS or AWS S3, the `LOCATION` parameter is in the `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'` format, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required to access Alibaba Cloud OSS, AWS S3, or an object storage service compatible with the S3 protocol, and `s3_region` specifies the region when you use S3. This sensitive access information is stored in the system tables of the database in encrypted form.

  - If the file is stored on HDFS, the `LOCATION` parameter is in one of the following formats:

    - If you access the cluster by using the IP address of a single NameNode (NN), the `LOCATION` parameter is in the `LOCATION = 'hdfs://localhost:port/PATH'` format, where `localhost` specifies the IP address of HDFS, `port` specifies the port number of HDFS, and `PATH` specifies the file path in HDFS.

    - If you access the cluster by using Kerberos authentication, the `LOCATION` parameter is in the `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'` format, where:

      - `principal`: specifies the user for login authentication.
      - `keytab`: specifies the path of the key file for user authentication.
      - `krb5conf`: specifies the path of the description file for the Kerberos environment.
      - `configs`: specifies additional HDFS configurations. By default, this parameter is empty. However, in a Kerberos environment it usually has a value that must be configured. For example, `dfs.data.transfer.protection=authentication,privacy` sets the data transmission protection level to `authentication` and `privacy`.

    - If you access the cluster by using the logical name service of Hadoop high availability (HA), the `LOCATION` parameter is in the `LOCATION = 'hdfs://nameserviceID/PATH'` format, where `nameserviceID` specifies the ID of the logical name service of the Hadoop HA cluster, and `PATH` specifies the file path.

      **Note**

      Make sure that the client side of the OBServer node contains the `nameservice` definition and failover strategy of the HA cluster.

    - If you access the HA cluster by using Kerberos authentication, the `LOCATION` parameter is in the `LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'` format, where:

      - `principal`: specifies the user for login authentication. Set this parameter to the `principal` of a non-primary NN.
      - `keytab` and `krb5conf`: the same as in the single-node NN configuration.
      - `configs`: specifies additional HDFS configurations. Among the configurations that can be set, only the HA configurations and the security configurations related to Kerberos matter here:
        - `dfs.data.transfer.protection=${string}`: specifies the `dfs.data.transfer.protection` configuration of the cluster.
        - `dfs.nameservices=${nameservice id}`: specifies the `nameservice` (alias) of the current HA cluster.
        - `dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}`: specifies the list of NameNode IDs of the HA cluster.
        - `dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}`: specifies the address of `namenode1` for the client to route to.
        - `dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}`: specifies the address of `namenode2` for the client to route to.
        - `dfs.ha.automatic-failover.enabled.${nameservice id}=true`: specifies that the HA cluster automatically obtains a valid NameNode to respond to the requests it receives.
        - `dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`: specifies the logic class that the HA cluster uses to switch between primary and standby nodes. You can also customize the package and upload the logic required by the HA cluster.

      **Notice**

      Some HA configurations are bound to the `namespace`. For example, in Example 4 below, `mycluster` is the `namespace`. Make sure that the related configurations are set.
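As an illustration of the remote format above, a minimal Alibaba Cloud OSS / S3 example might look like the following (a sketch; the access ID, key, host, region, bucket, and path are placeholders you must replace):

```sql
-- Placeholders only: $ACCESS_ID, $ACCESS_KEY, $HOST, the region, and the
-- bucket path must be replaced with real values for your object storage.
CREATE EXTERNAL TABLE ext_s3_events (
    event_id INT,
    payload  VARCHAR2(4000)
  )
  LOCATION = 's3://$ACCESS_ID:$ACCESS_KEY@$HOST:us-east-1/mybucket/events/'
  FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',');
```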
format_type_options
- `TYPE = 'CSV'`: specifies the format of the external file as CSV. It also includes the following fields:

  - `LINE_DELIMITER`: specifies the line delimiter of the CSV file.
  - `FIELD_DELIMITER`: specifies the field delimiter of the CSV file.
  - `ESCAPE`: specifies the escape character of the CSV file. It must be a single byte.
  - `FIELD_OPTIONALLY_ENCLOSED_BY`: specifies the character used to enclose field values in the CSV file. The default value is empty.

    **Notice**

    When the external table data file contains `NULL` values (actual `NULL` values, not the string "NULL"), you must explicitly configure the `FIELD_OPTIONALLY_ENCLOSED_BY` parameter, and its value cannot be empty.

  - `ENCODING`: specifies the character set encoding of the file. For all character sets supported in Oracle mode, see Character set. If not specified, the default value is `UTF8MB4`.
  - `NULL_IF`: specifies the strings to be treated as `NULL`. The default value is empty.
  - `SKIP_HEADER`: skips the file header; specifies the number of rows to skip.
  - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
  - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
  - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.
  - `IGNORE_LAST_EMPTY_COLUMN`: specifies whether to ignore a trailing empty field at the end of a line in the file. The default value is `TRUE`, indicating that the trailing empty field is ignored.
  - `COMPRESSION`: optional. Specifies the compression format of the file. Valid values:
    - `GZIP`/`DEFLATE`: GZIP-compressed file.
    - `ZSTD`: ZSTD-compressed file.
    - `NONE`: the file is not compressed (default value).
    - `SNAPPY_BLOCK`: Hadoop-Snappy-compressed file.
  - `PARALLEL_PARSE_ON_SINGLE_FILE`: specifies whether to enable parallel parsing of a single CSV file. If not specified, the default value is `TRUE`.
  - `PARALLEL_PARSE_FILE_SIZE_THRESHOLD`: specifies the threshold file size above which a file may be parsed in parallel, in bytes. The default value is 256 MB.
  - `MAX_ROW_LENGTH`: specifies the maximum length of a single row of data, in bytes. The default value is 2 MB. It is used in the boundary determination stage of parallel parsing of a single CSV file and does not affect the accuracy of data reading.

- `TYPE = 'PARQUET'`: specifies the format of the external file as `PARQUET`.
- `TYPE = 'ORC'`: specifies the format of the external file as `ORC`.
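Putting several of these options together, a table over a gzip-compressed CSV file with a one-row header might look like the following (a sketch; the table name, path, and option values are illustrative, not defaults):

```sql
-- Sketch: a headered, gzip-compressed CSV (names and path are hypothetical).
CREATE EXTERNAL TABLE ext_headered_csv (
    id   INT,
    note VARCHAR2(200)
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1          -- the first row is a header; skip it
    NULL_IF = ('NULL')       -- treat the string 'NULL' as NULL
    TRIM_SPACE = TRUE        -- strip leading/trailing spaces from fields
    COMPRESSION = GZIP       -- the data files are gzip-compressed
  )
  PATTERN = 'headered.*\.csv\.gz';
```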
Considerations
- When an external file is deleted, the external table ignores files that are in its file list but no longer exist in the LOCATION directory.
- When an external file is modified, the external table accesses the latest data in the file. If an external file is modified concurrently with a query on the external table, unexpected results may occur, so avoid modifying external files while querying the external table.
- When new files are added to the external directory, the external table accesses only the files recorded in its file list. To include the new files, you need to update the external table's file list, as sketched below.
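A minimal sketch of such an update, assuming the `ALTER EXTERNAL TABLE ... REFRESH` statement is available in your version and reusing the `contacts` table from Example 1:

```sql
-- Rescan the LOCATION directory and update the stored file list
-- so that newly added files become visible to queries.
ALTER EXTERNAL TABLE contacts REFRESH;
```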
Examples
Example 1
Prepare the data. First, set the `secure_file_priv` path to `/home/admin/` and place the CSV file `extdata.csv` that holds the external table data in the `/home/admin/test` directory of the current OBServer node.

Here is an example of setting the global secure file path:

```shell
obclient> SET GLOBAL secure_file_priv = "/home/admin/";
Query OK, 0 rows affected

obclient> \q
Bye
```

**Note**

Since `secure_file_priv` is a `GLOBAL` variable, you need to execute `\q` to exit and reconnect for the new value to take effect.

The content of the CSV file is as follows:

```
1,'Dave','Smith','dsmith@outlook.com','friend',32
2,'Xena','Johnson','xjonson@outlook.com','contact',45
3,'Fred','Jackon','fjackson@outlook.com','co-worker',19
4,'Alma','Tyler','atyler@outlook.com','friend',53
```

After logging in to the database as a user of the user tenant, create an external table named `contacts`.

```sql
obclient> CREATE EXTERNAL TABLE contacts (
    id        INT,
    firstname VARCHAR(100),
    lastname  VARCHAR(100),
    email     VARCHAR(255),
    category  CHAR(30),
    age       NUMBER
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = ''''
  )
  PATTERN = 'extdata.csv';
```

Query the data in the `contacts` external table.

```sql
obclient> SELECT * FROM contacts;
+------+-----------+----------+----------------------+-----------+------+
| ID   | FIRSTNAME | LASTNAME | EMAIL                | CATEGORY  | AGE  |
+------+-----------+----------+----------------------+-----------+------+
|    1 | Dave      | Smith    | dsmith@outlook.com   | friend    |   32 |
|    2 | Xena      | Johnson  | xjonson@outlook.com  | contact   |   45 |
|    3 | Fred      | Jackon   | fjackson@outlook.com | co-worker |   19 |
|    4 | Alma      | Tyler    | atyler@outlook.com   | friend    |   53 |
+------+-----------+----------+----------------------+-----------+------+
4 rows in set
```

Create an external table named `test_tbl1_csv` whose `FORMAT` option includes the `COMPRESSION`, `PARALLEL_PARSE_ON_SINGLE_FILE`, `PARALLEL_PARSE_FILE_SIZE_THRESHOLD`, and `MAX_ROW_LENGTH` parameters.

```sql
obclient> CREATE EXTERNAL TABLE test_tbl1_csv (
    col1 INT,
    col2 VARCHAR(100),
    col3 VARCHAR(100),
    col4 VARCHAR(255),
    col5 CHAR(30),
    col6 NUMBER
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = ''''
    COMPRESSION = SNAPPY_BLOCK
    PARALLEL_PARSE_ON_SINGLE_FILE = TRUE
    PARALLEL_PARSE_FILE_SIZE_THRESHOLD = 1024
    MAX_ROW_LENGTH = 1024
  )
  PATTERN = 'extdata.csv';
```
Example 2
Assume that a CSV file exists in a certain file path on HDFS, with the following content:

```shell
$hdfs dfs -cat /user/test_tbl1.csv
1,'Emma','2021-09-01'
2,'William','2021-09-02'
3,'Olivia','2021-09-03'
```

Create an HDFS external table.

- If Kerberos authentication is not enabled in the target HDFS environment:

  ```sql
  CREATE EXTERNAL TABLE test_tbl1_csv_oracle (
      id INT,
      name VARCHAR(50),
      c_date DATE
    )
    LOCATION = 'hdfs://${hadoop_namenode_hostname}:${hadoop_namenode_port}/user'
    FORMAT = (
      TYPE = 'CSV',
      FIELD_DELIMITER = ',',
      FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    )
    PATTERN = 'test_tbl1.csv';
  ```

  **Notice**

  Here, `hadoop_namenode_hostname` and `hadoop_namenode_port` are the hostname and port of the HDFS NameNode and must be replaced with actual values.

- If Kerberos authentication is enabled in the target HDFS environment:

  **Notice**

  If Kerberos authentication is enabled, you need to deploy the corresponding keytab (user authentication key file) and krb5conf file on the OBServer node.

  ```sql
  CREATE EXTERNAL TABLE partsupp (
      PS_PARTKEY    INTEGER,
      PS_SUPPKEY    INTEGER,
      PS_AVAILQTY   INTEGER,
      PS_SUPPLYCOST DECIMAL(15,2),
      PS_COMMENT    VARCHAR(199)
    )
    LOCATION = 'hdfs://localhost:8020/tpch_csv?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
    FORMAT = (
      TYPE = 'CSV'
      FIELD_DELIMITER = '|'
      FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    )
    PATTERN = 'partsupp.tbl';
  ```

Query the data in the external table.

```sql
SELECT * FROM test_tbl1_csv_oracle;
```

The query result is as follows:

```shell
+----+----------+------------+
| id | name     | c_date     |
+----+----------+------------+
|  1 | Emma     | 2021-09-01 |
|  2 | William  | 2021-09-02 |
|  3 | Olivia   | 2021-09-03 |
+----+----------+------------+
3 rows in set
```
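As a quick sanity check on the Kerberos-authenticated `partsupp` table, an aggregate query works the same way (a sketch; the count depends entirely on the rows in `partsupp.tbl`):

```sql
SELECT COUNT(*) FROM partsupp;
```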
Example 3
Assume that an ORC file named `data.orc` is stored in the `/home/admin/orc_test/` directory.

Create an ORC external table.

```sql
obclient> CREATE EXTERNAL TABLE ext_table_orc (
    id NUMBER,
    name VARCHAR2(50)
  )
  LOCATION = '/home/admin/orc_test/'
  FORMAT = (
    TYPE = 'ORC'
  )
  PATTERN = 'data.orc';
```

**Note**

By specifying `FORMAT = ( TYPE = 'ORC' )`, the external table can directly read the ORC data file. Ensure that the actual format of the external data file matches the declared format to avoid parsing failures.
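Once created, the ORC-backed table can be queried like any internal table; for example (a sketch, as the output depends on the contents of `data.orc`):

```sql
SELECT id, name FROM ext_table_orc WHERE id IS NOT NULL;
```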
Example 4
Assume that a CSV file exists in a certain file path on HDFS, with the following content:

```shell
$hdfs dfs -cat /hadoop_ha_test/test_simple.csv
1,lili,19
2,alic,20
3,solvi,21
```

Enable the OBServer-related JNI configuration items.

For more information about the JNI configuration, see Deploy the OceanBase Database JAVA SDK environment.

Create an HDFS external table.

```sql
obclient> CREATE EXTERNAL TABLE test_ha (
    id INT,
    r_name VARCHAR(100),
    age INT
  )
  LOCATION = 'hdfs://mycluster/hadoop_ha_test?principal=ha/xxx@xxx.com&keytab=/path/to/ha.keytab&krb5conf=/path/to/krb5conf_file&configs=dfs.data.transfer.protection=integrity#dfs.nameservices=mycluster#dfs.ha.namenodes.mycluster=nn1,nn2#dfs.namenode.rpc-address.mycluster.nn1=localhost1:port#dfs.namenode.rpc-address.mycluster.nn2=localhost2:port#dfs.ha.automatic-failover.enabled.mycluster=true#dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'test_simple.csv';
```

Query the data in the external table.

```sql
obclient> SELECT * FROM test_ha;
```

The query result is as follows:

```shell
+----+--------+-------+
| ID | R_NAME | AGE   |
+----+--------+-------+
|  1 | lili   |    19 |
|  2 | alic   |    20 |
|  3 | solvi  |    21 |
+----+--------+-------+
3 rows in set
```