Purpose
This statement creates an external table in the database.
External tables are a key feature of database management systems. Data in an ordinary table is stored in the database's own storage space, while the data of an external table is stored in an external storage service.
When creating an external table, you need to define the file path and file format of the data. Users can then read data from the external storage service through the external table. External tables are read-only: they can be used in query statements, but they do not support DML operations. External tables also do not support defining constraints or creating indexes.
Syntax
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = 'file_name'
FORMAT = (format_type_options)
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = '<regex_pattern>']
[AUTO_REFRESH = 'xxx'];
column_definition_list:
column_definition [, column_definition ...]
column_definition:
column_name column_type [AS expr]
format_type_options:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = {TRUE | FALSE}
TRIM_SPACE = {TRUE | FALSE}
EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
type_parquet_option:
TYPE = 'PARQUET'
type_orc_option:
TYPE = 'ORC'
Parameters
| Parameter | Description |
|---|---|
| table_name | The name of the external table to be created. |
| column_name | The name of the column in the external table. By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence. |
| column_type | Defines the column types of the external table, but cannot define constraints (such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY). |
| AS expr | Manually specifies the column mapping. When the order of the columns in the file does not match the order of the columns defined in the external table, you can use the pseudo column metadata$filecol{N} to specify the mapping between the external table columns and the Nth column in the file. For example, c2 INT AS (metadata$filecol4) indicates that the c2 column of the external table corresponds to the 4th column in the file. Note that if you manually specify column mappings, automatic mapping relationships will be disabled, and all columns must be manually defined. |
| LOCATION = 'file_name' | Specifies the path where the external table files are stored. Typically, the data files of the external table are stored in a separate directory, which can contain subdirectories. When creating the table, the external table automatically collects all files in this directory. For more information, see file_name below. |
| FORMAT = (format_type_options) | Specifies the attributes of the external file format. Use TYPE to specify the file format, such as CSV or PARQUET. TYPE cannot be empty. For more information, see format_type_options below. |
| PATTERN | Specifies a regular expression pattern to filter files in the LOCATION directory. For each file path in the LOCATION directory, if it matches the pattern, the external table accesses the file; otherwise, it skips the file. If this parameter is not specified, all files in the LOCATION directory are accessible by default. The external table saves the list of files in the specified LOCATION path that match the PATTERN in the database system table. When the external table scans, it accesses the external files based on this list. |
| PARTITION_TYPE = USER_SPECIFIED | If you need to manually add and delete partitions instead of letting the external table automatically manage partitions, specify the PARTITION_TYPE = USER_SPECIFIED parameter. |
| AUTO_REFRESH = 'xxx' | Specifies whether to enable automatic refresh of the file list for the external table. |
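The file-filtering behavior of `PATTERN` described in the table above can be sketched as follows. This is purely illustrative (the function name `filter_files` and the unanchored matching behavior are assumptions, not the server's implementation); the actual filtering happens inside the database when the file list is collected:

```python
import re

def filter_files(paths, pattern=None):
    """Mimic how PATTERN selects files under LOCATION: with no pattern,
    every file is accessible; otherwise only paths matching the regular
    expression are kept. Whether the engine anchors the match is not
    specified here, so this sketch uses an unanchored search."""
    if pattern is None:
        return list(paths)
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]

files = ["extdata.csv", "extdata.csv.bak", "notes.txt", "part-0001.csv"]
print(filter_files(files, r"\.csv$"))  # only files ending in .csv
```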
file_name
The path for storing the external table files takes one of the following formats:

- If the file is stored locally, the `LOCATION` format is `LOCATION = '[file://] local_file_path'`, where `local_file_path` can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of OceanBase Database. The `secure_file_priv` parameter specifies the file paths that OBServer nodes are allowed to access, and `local_file_path` must be a subpath of the `secure_file_priv` path.

- If the file is stored remotely, the `LOCATION` format depends on the storage service, as described below.

  Notice

  When you specify an object storage path, the parameters are separated by the `&` symbol. Make sure that the parameter values contain only uppercase and lowercase letters, digits, the characters `/-_$+=`, and wildcards. Otherwise, the setting may fail.

  - If the file is stored on Alibaba Cloud OSS or AWS S3, the format is `LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'`, where `$ACCESS_ID`, `$ACCESS_KEY`, and `$HOST` are the access information required for Alibaba Cloud OSS, AWS S3, or an object storage service compatible with the S3 protocol, and `s3_region` specifies the region when S3 is used. This sensitive access information is stored in the database's system tables in encrypted form.

  - If the file is stored on HDFS, the format is as follows:

    - If you access the cluster through the address of a single NameNode (NN), the format is `LOCATION = 'hdfs://localhost:port/PATH'`, where `localhost` is the address of HDFS, `port` is the port number of HDFS, and `PATH` is the file path in HDFS.

      If Kerberos authentication is used, the format is `LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'`, where:

      - `principal`: specifies the user for login authentication.
      - `keytab`: specifies the path of the key file for user authentication.
      - `krb5conf`: specifies the path of the description file for the Kerberos environment.
      - `configs`: specifies additional HDFS configurations. By default, this parameter is empty. In a Kerberos environment, however, it usually needs to be set, for example, `dfs.data.transfer.protection=authentication,privacy`, which sets the data transfer protection levels to `authentication` and `privacy`.

    - If you access the cluster through the logical name service of Hadoop high availability (HA), the format is `LOCATION = 'hdfs://nameserviceID/PATH'`, where `nameserviceID` is the ID of the Hadoop HA logical name service and `PATH` is the file path.

      Note

      Make sure that the client side of the OBServer node contains the `nameservice` definition and failover strategy of the HA cluster.

      If Kerberos authentication is used, the format is `LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'`, where:

      - `principal`: specifies the user for login authentication. Set this parameter to the `principal` of a non-primary NN.
      - `keytab` and `krb5conf`: same as in the single-NN setting.
      - `configs`: specifies additional HDFS configurations. Among the configurations that can be set, only the HA and security ones are relevant:
        - `dfs.data.transfer.protection=${string}`: must match the `dfs.data.transfer.protection` configuration of the cluster.
        - `dfs.nameservices=${nameservice id}`: specifies the `nameservice` (alias) of the current HA cluster.
        - `dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}`: specifies the list of NameNode IDs in the HA cluster.
        - `dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}`: specifies the address of `namenode1` for client routing.
        - `dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}`: specifies the address of `namenode2` for client routing.
        - `dfs.ha.automatic-failover.enabled.${nameservice id}=true`: allows the HA cluster to automatically select a valid NameNode to respond to requests.
        - `dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`: specifies the logic class for switching between the primary and standby nodes of the HA cluster. You can also customize the required logic and upload it to the HA cluster.

Notice

Some HA configurations are bound to the `nameservice`. For example, in Example 4 below, the HA configurations are bound to `mycluster`. Make sure that the related configurations are set correctly.
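The `#`-separated `configs` string above is easy to get wrong by hand. The following sketch assembles it programmatically (Python, illustrative only: `build_hdfs_configs` is a hypothetical helper, and `mycluster`, `nn1`, `host1:8020`, etc. are placeholder values):

```python
def build_hdfs_configs(nameservice, namenodes, protection="integrity"):
    """Assemble the '#'-separated key=value string for the `configs`
    query parameter of an HDFS HA LOCATION. `namenodes` maps each
    NameNode ID to its RPC address."""
    pairs = [
        ("dfs.data.transfer.protection", protection),
        ("dfs.nameservices", nameservice),
        (f"dfs.ha.namenodes.{nameservice}", ",".join(namenodes)),
    ]
    for nn_id, addr in namenodes.items():
        pairs.append((f"dfs.namenode.rpc-address.{nameservice}.{nn_id}", addr))
    pairs.append((f"dfs.ha.automatic-failover.enabled.{nameservice}", "true"))
    pairs.append((
        f"dfs.client.failover.proxy.provider.{nameservice}",
        "org.apache.hadoop.hdfs.server.namenode.ha."
        "ConfiguredFailoverProxyProvider",
    ))
    return "#".join(f"{k}={v}" for k, v in pairs)

configs = build_hdfs_configs("mycluster",
                             {"nn1": "host1:8020", "nn2": "host2:8020"})
print(configs.split("#")[1])  # dfs.nameservices=mycluster
```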
format_type_options
- `TYPE = 'CSV'`: specifies that the external file format is CSV. The following fields can also be set:

  - `LINE_DELIMITER`: specifies the line delimiter of the CSV file.
  - `FIELD_DELIMITER`: specifies the field delimiter of the CSV file.
  - `ESCAPE`: specifies the escape character of the CSV file. It must be a single byte.
  - `FIELD_OPTIONALLY_ENCLOSED_BY`: specifies the symbol used to enclose field values in the CSV file. The default value is an empty string.

    Notice

    If the external table data file contains `NULL` values (actual `NULL` values, not the string "NULL"), you must explicitly configure the `FIELD_OPTIONALLY_ENCLOSED_BY` parameter, and its value cannot be empty.

  - `ENCODING`: specifies the character set encoding of the file. For all character sets supported in Oracle mode, see Character sets. If not specified, the default value is UTF8MB4.
  - `NULL_IF`: specifies the strings to be treated as `NULL`. The default value is an empty string.
  - `SKIP_HEADER`: skips the file header and specifies the number of lines to skip.
  - `SKIP_BLANK_LINES`: specifies whether to skip blank lines. The default value is `FALSE`, indicating that blank lines are not skipped.
  - `TRIM_SPACE`: specifies whether to remove leading and trailing spaces from fields in the file. The default value is `FALSE`, indicating that they are not removed.
  - `EMPTY_FIELD_AS_NULL`: specifies whether to treat empty strings as `NULL`. The default value is `FALSE`, indicating that empty strings are not treated as `NULL`.
  - `IGNORE_LAST_EMPTY_COLUMN`: specifies whether to ignore a trailing empty field at the end of a line (that is, when the line ends with a field delimiter immediately before the line delimiter). The default value is `TRUE`, indicating that the trailing empty field is ignored.

    Note

    Support for `IGNORE_LAST_EMPTY_COLUMN` was introduced in V4.3.5 BP2.

- `TYPE = 'PARQUET'`: specifies that the external file format is PARQUET.
- `TYPE = 'ORC'`: specifies that the external file format is ORC.
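To make the interplay of `TRIM_SPACE`, `NULL_IF`, and `EMPTY_FIELD_AS_NULL` concrete, here is a minimal sketch. It models the documented behavior of these options on a single field; `interpret_field` and the exact evaluation order are assumptions, not the server's actual parser:

```python
def interpret_field(raw, trim_space=False, empty_field_as_null=False,
                    null_if=()):
    """Apply CSV field options to one raw field value:
    optional trimming first, then NULL_IF matching, then
    empty-string-as-NULL (order assumed for illustration)."""
    value = raw.strip() if trim_space else raw
    if value in null_if:
        return None          # matched a NULL_IF string -> NULL
    if value == "" and empty_field_as_null:
        return None          # empty string treated as NULL
    return value

print(interpret_field("  N/A ", trim_space=True, null_if=("N/A",)))  # None
print(interpret_field("", empty_field_as_null=True))                 # None
print(interpret_field("  x "))                                       # '  x '
```

With all options left at their defaults, the field passes through unchanged, which matches the `FALSE` defaults documented above.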
Considerations
- When an external file is deleted, the external table ignores the file that no longer exists in the file list.
- When an external file is modified, the external table accesses the latest data of the external file. If the modification of the external file and the query of the external table are performed concurrently, unexpected results may occur. Therefore, avoid modifying the external file while querying the external table.
- When a new file is added to the external directory, the external table only accesses the files already in its file list. To make the new file visible to the external table, you need to refresh the external table's file list.
Examples
Example 1
Prepare the data. First, set the `secure_file_priv` parameter to `/home/admin/`, and place the CSV file `extdata.csv` containing the external table data in the `/home/admin/test` directory of the current OBServer node.

Here is an example of setting the global secure file path:

```shell
obclient> SET GLOBAL secure_file_priv = "/home/admin/";
Query OK, 0 rows affected

obclient> \q
Bye
```

Note

Since `secure_file_priv` is a `GLOBAL` variable, you must execute `\q` to exit and reconnect for the setting to take effect.

The content of the CSV file is as follows:

```
1,'Dave','Smith','dsmith@outlook.com','friend',32
2,'Xena','Johnson','xjonson@outlook.com','contact',45
3,'Fred','Jackon','fjackson@outlook.com','co-worker',19
4,'Alma','Tyler','atyler@outlook.com','friend',53
```

After logging in to the database as a user of a user tenant, create the `contacts` external table:

```sql
obclient> CREATE EXTERNAL TABLE contacts (
    id        INT,
    firstname VARCHAR(100),
    lastname  VARCHAR(100),
    email     VARCHAR(255),
    category  CHAR(30),
    age       NUMBER
  )
  LOCATION = '/home/admin/test/'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = ''''
  )
  PATTERN = 'extdata.csv';
```

Query the data in the `contacts` external table:

```sql
obclient> SELECT * FROM contacts;
+------+-----------+----------+----------------------+-----------+------+
| ID   | FIRSTNAME | LASTNAME | EMAIL                | CATEGORY  | AGE  |
+------+-----------+----------+----------------------+-----------+------+
|    1 | Dave      | Smith    | dsmith@outlook.com   | friend    |   32 |
|    2 | Xena      | Johnson  | xjonson@outlook.com  | contact   |   45 |
|    3 | Fred      | Jackon   | fjackson@outlook.com | co-worker |   19 |
|    4 | Alma      | Tyler    | atyler@outlook.com   | friend    |   53 |
+------+-----------+----------+----------------------+-----------+------+
4 rows in set
```
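Why the single quotes around 'Dave' in the data file do not appear in the query result can be sketched as follows. This is purely illustrative of `FIELD_OPTIONALLY_ENCLOSED_BY`; `strip_enclosure` is a hypothetical helper, and real CSV parsing (e.g., delimiters inside quoted fields) is not handled:

```python
def strip_enclosure(field, enclosed_by="'"):
    """Remove the optional enclosing character from one CSV field:
    only symmetric, outermost quotes are stripped; unquoted fields
    pass through unchanged."""
    if len(field) >= 2 and field[0] == field[-1] == enclosed_by:
        return field[1:-1]
    return field

row = "1,'Dave','Smith','dsmith@outlook.com','friend',32".split(",")
print([strip_enclosure(f) for f in row])
# ['1', 'Dave', 'Smith', 'dsmith@outlook.com', 'friend', '32']
```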
Example 2
Assume that a CSV file exists in a directory on HDFS, with the following content:

```shell
$ hdfs dfs -cat /user/test_tbl1.csv
1,'Emma','2021-09-01'
2,'William','2021-09-02'
3,'Olivia','2021-09-03'
```

Create an HDFS external table.

If Kerberos authentication is not enabled in the target HDFS environment:

```sql
CREATE EXTERNAL TABLE test_tbl1_csv_oracle (
    id     INT,
    name   VARCHAR(50),
    c_date DATE
  )
  LOCATION = 'hdfs://${hadoop_namenode_hostname}:${hadoop_namenode_port}/user'
  FORMAT = (
    TYPE = 'CSV',
    FIELD_DELIMITER = ',',
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  )
  PATTERN = 'test_tbl1.csv';
```

Notice

Here, `hadoop_namenode_hostname` and `hadoop_namenode_port` are the host name and port of the HDFS NameNode and must be replaced with the actual values.

If Kerberos authentication is enabled in the target HDFS environment:

Notice

If Kerberos authentication is enabled, you must deploy the corresponding keytab (user authentication key file) and krb5conf file on the OBServer node.

```sql
CREATE EXTERNAL TABLE partsupp (
    PS_PARTKEY    INTEGER,
    PS_SUPPKEY    INTEGER,
    PS_AVAILQTY   INTEGER,
    PS_SUPPLYCOST DECIMAL(15,2),
    PS_COMMENT    VARCHAR(199)
  )
  LOCATION = 'hdfs://localhost:8020/tpch_csv?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = '|'
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  )
  PATTERN = 'partsupp.tbl';
```

Query the data in the external table:

```sql
SELECT * FROM test_tbl1_csv_oracle;
```

The query result is as follows:

```shell
+----+---------+------------+
| id | name    | c_date     |
+----+---------+------------+
|  1 | Emma    | 2021-09-01 |
|  2 | William | 2021-09-02 |
|  3 | Olivia  | 2021-09-03 |
+----+---------+------------+
3 rows in set
```
Example 3
Assume that the ORC-format data file `data.orc` is stored in the `/home/admin/orc_test/` directory.

Create an ORC-format external table:

```sql
obclient> CREATE EXTERNAL TABLE ext_table_orc (
    id   NUMBER,
    name VARCHAR2(50)
  )
  LOCATION = '/home/admin/orc_test/'
  FORMAT = ( TYPE = 'ORC' )
  PATTERN = 'data.orc';
```

Note

By specifying `FORMAT = ( TYPE = 'ORC' )`, the external table can directly read the ORC-format data file. Make sure that the actual format of the external data file matches the declared format; otherwise, parsing will fail.
Example 4
Assume that a CSV file exists in a directory on HDFS, with the following content:

```shell
$ hdfs dfs -cat /hadoop_ha_test/test_simple.csv
1,lili,19
2,alic,20
3,solvi,21
```

Enable the OBServer-related JNI configuration items. For more information about the JNI configuration items, see Deploy the OceanBase Database Java SDK environment.

Create an HDFS external table:

```sql
obclient> CREATE EXTERNAL TABLE test_ha (
    id     INT,
    r_name VARCHAR(100),
    age    INT
  )
  LOCATION = 'hdfs://mycluster/hadoop_ha_test?principal=ha/xxx@xxx.com&keytab=/path/to/ha.keytab&krb5conf=/path/to/krb5conf_file&configs=dfs.data.transfer.protection=integrity#dfs.nameservices=mycluster#dfs.ha.namenodes.mycluster=nn1,nn2#dfs.namenode.rpc-address.mycluster.nn1=localhost1:port#dfs.namenode.rpc-address.mycluster.nn2=localhost2:port#dfs.ha.automatic-failover.enabled.mycluster=true#dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'test_simple.csv';
```

Query the data in the external table:

```sql
obclient> SELECT * FROM test_ha;
```

The query result is as follows:

```shell
+----+--------+-------+
| ID | R_NAME | AGE   |
+----+--------+-------+
|  1 | lili   |    19 |
|  2 | alic   |    20 |
|  3 | solvi  |    21 |
+----+--------+-------+
3 rows in set
```