This topic describes how to create an external table by using SQL statements. It also provides the prerequisites, overview, and considerations for creating an external table, and gives some examples.
Overview
An external table is a logical table object. The actual data of an external table is stored in an external storage service, not in the database.
For more information about external tables, see Overview.
Prerequisites
Before you create an external table, make sure that the following conditions are met:
You have deployed an OceanBase cluster and created an Oracle tenant. For more information, see Overview.
You have connected to an Oracle tenant of OceanBase Database. For more information, see Overview.
You have the CREATE TABLE privilege. For more information about how to query the privileges of the current user, see Query the privileges of the current user. If you do not have the privilege, contact the administrator to grant it. For more information, see Grant privileges.
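If the privilege is missing, an administrator can grant it. A minimal sketch, assuming a hypothetical user named scott:

```sql
-- Hypothetical example: run as a privileged user (for example, SYS)
-- to grant the CREATE TABLE privilege to the user "scott".
GRANT CREATE TABLE TO scott;
```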
Considerations
External tables only support query operations and do not support DML operations.
If the external file accessed by the external table is deleted, the system does not return an error but returns an empty row.
Since the files accessed by the external table are managed by the external storage system, if the external storage is unavailable, querying the external table will result in an error.
Since the data of the external table is stored in an external data source, querying it involves factors such as network and file system, which may affect query performance. Therefore, when creating an external table, you need to select an appropriate data source and optimization strategy to improve query efficiency.
Create an external table by using the command-line tool
Use the CREATE EXTERNAL TABLE statement to create an external table.
Define the name of the external table
When you create an external table, you must specify its name. To avoid confusion and ambiguity, we recommend that you use specific naming rules, such as a prefix or suffix, to distinguish external tables from regular tables. For example, you can add the _csv suffix to the name of an external table.
Here is an example:
When you create an external table that stores information about students, you can name it students_csv.
CREATE EXTERNAL TABLE students_csv external_options
Notice
Because no other attributes are specified for the external table, the preceding SQL statement cannot be executed.
Define columns
You cannot define constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY for columns of an external table.
The column types supported by external tables are the same as those supported by regular tables. For more information about the data types supported by OceanBase Database in Oracle mode, see Built-in data types.
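For illustration, here is a minimal sketch of a column list for an external table; the table and column names are hypothetical, and note that no constraints appear in the definition:

```sql
-- Hypothetical sketch: plain column definitions only, with no DEFAULT,
-- NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY clauses.
CREATE EXTERNAL TABLE orders_csv (
  order_id   NUMBER(10),
  customer   VARCHAR2(100),
  order_date DATE
)
LOCATION = '/home/admin/external_csv'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' );
```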
Define LOCATION
The LOCATION option specifies the path where the external table files are stored. Typically, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, it automatically collects all files in the directory.
OceanBase Database supports the following two path formats:
Local location format:
LOCATION = '[file://]local_file_path'

Notice
For the local location format, you must set the secure_file_priv system variable to specify the accessible path. For more information, see secure_file_priv.

Remote location format:
Notice
When you use an object storage path, the parameters of the object storage path are separated by the & symbol. Make sure that the parameter values you enter contain only uppercase and lowercase letters, digits, the characters /-_$+=, and wildcards. If you enter other characters, the setting may fail.

LOCATION = '{oss|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path', where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required for accessing Alibaba Cloud OSS, AWS S3, and object storage services compatible with the S3 protocol, and s3_region is the region selected when using S3. This sensitive access information is stored in encrypted form in the system tables of the database.

When the files are stored on HDFS, the following formats are supported:
Single-node NameNode (NN) access format:
LOCATION = 'hdfs://localhost:port/PATH', where localhost specifies the address of HDFS, port specifies the port number of HDFS, and PATH specifies the file path in HDFS.

Kerberos authentication format:

LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx', where:
principal: specifies the user for login and authentication.
keytab: specifies the path of the key file for user authentication.
krb5conf: specifies the path of the description file for the Kerberos environment.
configs: specifies additional HDFS configurations. By default, this parameter is empty. However, in a Kerberos environment, this parameter usually has a value and needs to be configured. For example, dfs.data.transfer.protection=authentication,privacy sets the data transmission protection levels to authentication and privacy.
Hadoop high availability (HA) logical name service access format:
LOCATION = 'hdfs://nameserviceID/PATH', where nameserviceID specifies the ID of the logical name service in Hadoop HA, and PATH specifies the file path.

Note
Make sure that the client OBServer configurations include the nameservice definition and failover strategy of the HA cluster.

Kerberos authentication format:

LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', where:
principal: specifies the user for login and authentication. Set this parameter to the principal of a non-primary NN.
keytab and krb5conf: same as in the single-node NN settings.
configs: specifies additional HDFS configurations. You need to set multiple configurations, which are related to HA and security:
dfs.data.transfer.protection=${string}: aligns with the dfs.data.transfer.protection configuration of the cluster.
dfs.nameservices=${nameservice id}: specifies the nameservice (alias) of the current HA cluster.
dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}: specifies the list of NameNode backup IDs of the HA cluster.
dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}: specifies the specific NameNode address of namenode1 for client routing.
dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}: specifies the specific NameNode address of namenode2 for client routing.
dfs.ha.automatic-failover.enabled.${nameservice id}=true: enables the HA cluster to automatically obtain an available NameNode to respond to requests.
dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: specifies the logic class for switching between primary and backup nodes in the HA cluster. You can also customize and upload the required logic for the HA cluster.
Notice
HA-related parameters are bound to the nameservice. For example, in the samples below, mycluster is bound to the nameservice. Pay attention to the relevant parameters when you set them.
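As a hedged sketch of the remote object storage format described above (the host and bucket path are placeholders, and $ACCESS_ID and $ACCESS_KEY stand for real credentials):

```sql
-- Placeholders: substitute real OSS credentials, host, and bucket path.
CREATE EXTERNAL TABLE oss_data_csv (
  id   NUMBER(10),
  name VARCHAR2(50)
)
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@oss-cn-hangzhou.aliyuncs.com/my-bucket/data'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' );
```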
Define FORMAT
FORMAT = ( TYPE = 'CSV' ... ) specifies the format of the external file as CSV. The parameters are as follows:

TYPE: specifies the type of the external file.
LINE_DELIMITER: specifies the line delimiter for the CSV file. The default value is LINE_DELIMITER = '\n'.
FIELD_DELIMITER: specifies the column delimiter for the CSV file. The default value is FIELD_DELIMITER = '\t'.
ESCAPE: specifies the escape character for the CSV file. It must be a single byte. The default value is ESCAPE = '\'.
FIELD_OPTIONALLY_ENCLOSED_BY: specifies the character used to enclose field values in the CSV file. The default value is an empty string.

Notice
If the external table data file contains NULL values (actual NULL values, not the string "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and the value of this parameter cannot be empty.

ENCODING: specifies the character set encoding format of the file. For information about all character sets supported in Oracle mode, see Character sets and collations. If you do not specify this parameter, the default value is UTF8MB4.
NULL_IF: specifies the strings to be treated as NULL. The default value is an empty string.
SKIP_HEADER: skips the file header and specifies the number of lines to skip.
SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, indicating that blank lines are not skipped.
TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, indicating that leading and trailing spaces are not removed.
EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. The default value is FALSE, indicating that empty strings are not treated as NULL.
COMPRESSION: optional. Specifies the compression format of the file. Valid values are as follows:
GZIP/DEFLATE: GZIP-compressed file.
ZSTD: ZSTD-compressed file.
NONE: the file is not compressed (default value).
SNAPPY_BLOCK: Hadoop-Snappy-compressed file.
PARALLEL_PARSE_ON_SINGLE_FILE: specifies whether to enable parallel parsing of a single CSV file. If not specified, the default value is TRUE.
PARALLEL_PARSE_FILE_SIZE_THRESHOLD: specifies the file size threshold for parallel parsing, in bytes. The default value is 256 MB.
MAX_ROW_LENGTH: specifies the maximum length of a single row of data, in bytes. The default value is 2 MB. This parameter is used in the boundary determination phase of parallel parsing of a single CSV file and does not affect the accuracy of data reading.
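Several of the CSV parameters above can be combined in one statement. A hedged sketch, assuming a hypothetical comma-separated file with one header line and double-quoted fields:

```sql
-- Hypothetical sketch: skip one header line; fields are separated by commas
-- and optionally enclosed in double quotes.
CREATE EXTERNAL TABLE sales_csv (
  id     NUMBER(10),
  region VARCHAR2(30),
  amount NUMBER(12, 2)
)
LOCATION = '/home/admin/external_csv'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  LINE_DELIMITER = '\n'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1
);
```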
FORMAT = ( TYPE = 'PARQUET' ... ) specifies the format of the external file as Parquet.
FORMAT = ( TYPE = 'ORC' ... ) specifies the format of the external file as ORC.
Note
Starting from OceanBase Database V4.6.0, the external tables for Parquet and ORC file formats support reading complex data types: Array, Map, and JSON. The data types supported by the external tables must be consistent with those supported by OceanBase Database. For more information, see Overview of built-in data types.
OceanBase Database external tables for Parquet and ORC file formats support reading Bloom filters to filter data based on query predicates, reducing unnecessary I/O and accelerating queries.
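For illustration, a minimal hedged sketch of an external table over Parquet files; the directory and columns are hypothetical, and the column types must match the data types stored in the Parquet files:

```sql
-- Hypothetical sketch: Parquet files carry their own schema, so no
-- CSV-style delimiter options are needed.
CREATE EXTERNAL TABLE events_parquet (
  event_id NUMBER(20),
  payload  VARCHAR2(4000)
)
LOCATION = '/home/admin/external_parquet'
FORMAT = ( TYPE = 'PARQUET' );
```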
(Optional) Define PATTERN
The PATTERN option specifies a regular expression pattern to filter files in the LOCATION directory. For each file path in the LOCATION directory, if it matches the pattern, the external table accesses the file; otherwise, it skips the file. If you do not specify this parameter, the default is to access all files in the LOCATION directory. The external table saves the list of files in the LOCATION directory that match the PATTERN in the database system table. When the external table scans, it accesses the external files based on this list.
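For example, to restrict an external table to the .csv files in the LOCATION directory, a regular-expression PATTERN can be used. A hedged sketch with a hypothetical directory:

```sql
-- Only file paths matching the regular expression '.*\.csv' are accessed;
-- other files in the directory are skipped.
CREATE EXTERNAL TABLE logs_csv (
  ts  VARCHAR2(30),
  msg VARCHAR2(200)
)
LOCATION = '/home/admin/external_csv'
FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' )
PATTERN = '.*\.csv';
```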
(Optional) Define partitions for the external table
Automatically define partitions for the external table
The external table calculates and adds partitions based on the partition key expression. When you query, you can specify the value or range of the partition key. In this case, partition pruning is performed, and the external table only reads files in the specified partition.
Manually define partitions for the external table
If you need to manually add and delete partitions instead of letting the external table manage them automatically, you must specify the PARTITION_TYPE = USER_SPECIFIED field.
Example 1
Notice
The IP addresses in the commands in this example have been desensitized. You must replace them with the actual IP address of your server when you verify the commands.
The following example describes how to create an external table in the database in the Oracle mode of OceanBase Database based on an external file stored in the local file system.
Prepare an external file.
Execute the following command to create a file named test_tbl1.csv in the /home/admin/external_csv directory on the server where the OBServer node is located.

[admin@xxx /home/admin/external_csv]# vi test_tbl1.csv

The content of the file is as follows:

1,'Emma'
2,'William'
3,'Olivia'

Set the import file path.
Notice
For security reasons, you can only modify the secure_file_priv system variable by using a local socket connection to the database. For more information, see secure_file_priv.

Execute the following command to log in to the server where the OBServer node is located.

ssh admin@10.10.10.1

Execute the following command to connect to the oracle001 tenant by using a local Unix socket connection.

obclient -S /home/admin/oceanbase/run/sql.sock -usys@oracle001 -p******

Execute the following SQL statement to set the import path to /home/admin/external_csv.

SET GLOBAL secure_file_priv = "/home/admin/external_csv";
Reconnect to the oracle001 tenant. Here is an example:

obclient -h10.10.10.1 -P2881 -usys@oracle001 -p****** -A

Execute the following SQL statement to create an external table named test_tbl1_csv.

CREATE EXTERNAL TABLE test_tbl1_csv (
  id INT,
  name VARCHAR(50)
)
LOCATION = '/home/admin/external_csv'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = ''''
)
PATTERN = 'test_tbl1.csv';

Execute the following SQL statement to query data from the test_tbl1_csv external table.

SELECT * FROM test_tbl1_csv;

The returned result is as follows:

+------+---------+
| ID   | NAME    |
+------+---------+
|    1 | Emma    |
|    2 | William |
|    3 | Olivia  |
+------+---------+
3 rows in set
Example 2
Example of creating an HDFS external table
With Kerberos:
CREATE EXTERNAL TABLE ext_data(ID NUMBER(32), NAME VARCHAR2(30),SCORE NUMBER(32))
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
LOCATION = 'hdfs://localhost:8020/user?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
PATTERN = 'data.csv';
where:
principal: specifies the user for login authentication.
keytab: specifies the path of the key file for user authentication.
krb5conf: specifies the path of the description file for the Kerberos environment.
configs: specifies additional HDFS configuration items. By default, this parameter is empty. However, in a Kerberos environment, this parameter usually has a value and needs to be configured. For example, dfs.data.transfer.protection=authentication,privacy sets the data transmission protection levels to authentication and privacy.
Without Kerberos:
CREATE EXTERNAL TABLE ext_data(ID NUMBER(32), NAME VARCHAR2(30),SCORE NUMBER(32))
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
LOCATION = 'hdfs://localhost:8020/user'
PATTERN = 'test_tbl1.csv';
References
For more information about viewing and updating external files, see Manage external files.
