Use the CREATE EXTERNAL TABLE statement to create an external table. When you create an external table, you need to specify the path and format of the data files so that data can be read from the external files.
Privilege requirements
To create an external table, the current user must have the CREATE TABLE privilege. For more information about how to view the privileges of the current user, see View user privileges. If you do not have the CREATE TABLE privilege, contact your administrator to grant you the privilege. For more information about how to grant privileges to a user, see Directly grant privileges.
Create an external table
The following example shows how to create an external table:
CREATE EXTERNAL TABLE table_name
( col_name col_type [AS (metadata$filecol{N})]
[ , col_name col_type [AS (metadata$filecol{N})] ]
[ , ... ] )
LOCATION = '<string>'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = { TRUE | FALSE }
TRIM_SPACE = { TRUE | FALSE }
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
IGNORE_LAST_EMPTY_COLUMN = { TRUE | FALSE }
COMPRESSION = { GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK }
PARALLEL_PARSE_ON_SINGLE_FILE = { TRUE | FALSE }
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
)
[ PATTERN = '<regex_pattern>' ]
The following information describes the parameters in the preceding statement.
col_name col_type [AS (metadata$filecol{N})]: defines a column. AS (metadata$filecol{N}) specifies that the column in the external table maps to the Nth column in the external file. The column types supported by an external table are the same as those supported by a regular table. For more information about the data types supported by OceanBase Database in Oracle mode, see Data types.
By default, the data columns in the external file are automatically mapped to the columns in the external table in order. That is, the first column in the external file is mapped to the first column in the external table.
For example, in the following statement, the C1 column in the ext_t1 table is automatically mapped to the first column in the external file, and the C2 column is automatically mapped to the second column in the external file.
CREATE EXTERNAL TABLE ext_t1
( C1 int,
  C2 int )
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
);
If the order of columns in the external file differs from the order of columns in the external table, you can use a pseudocolumn in the metadata$filecol{N} format to map a column in the external table to the Nth column in the external file. The columns in the external file are numbered starting from 1.
For example, in the following statement, C1 int AS (metadata$filecol2) maps the C1 column in the ext_t2 table to the second column in the external file, and C2 int AS (metadata$filecol4) maps the C2 column to the fourth column in the external file.
CREATE EXTERNAL TABLE ext_t2
( C1 int AS (metadata$filecol2),
  C2 int AS (metadata$filecol4) )
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
);
Notice
If you manually define the mapping for any column, automatic column mapping is disabled. You must then manually define the mapping for all columns.
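To make the mapping rule concrete, the following Python sketch illustrates the idea outside the database. It is a conceptual illustration only, not OceanBase code; the sample row and the column-to-index mapping are made up to mirror the ext_t2 example above.

```python
# Conceptual sketch of explicit external-table column mapping (not OceanBase
# internals). A CSV line is split on the field delimiter, and each table
# column picks the Nth file column, with file columns numbered from 1 --
# mirroring C1 AS (metadata$filecol2) in the ext_t2 example.

def map_row(fields, mapping):
    """fields: parsed file columns; mapping: table column -> 1-based file index."""
    return {col: fields[idx - 1] for col, idx in mapping.items()}

row = "10|alpha|20|beta|30".split("|")

# ext_t2-style mapping: C1 -> 2nd file column, C2 -> 4th file column.
print(map_row(row, {"C1": 2, "C2": 4}))  # {'C1': 'alpha', 'C2': 'beta'}
```

Note the 1-based numbering: metadata$filecol1 refers to the first column in the file, so the sketch subtracts 1 when indexing the Python list.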
LOCATION = '<string>': specifies the path where the external files are stored. Usually, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the system automatically collects all files in the directory. The following two formats are supported:
Local location format:
LOCATION = '[file://] local_file_path'
local_file_path: can be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database.
Notice
local_file_path must be a directory, not a file. To specify a single file, set LOCATION to the directory that contains the file, and then use the PATTERN parameter to specify the file.
For the local location format, when you configure the secure_file_priv system variable to specify the file path that OceanBase Database can access, the value of secure_file_priv must be a directory that contains the local_file_path directory. In other words, local_file_path must be a subdirectory of the secure_file_priv directory.
The tenant-level system variable secure_file_priv specifies the file path that OceanBase Database can access when you import or export data to or from a file. For more information about secure_file_priv, see secure_file_priv.
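The subdirectory requirement can be sketched as a simple path-containment check. The following Python snippet is a conceptual illustration only (it is not how OceanBase validates the path internally), and the sample paths are hypothetical:

```python
import os

# Conceptual check: local_file_path must be equal to, or a subdirectory of,
# the directory configured in secure_file_priv.
def is_allowed(secure_file_priv, local_file_path):
    priv = os.path.abspath(secure_file_priv)
    path = os.path.abspath(local_file_path)
    # commonpath collapses to priv only when path lies under priv.
    return os.path.commonpath([priv, path]) == priv

# Hypothetical paths, modeled on the example later in this topic.
print(is_allowed("/home/admin/oceanbase", "/home/admin/oceanbase/data"))  # True
print(is_allowed("/home/admin/oceanbase", "/tmp/data"))                   # False
```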
Remote location format:
Notice
In an object storage path, parameters are separated by the & symbol. Make sure that the parameter values contain only uppercase and lowercase letters, digits, the characters /-_$+=, and wildcards. If the values contain other characters, the configuration may fail.
If the file is stored in Alibaba Cloud OSS or AWS S3, the format is:
LOCATION = '{oss | s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'
where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required for accessing Alibaba Cloud OSS, AWS S3, or an object storage service that is compatible with the S3 protocol, and s3_region is the region selected when you use S3. This sensitive access information is stored in a system table of the database in encrypted form.
If the file is stored in HDFS, the format is as follows:
If you access the cluster based on the address of a single-node NameNode (NN), the format is:
LOCATION = 'hdfs://localhost:port/PATH'
where localhost specifies the address of HDFS, port specifies the port number of HDFS, and PATH specifies the file path in HDFS.
If you access the cluster with Kerberos authentication, the format is:
LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'
where:
principal: specifies the user for login authentication.
keytab: specifies the path of the key file for user authentication.
krb5conf: specifies the path of the description file for the Kerberos environment.
configs: specifies additional HDFS configurations. The default value is an empty string. However, in a Kerberos environment, this parameter usually needs to be set. For example, dfs.data.transfer.protection=authentication,privacy sets the data transmission protection levels to authentication and privacy.
If you access the cluster based on the logical naming service of Hadoop high availability (HA), the format is:
LOCATION = 'hdfs://nameserviceID/PATH'
where nameserviceID specifies the ID of the logical naming service of Hadoop HA, and PATH specifies the file path.
Note
Make sure that the OBServer node that acts as the client is configured with the nameservice definition and failover strategy of the HA cluster.
The format with Kerberos authentication is:
LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
where:
principal: specifies the user for Kerberos authentication. Set it to the principal of a non-primary NameNode (NN).
keytab and krb5conf: same as in the single-node NN case.
configs: specifies additional HDFS configurations. You can set only HA configurations and security configurations.
dfs.data.transfer.protection=${string}: specifies the dfs.data.transfer.protection parameter of the cluster.
dfs.nameservices=${nameservice id}: specifies the nameservice alias of the current HA cluster.
dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}: specifies the list of NameNode IDs in the HA cluster.
dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}: specifies the address of namenode1 to facilitate client routing.
dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}: specifies the address of namenode2 to facilitate client routing.
dfs.ha.automatic-failover.enabled.${nameservice id}=true: specifies that the HA cluster automatically routes a request to an available NameNode.
dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: specifies the logic class for primary/standby switching in the HA cluster. You can also customize and upload the logic required by the HA cluster.
Notice
HA configurations are bound to the nameservice; for example, a nameservice ID such as mycluster. Make sure that you configure the related parameters correctly.
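Because the configs value is just a #-separated list of key=value pairs, it can be assembled programmatically. The following Python sketch is a hypothetical helper, not part of OceanBase; the nameservice ID, NameNode names, host addresses, and Kerberos paths are all made up for illustration:

```python
# Hypothetical helper (illustration only): assemble the '#'-separated configs
# value for a Kerberos-authenticated HA LOCATION string, following the
# key=value layout documented above.
def build_ha_configs(ns, namenodes, addresses):
    pairs = [
        ("dfs.nameservices", ns),
        (f"dfs.ha.namenodes.{ns}", ",".join(namenodes)),
    ]
    for nn, addr in zip(namenodes, addresses):
        pairs.append((f"dfs.namenode.rpc-address.{ns}.{nn}", addr))
    pairs.append((f"dfs.ha.automatic-failover.enabled.{ns}", "true"))
    pairs.append((
        f"dfs.client.failover.proxy.provider.{ns}",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    ))
    return "#".join(f"{k}={v}" for k, v in pairs)

# Made-up cluster details for illustration.
configs = build_ha_configs("mycluster", ["nn1", "nn2"],
                           ["node1.example.com:8020", "node2.example.com:8020"])
location = ("hdfs://mycluster/user/data?principal=obuser@EXAMPLE.COM"
            "&keytab=/etc/ob.keytab&krb5conf=/etc/krb5.conf&configs=" + configs)
print(location)
```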
FORMAT = ( TYPE = 'CSV' ... ): specifies that the external file format is CSV.
TYPE: specifies the type of the external file.
LINE_DELIMITER: specifies the line delimiter of the file.
FIELD_DELIMITER: specifies the column delimiter of the file.
ESCAPE: specifies the escape character of the file.
FIELD_OPTIONALLY_ENCLOSED_BY: specifies the symbol for enclosing field values. For example, FIELD_OPTIONALLY_ENCLOSED_BY = '"' indicates that values are enclosed in double quotation marks. If you do not specify this parameter, the default value is empty.
Notice
If the external file contains NULL values (actual NULL values, not the string "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.
ENCODING: specifies the character set encoding format of the file. For information about all character sets supported in Oracle mode, see Character sets. If you do not specify this parameter, the default value is UTF8MB4.
NULL_IF: specifies the strings to be treated as NULL. If you do not specify this parameter, the default value is empty.
SKIP_HEADER: specifies the number of header rows to skip. If you do not specify this parameter, the default value is 0, indicating that no header rows are skipped.
SKIP_BLANK_LINES: specifies whether to skip blank lines. If you do not specify this parameter, the default value is FALSE.
TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields. If you do not specify this parameter, the default value is FALSE.
EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. If you do not specify this parameter, the default value is FALSE.
IGNORE_LAST_EMPTY_COLUMN: specifies whether to ignore a trailing empty field at the end of a line. The default value is TRUE, indicating that the trailing empty field is ignored.
COMPRESSION: optional. Specifies the compression format of the file. Valid values:
GZIP/DEFLATE: a GZIP-compressed file.
ZSTD: a ZSTD-compressed file.
NONE: an uncompressed file. This is the default value.
SNAPPY_BLOCK: a Hadoop-Snappy-compressed file.
Note
The SNAPPY_BLOCK compression format is supported starting from V4.4.2 BP1.
PARALLEL_PARSE_ON_SINGLE_FILE: specifies whether to enable parallel parsing of a single CSV file. If you do not specify this parameter, the default value is TRUE.
PARALLEL_PARSE_FILE_SIZE_THRESHOLD: specifies the file size threshold for parallel parsing, in bytes. The default value is 256 MB.
MAX_ROW_LENGTH: specifies the maximum length of a single row, in bytes. The default value is 2 MB. This parameter is used in the boundary determination stage of parallel parsing of a single CSV file and does not affect the correctness of data reading.
Note
The PARALLEL_PARSE_ON_SINGLE_FILE, PARALLEL_PARSE_FILE_SIZE_THRESHOLD, and MAX_ROW_LENGTH parameters in the FORMAT option are supported starting from V4.4.2 BP1.
FORMAT = ( TYPE = 'PARQUET' ... ): specifies that the external file format is PARQUET.
PATTERN: specifies a regular expression pattern that filters files in the LOCATION directory. The external table can access a file in the LOCATION directory only if the file name matches the pattern; otherwise, the file is skipped. If you do not specify this parameter, the external table can access all files in the LOCATION directory by default.
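The effect of PATTERN can be sketched in a few lines of Python. This is a conceptual illustration with a made-up file list; in particular, whether OceanBase anchors the match against the whole file name is not specified here, so the sketch uses an unanchored regular-expression search:

```python
import re

# Conceptual sketch of PATTERN filtering (hypothetical file list): each file
# name under LOCATION is matched against the regular expression, and only
# matching files are visible to the external table.
files = ["data.csv", "data_2024.csv", "readme.txt", "data.csv.bak"]

def visible(files, pattern):
    return [f for f in files if re.search(pattern, f)]

print(visible(files, r'data.*\.csv$'))  # ['data.csv', 'data_2024.csv']
print(visible(files, r'\.txt$'))        # ['readme.txt']
```

The `$` anchor is what excludes `data.csv.bak` in the first call; without it, any file name merely containing `.csv` would match.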
Assume that the /home/admin/oceanbase/ directory on your local machine contains a data.csv file, and the content of the file is as follows.
1,"lin",98
2,"hei",90
3,"ali",95
Connect to the Oracle tenant of the cluster by using a local Unix socket as the tenant administrator on an OBServer node.
Here is an example:
obclient -S /home/admin/oceanbase/run/sql.sock -uroot@sys -p********
For more information about how to connect to OceanBase Database by using a local Unix socket, see secure_file_priv.
Configure the path /home/admin/oceanbase/ that the database can access.
SET GLOBAL secure_file_priv = "/home/admin/oceanbase/";
After the command is executed, you must restart the session for the modification to take effect.
Connect to the database again and create the external table ext_t3.
CREATE EXTERNAL TABLE ext_t3 (
  ID NUMBER(32),
  NAME VARCHAR2(30),
  SCORE NUMBER(32)
)
LOCATION = '/home/admin/oceanbase/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
PATTERN = 'data.csv';
After the external table is created, you can execute the SHOW CREATE TABLE statement to view the table definition, just like a regular table.
SHOW CREATE TABLE ext_t3;
The query result is as follows:
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXT_T3 | CREATE EXTERNAL TABLE "EXT_T3" (
"ID" NUMBER(32) GENERATED ALWAYS AS (METADATA$FILECOL1),
"NAME" VARCHAR2(30) GENERATED ALWAYS AS (METADATA$FILECOL2),
"SCORE" NUMBER(32) GENERATED ALWAYS AS (METADATA$FILECOL3)
)
LOCATION='file:///home/admin/oceanbase/'
PATTERN='data.csv'
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
FIELD_OPTIONALLY_ENCLOSED_BY = '"',
ENCODING = 'utf8mb4'
)COMPRESS FOR ARCHIVE REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
You can also access the external table like a regular table. When you query an external table, the system reads the external file directly through the driver layer of the external table, parses the file according to the file format, converts the data into internal data types of OceanBase Database, and returns the data rows. Here is an example of querying the external table ext_t3 that you just created.
SELECT * FROM ext_t3;
The query result is as follows:
+----+------+-------+
| ID | NAME | SCORE |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set
In addition, you can perform combined queries between an external table and a regular table. Assume that the current database contains a regular table named info with the following data:
+------+--------+------+
| NAME | SEX | AGE |
+------+--------+------+
| lin | male | 8 |
| hei | male | 9 |
| li | female | 8 |
+------+--------+------+
3 rows in set
Here is an example of performing a combined query between the external table ext_t3 and the regular table info.
SELECT info.* FROM info, ext_t3 WHERE info.name = ext_t3.name AND ext_t3.score > 90;
The query result is as follows:
+------+--------+------+
| NAME | SEX | AGE |
+------+--------+------+
| lin | male | 8 |
| li | female | 8 |
+------+--------+------+
2 rows in set
For more information about query operations, see Read data.
Considerations
You can query external tables but cannot perform DML operations on them.
If the external file accessed by an external table is deleted, no error is returned when you query the external table. Instead, an empty result is returned.
If the external storage system is unavailable, an error is returned when you query an external table.
What to do next
When you create an external table, OceanBase Database saves the list of files that match the PATTERN in the specified path of the LOCATION parameter to a system table. When you scan the external table, it accesses the external files based on the list. If other files are added to the external directory, you must update the external table files to add the new files to the list. For more information, see Manage external files.
After you create an external table, you can also drop it. The statement for dropping an external table is the same as that for dropping a regular table. You can execute the DROP TABLE statement to drop an external table. For more information, see Drop a table.
