You can execute the CREATE EXTERNAL TABLE statement to create an external table. When you create an external table, you must specify the path and format of the data files so that the database can read data from the external files.
Permissions
To create an external table, the current user must have the CREATE permission. For more information about how to view the permissions of the current user, see View user permissions.
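For example, in MySQL mode, you can run a statement like the following to check the privileges of the current user (a minimal sketch; the output depends on your deployment):
-- List the privileges granted to the current user; look for CREATE.
SHOW GRANTS;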
Create an external table
The SQL statement for creating an external table is as follows:
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = 'location_path'
{FORMAT = (format_type_options)
| PROPERTIES = (properties_type_options)}
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = 'regex_pattern']
[AUTO_REFRESH = 'xxx'];
column_definition_list:
column_definition [, column_definition ...]
column_definition:
column_name column_type [AS metadata$filecol{N} | external$filepos{N}]
format_type_options:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = {TRUE | FALSE}
TRIM_SPACE = {TRUE | FALSE}
EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
COMPRESSION = {GZIP | ZSTD | DEFLATE | NONE | SNAPPY_BLOCK}
PARALLEL_PARSE_ON_SINGLE_FILE = {TRUE | FALSE}
PARALLEL_PARSE_FILE_SIZE_THRESHOLD = <int>
MAX_ROW_LENGTH = <int>
type_parquet_option:
TYPE = 'PARQUET'
[COLUMN_INDEX_TYPE = {'POSITION' | 'NAME'}]
type_orc_option:
TYPE = 'ORC'
[COLUMN_INDEX_TYPE = 'POSITION']
properties_type_options:
type_odps_option
type_odps_option:
TYPE = 'ODPS'
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>',
API_MODE = {"tunnel_api" | "storage_api"},
SPLIT = {"byte" | "row"}
The following list describes the related parameters.
col_name col_type [AS (metadata$filecol{N} | external$filepos{N})]: defines a column. AS (metadata$filecol{N}) manually defines the column mapping, and AS (external$filepos{N}) indexes columns by position. The column types supported by external tables are the same as those supported by regular tables. For more information about the data types supported by OceanBase Database in MySQL mode, see Data types.

By default, the data columns in an external file are automatically mapped, in order, to the columns defined in the external table. That is, the first column of the external table corresponds to the first column of the external file.

For example, in the following statement, the C1 column of the external table ext_t1 is automatically mapped to the first column of the external file, and the C2 column is automatically mapped to the second column of the external file.

CREATE EXTERNAL TABLE ext_t1 (
  C1 int,
  C2 int
)
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
);

When the order of columns in an external file differs from the order of columns defined in the external table, you can specify the column mapping in the following two ways:
- Use the metadata$filecol{N} syntax: This syntax applies to CSV files. It maps an external table column to the Nth column of the external file. The file columns are numbered starting from 1.
- Use the external$filepos{N} syntax: This syntax applies to Parquet and ORC files. It indexes columns by position. You must also specify the COLUMN_INDEX_TYPE = 'POSITION' parameter.
For example, in the following statement, C1 int AS (metadata$filecol2) specifies that the C1 column of the external table ext_t2 corresponds to the second column of the external file, and C2 int AS (metadata$filecol4) specifies that the C2 column corresponds to the fourth column of the external file.

CREATE EXTERNAL TABLE ext_t2 (
  C1 int AS (metadata$filecol2),
  C2 int AS (metadata$filecol4)
)
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
);

For Parquet and ORC files, use the external$filepos{N} syntax:

CREATE EXTERNAL TABLE ext_t3 (
  c1 INT AS (external$filepos1),
  c2 INT AS (external$filepos2),
  c3 VARCHAR(50) AS (external$filepos3),
  c4 DATE AS (external$filepos4)
)
LOCATION = 'path/to/data/'
FORMAT = (TYPE = 'PARQUET' COLUMN_INDEX_TYPE = 'POSITION')
PATTERN = "000000_0$";
Notice

- If you manually define the column mapping, the automatic column mapping feature is disabled, and all columns must be manually defined.
- When you use the external$filepos{N} syntax, you must specify the COLUMN_INDEX_TYPE = 'POSITION' parameter.

LOCATION = '<string>': specifies the path where the external files are stored. Usually, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the table automatically collects all files in the directory. The following two formats are supported:
Local location format: LOCATION = '[file://] local_file_path'

local_file_path: can be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database.

Notice

local_file_path must be a directory, not a file. If you want to specify a single file, specify the parent directory of the file in LOCATION, and select the file by setting the PATTERN attribute.

In the local location format, when you configure the secure_file_priv system variable to specify the file path that OceanBase Database can access, secure_file_priv must be a parent directory of local_file_path; that is, local_file_path must be a subdirectory of secure_file_priv. The secure_file_priv system variable specifies the path that OceanBase Database can access when it imports data from or exports data to a file. For more information about secure_file_priv, see secure_file_priv.
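Here is a minimal sketch of checking the current setting before you create a local external table:

-- View the directory that OceanBase Database is allowed to access for file operations.
SHOW GLOBAL VARIABLES LIKE 'secure_file_priv';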
Remote location format:

Notice

When you use an object storage path, the parameters in the path are separated by the & symbol. Make sure that the parameter values you enter contain only uppercase and lowercase letters, digits, the characters /-_$+=, and wildcards. If you enter other characters, the settings may fail.

When the files are stored in Alibaba Cloud OSS, AWS S3, or an object storage service compatible with the S3 protocol, the format is LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path', where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required to access Alibaba Cloud OSS, AWS S3, or the S3-compatible object storage service, and s3_region is the region selected when you use S3. This sensitive access information is stored in encrypted form in a system table of the database.

When the files are stored in HDFS, the following formats are supported:
Single-node NameNode (NN) address-based access format: LOCATION = 'hdfs://localhost:port/PATH', where localhost specifies the address of HDFS, port specifies the port number of HDFS, and PATH specifies the file path in HDFS.

Kerberos authentication-based format: LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx', where:

- principal: specifies the user for login authentication.
- keytab: specifies the path of the key file for user authentication.
- krb5conf: specifies the path of the description file for the Kerberos environment.
- configs: specifies additional HDFS configurations. By default, this parameter is empty. However, in a Kerberos environment, it usually needs to be configured. For example, dfs.data.transfer.protection=authentication,privacy sets the data transmission protection levels to authentication and privacy.
For a Hadoop HA cluster, the format is LOCATION = 'hdfs://nameserviceID/PATH', where nameserviceID specifies the logical nameservice ID of the Hadoop HA cluster, and PATH specifies the file path.

Note

Make sure that the OBServer node, which acts as the HDFS client, is configured with the nameservice definition and failover strategy of the HA cluster.

For a Kerberos-authenticated HA cluster, the format is LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', where:

- principal: specifies the login user. Set it to the principal of a non-NameNode node.
- keytab and krb5conf: the same as for a single-node NameNode.
- configs: specifies additional HDFS configurations. Only HA and security configurations can be set.
  - dfs.data.transfer.protection=${string}: specifies the dfs.data.transfer.protection parameter of the cluster.
  - dfs.nameservices=${nameservice id}: specifies the nameservice (alias) of the HA cluster.
  - dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}: specifies the IDs of the NameNodes in the HA cluster.
  - dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}: specifies the address of namenode1 to facilitate client routing.
  - dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}: specifies the address of namenode2 to facilitate client routing.
  - dfs.ha.automatic-failover.enabled.${nameservice id}=true: specifies that the HA cluster automatically obtains a valid NameNode to respond to requests.
  - dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: specifies the logic class for failover between the primary and standby nodes in the HA cluster. You can also customize and upload the failover logic required by the HA cluster.
Notice

Some HA configurations are bound to the nameservice. For example, in the preceding sample, the HA configurations are bound to the nameservice ID (such as mycluster). Make sure that the HA configurations are set properly.
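To put the remote location format together, here is a minimal sketch that creates an external table over CSV files in S3-compatible object storage. The table name ext_orders, the bucket path, and the us-east-1 region are illustrative placeholders, and $ACCESS_ID, $ACCESS_KEY, and $HOST stand for your real access information:

-- Hypothetical table over CSV files stored in an S3 bucket directory.
CREATE EXTERNAL TABLE ext_orders (
  o_id int,
  o_price decimal(10,2)
)
LOCATION = 's3://$ACCESS_ID:$ACCESS_KEY@$HOST:us-east-1/sales/orders/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
);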
FORMAT = (TYPE = 'CSV' ...): specifies the external file format as CSV.

- TYPE: specifies the type of the external file.
- LINE_DELIMITER: specifies the line delimiter. The default value is LINE_DELIMITER = '\n'.
- FIELD_DELIMITER: specifies the column delimiter. The default value is FIELD_DELIMITER = '\t'.
- ESCAPE: specifies the escape character. For example, ESCAPE = '*' specifies the asterisk (*) as the escape character in place of the default escape character (\). The default value is ESCAPE = '\'.
- FIELD_OPTIONALLY_ENCLOSED_BY: specifies the character that encloses field values. For example, FIELD_OPTIONALLY_ENCLOSED_BY = '"' specifies that values are enclosed in double quotation marks. The default value is empty.

  Notice

  If the data file of the external table contains NULL values (actual NULL values rather than the string "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.

- ENCODING: specifies the character set encoding of the file. For information about all character sets supported in MySQL mode, see Character sets. The default value is UTF8MB4.
- NULL_IF: specifies the strings that are treated as NULL. The default value is empty.
- SKIP_HEADER: specifies the number of header lines to skip. By default, the header is not skipped.
- SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE.
- TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields. The default value is FALSE.
- EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. The default value is FALSE.
- IGNORE_LAST_EMPTY_COLUMN: specifies whether to ignore the last empty field at the end of a line. The default value is TRUE, which means that the last empty field is ignored.
- COMPRESSION: optional. Specifies the compression format of the file. Valid values:
  - GZIP/DEFLATE: a GZIP-compressed file.
  - ZSTD: a ZSTD-compressed file.
  - NONE: the file is not compressed. This is the default value.
  - SNAPPY_BLOCK: a Hadoop-Snappy-compressed file.
- PARALLEL_PARSE_ON_SINGLE_FILE: specifies whether to enable parallel parsing of a single CSV file. The default value is TRUE.
- PARALLEL_PARSE_FILE_SIZE_THRESHOLD: specifies the file size threshold, in bytes, for parallel parsing. The default value is 256 MB.
- MAX_ROW_LENGTH: specifies the maximum length, in bytes, of a single row of data. The default value is 2 MB. This parameter is used in the boundary determination stage of parallel parsing of a single CSV file and does not affect the correctness of data reading.
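Here is a minimal sketch that combines several of these CSV options. The table name ext_csv_demo and the directory are placeholders; the sketch assumes the directory holds GZIP-compressed CSV files whose first line is a header and that use the string NULL for missing values:

-- Skip one header line, unquote double-quoted fields,
-- treat the string 'NULL' as NULL, and read GZIP-compressed files.
CREATE EXTERNAL TABLE ext_csv_demo (
  id int,
  name varchar(20),
  score int
)
LOCATION = 'file:///home/admin/data/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1
  NULL_IF = ('NULL')
  COMPRESSION = GZIP
);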
FORMAT = (TYPE = 'PARQUET' ...): specifies the external file format as Parquet.

- TYPE: specifies the external file type as PARQUET.
- COLUMN_INDEX_TYPE: specifies the column index type. The supported values are 'POSITION' and 'NAME', which index columns by position and by name, respectively. This parameter is required when you use the external$filepos{N} syntax.

FORMAT = (TYPE = 'ORC' ...): specifies the external file format as ORC.

- TYPE: specifies the external file type as ORC.
- COLUMN_INDEX_TYPE: specifies the column index type. The supported value is 'POSITION', which indexes columns by position. This parameter is required when you use the external$filepos{N} syntax.
Note
Starting from OceanBase Database V4.6.0, the following features are supported for external tables in the Parquet and ORC file formats:

- Reading complex data types such as Array, Map, and JSON. The data types of the external table must be consistent with those supported by OceanBase Database. For more information, see Data types.
- Reading Bloom filters, which filter data based on query predicates to reduce unnecessary I/O and accelerate queries.
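For example, here is a minimal sketch that maps Parquet columns by name instead of by position. The table name ext_parquet_demo and the path are placeholders, and the sketch assumes the Parquet files contain columns named c1 and c2:

-- Columns are matched to Parquet columns with the same names.
CREATE EXTERNAL TABLE ext_parquet_demo (
  c1 INT,
  c2 VARCHAR(50)
)
LOCATION = 'file:///home/admin/parquet_data/'
FORMAT = (TYPE = 'PARQUET' COLUMN_INDEX_TYPE = 'NAME');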
TYPE = 'ODPS': specifies the external data source type as ODPS. The following parameters are also supported:

- ACCESSID: specifies the AccessKey ID of the Alibaba Cloud account for identity authentication.
- ACCESSKEY: specifies the AccessKey secret that corresponds to the AccessKey ID for identity authentication.
- ENDPOINT: specifies the endpoint of the ODPS service.
- PROJECT_NAME: specifies the name of the ODPS project to be accessed.
- SCHEMA_NAME: specifies the name of the schema in ODPS. This parameter is optional.
- TABLE_NAME: specifies the name of the target table in ODPS.
- QUOTA_NAME: specifies the name of the quota to be used. This parameter is optional.
- COMPRESSION_CODE: specifies the compression format of the data source. The supported values are ZLIB, ZSTD, LZ4, and ODPS_LZ4. If this parameter is not specified, compression is not enabled.
- API_MODE: specifies the API mode for calling ODPS. Valid values:
  - tunnel_api (default):
    - No special network configuration is required: this option is suitable for all deployment scenarios. OceanBase Database and MaxCompute do not need to be deployed in the same VPC (Virtual Private Cloud).
    - No additional MaxCompute permissions are required: only the AccessKey ID and AccessKey secret are required for authentication, and no MaxCompute Storage API permissions are needed.
    - Applicable scenarios:
      - OceanBase Database and MaxCompute are not deployed in the same VPC.
      - The MaxCompute Storage API is not enabled.
      - The latency requirements for data transmission are not high.
  - storage_api:
    - Network dependency: OceanBase Database and MaxCompute must be deployed in the same VPC to achieve low-latency, high-throughput data transmission.
    - Permission dependency: you must enable the Storage API in MaxCompute and make sure that the AccessKey has the corresponding permissions.
    - Applicable scenarios:
      - OceanBase Database and MaxCompute are deployed in the same VPC.
      - The MaxCompute Storage API is enabled.
      - The data volume is extremely large or the real-time requirements are high.
- SPLIT: specifies the unit for task splitting when storage_api is used. Valid values: byte and row. If the data size of each row in a table varies greatly, set this parameter to byte; otherwise, set it to row.
PATTERN: specifies a regular expression pattern to filter files in the LOCATION directory. For each file in the LOCATION directory, the external table can access the file if it matches the pattern; otherwise, the file is skipped. If this parameter is not specified, the external table can access all files in the LOCATION directory by default.
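For example, the following sketch restricts an external table to the files whose names end with csv in a directory. The table name ext_csv_only and the path are placeholders:

-- Only file names matching the regular expression csv$ are accessible.
CREATE EXTERNAL TABLE ext_csv_only (
  id int,
  name varchar(20)
)
LOCATION = 'file:///home/admin/mixed_files/'
FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',')
PATTERN = 'csv$';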
Example of creating an external table in the CSV format

Assume that a data.csv file is stored in the /home/admin/oceanbase/ directory on the local machine. The content of the file is as follows:

1,"lin",98
2,"hei",90
3,"ali",95
Connect to the MySQL tenant of the cluster by using the local Unix socket on the OBServer node. Here is an example:

obclient -S /home/admin/oceanbase/run/sql.sock -uroot@sys -p********

For more information about how to connect to OceanBase Database by using the local Unix socket, see secure_file_priv.
Configure the database to allow access to the /home/admin/oceanbase/ directory.

SET GLOBAL secure_file_priv = "/home/admin/oceanbase/";

After the command is executed, you must start a new session for the modification to take effect.
Connect to the database again and create an external table named ext_t3.

CREATE EXTERNAL TABLE ext_t3 (
  id int,
  name char(10),
  score int
)
LOCATION = '/home/admin/oceanbase/'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
PATTERN = 'data.csv';
After the external table is created, you can use the SHOW CREATE TABLE statement to view the table definition, just like a regular table.
SHOW CREATE TABLE ext_t3;
The query result is as follows:
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ext_t3 | CREATE EXTERNAL TABLE `ext_t3` (
`id` int(11) GENERATED ALWAYS AS (metadata$filecol1),
`name` char(10) GENERATED ALWAYS AS (metadata$filecol2),
`score` int(11) GENERATED ALWAYS AS (metadata$filecol3)
)
LOCATION='file:///home/admin/oceanbase/'
PATTERN='data.csv'
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
FIELD_OPTIONALLY_ENCLOSED_BY = '"',
ENCODING = 'utf8mb4'
)DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
You can also access the external table like a regular table. When you query an external table, the system reads the external files through the driver layer of the external table, parses the files in the specified format, converts the data to internal data types of OceanBase Database, and returns the data rows. Here is an example of querying the external table ext_t3.
SELECT * FROM ext_t3;
The query result is as follows:
+----+------+-------+
| id | name | score |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set
In addition, you can join an external table with a regular table. Assume that the current database contains a regular table named info, and the data in the table is as follows:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| hei | male | 9 |
| ali  | female |    8 |
+------+--------+------+
3 rows in set
Here is an example of joining the external table ext_t3 with the regular table info.
SELECT info.* FROM info, ext_t3 WHERE info.name = ext_t3.name AND ext_t3.score > 90;
The query result is as follows:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| ali  | female |    8 |
+------+--------+------+
2 rows in set
For more information about how to query data, see Read data.
Create an ODPS Java SDK external table
To create an ODPS Java SDK external table, OceanBase Database must be deployed in an environment that supports the Java SDK. For more information about how to deploy the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
Here is an example of creating an ODPS Java SDK external table:
CREATE EXTERNAL TABLE lineitem(
l_orderkey BIGINT,
l_partkey BIGINT,
l_suppkey BIGINT,
l_linenumber BIGINT,
l_quantity DECIMAL(15,2),
l_extendedprice DECIMAL(15,2),
l_discount DECIMAL(15,2),
l_tax DECIMAL(15,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44))
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '***********',
ACCESSKEY = '***********',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'bigdata_public_dataset',
SCHEMA_NAME = 'tpch_10g',
TABLE_NAME = 'lineitem',
QUOTA_NAME = '',
COMPRESSION_CODE = '',
API_MODE = "storage_api",
SPLIT = "byte"
);
Considerations for using external tables
External tables are read-only. You can query them, but you cannot perform DML operations such as INSERT, UPDATE, or DELETE on them.
If the external files accessed by an external table are deleted, the system does not report an error; instead, it returns an empty result set.
Because the files accessed by an external table are managed by an external storage system, querying the external table returns an error if the external storage is unavailable.
What to do next
When you create an external table, the system saves the list of files that match the PATTERN in the specified path of the LOCATION parameter in the system table of OceanBase Database. When you scan the external table, the system accesses the external files based on the list. If new files are added to the external directory, you must update the external table file list to add the new files. For more information, see Manage external files.
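For example, after new files are added to the directory, you can refresh the file list of the external table with a statement like the following sketch; see Manage external files for the exact syntax:

-- Rescan LOCATION and update the file list stored in the system table.
ALTER EXTERNAL TABLE ext_t3 REFRESH;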
After you create an external table, you can also drop it. The statement for dropping an external table is the same as that for dropping a regular table. You can use the DROP TABLE statement to drop an external table. For more information, see Drop a table.
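Here is an example:

DROP TABLE ext_t3;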
