Purpose
This statement is used to create an external data catalog (External Catalog) in the database, which is used to connect to external data sources and retrieve metadata about external data. It allows direct querying of external data without the need for data import or migration.
Limitations and considerations
- Currently, only `ODPS` and `HMS` types of External Catalogs are supported.
- Before creating an HMS Catalog, if the HMS is expected to access file data stored in HDFS, the Java SDK environment must be deployed first. For detailed information on configuring the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
Privilege requirements
To execute the CREATE EXTERNAL CATALOG statement, the current user must have the CREATE CATALOG privilege. For more information about OceanBase Database privileges, see Privilege types in MySQL-compatible mode.
Syntax
CREATE EXTERNAL CATALOG [IF NOT EXISTS] external_catalog_name
PROPERTIES [=] (properties_type_options);
properties_type_options:
odps_type_list
| hms_type_list
odps_type_list:
TYPE = 'ODPS',
[ACCESSTYPE = 'accesstype_string',]
ACCESSID = 'string',
ACCESSKEY = 'string',
STSTOKEN = 'string',
ENDPOINT = 'string',
TUNNEL_ENDPOINT = 'string',
PROJECT_NAME = 'string',
[QUOTA_NAME = 'string',]
[COMPRESSION = 'compression_string',]
REGION = 'string'
hms_type_list:
TYPE = 'HMS',
URI = "string"
[, KRB5CONF = "string"]
[, KEYTAB = "string"]
[, PRINCIPAL = "string"]
[, MAX_CLIENT_POOL_SIZE = 20]
[, SOCKET_TIMEOUT = 10000000]
accesstype_string:
aliyun
| sts
| app
compression_string:
zlib
| zstd
| lz4
| odps_lz4
Parameters
| Parameter | Description |
|---|---|
| IF NOT EXISTS | Optional. If this clause is specified and the external data catalog to be created already exists in the current tenant, a warning is returned instead of an error. If this clause is not specified and the catalog already exists, an error is returned. |
| external_catalog_name | The name of the external data catalog to be created. |
| PROPERTIES [=] (properties_type_options) | The properties of the external data catalog. For more information, see properties_type_options. |
properties_type_options
- `odps_type_list`: The properties of the ODPS external data catalog (ODPS Catalog). For more information, see odps_type_list.
- `hms_type_list`: The properties of the HMS external data catalog (HMS Catalog). For more information, see hms_type_list.
odps_type_list
- `TYPE`: The type of the external data catalog. Valid value: `ODPS`. This value indicates that the external table reads and writes data from and to MaxCompute.
- `ACCESSTYPE`: Optional. The type of the MaxCompute account. Default value: `aliyun`. Valid values (case-insensitive): `aliyun`, `sts`, and `app`.
- `ACCESSID`: The AccessKey ID of the account. If the value of `ACCESSTYPE` is `aliyun` or `app`, this is the AccessKey ID of the `aliyun` or `app` account, or of a RAM user with MaxCompute access permissions.
- `ACCESSKEY`: The AccessKey secret of the account. If the value of `ACCESSTYPE` is `aliyun` or `app`, this is the AccessKey secret of the `aliyun` or `app` account, or of a RAM user with MaxCompute access permissions.
- `STSTOKEN`: The token. If the value of `ACCESSTYPE` is `sts`, this is the token for accessing MaxCompute.
- `ENDPOINT`: The endpoint of MaxCompute.
- `TUNNEL_ENDPOINT`: The endpoint of Tunnel. The MaxCompute Catalog uses the Tunnel SDK to obtain data.
- `PROJECT_NAME`: The name of the project space in MaxCompute. A project space is the basic organizational unit of MaxCompute, similar to a database or schema in a traditional database.
- `QUOTA_NAME`: Optional. The name of the quota. A quota in MaxCompute represents a resource pool for computing, access, and writing. If you have configured a quota, you can use this parameter to specify it.
- `COMPRESSION`: Optional. The compression format of the data source. If this parameter is not specified, compression is not enabled. Valid values (case-insensitive): `zlib`, `zstd`, `lz4`, and `odps_lz4`.
- `REGION`: The region where MaxCompute is enabled.
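The parameters above can be combined for STS-based access; the following is a minimal sketch, assuming placeholder values for all credentials, endpoints, and names (none of them are real):

```sql
-- Sketch only: an ODPS catalog authenticated with a temporary STS token.
-- All values below are placeholders, not real credentials or endpoints.
CREATE EXTERNAL CATALOG odps_sts_catalog PROPERTIES = (
    TYPE            = 'ODPS',
    ACCESSTYPE      = 'sts',             -- use STS token authentication
    ACCESSID        = '$sts_accessid',
    ACCESSKEY      = '$sts_accesskey',
    STSTOKEN        = '$sts_token',      -- temporary token issued by STS
    ENDPOINT        = '$odps_endpoint',
    TUNNEL_ENDPOINT = '$odps_tunnel_endpoint',
    PROJECT_NAME    = '$project_name',
    REGION          = '$region'
);
```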
hms_type_list
- `TYPE`: The type of the external data catalog. Valid value: `HMS`. This value indicates that the external table reads data from Hive-managed data.
- `URI`: The Thrift URI of the HMS service, in the format `thrift://$host:$port`, where `$host` is the IP address of the Thrift server and `$port` is its port. The default port for HMS is 9083.
- `KRB5CONF`: Optional. The path of the Kerberos configuration file required to access an HMS service that uses Kerberos authentication.
  **Notice**: Set the `KRB5CONF` parameter only when Kerberos authentication is enabled for HMS.
- `KEYTAB`: Optional. The path of the keytab key file required to access an HMS service that uses Kerberos authentication. If OceanBase Database is deployed in a distributed manner, this file must exist on the corresponding OBServer node.
  **Notice**: Set the `KEYTAB` parameter only when Kerberos authentication is enabled for HMS.
- `PRINCIPAL`: Optional. The name of the Kerberos principal required to access an HMS service that uses Kerberos authentication. The name is usually in the `service/HOST@REGION.com` format, for example, `hive/hadoop@QA.COM`.
  **Notice**: Set the `PRINCIPAL` parameter only when Kerberos authentication is enabled for HMS.
- `MAX_CLIENT_POOL_SIZE`: Optional. The maximum number of available client queues for the current catalog. Default value: 20, which means the current HMS catalog can start up to 20 clients to connect to the HMS service.
- `SOCKET_TIMEOUT`: Optional. The timeout period for client access to the current catalog. Default value: 10000000 (10 seconds).
Examples
Create an external data catalog of the `ODPS` type:

```sql
obclient> CREATE EXTERNAL CATALOG test_odps_catalog PROPERTIES = (
    TYPE = 'ODPS',
    ACCESSID = '$odps_accessid',
    ACCESSKEY = '$odps_accesskey',
    ENDPOINT = '$odps_endpoint',
    TUNNEL_ENDPOINT = 'http://xxx.maxcompute.aliyun.com',
    PROJECT_NAME = 'mysqltest_regression_sqlqa',
    QUOTA_NAME = '',
    COMPRESSION = ''
);
```

Create an external data catalog of the `HMS` type:

Create an HMS catalog with simple authentication. No location authentication is required.

```sql
obclient> CREATE EXTERNAL CATALOG test_hms_catalog PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
);
```

Create an HMS catalog with Kerberos authentication.

```sql
obclient> CREATE EXTERNAL CATALOG test_hms_catalog_kerberos PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx",
    PRINCIPAL = "hive/xxx@xxx.COM",
    KEYTAB = "/xxx/xxx/xxx/hadoop.keytab",
    KRB5CONF = "/etc/xxx.conf"
);
```
Location authentication
OceanBase Database HMS Catalog supports key management through External Location to ensure secure access to external storage. When accessing tables in HMS, the system automatically matches the corresponding External Location based on the table path. If a match is found, it uses the associated key for authentication.
Location authentication scenarios
Based on the storage type and security requirements, Location authentication is divided into the following scenarios:
| Storage type | Authentication method |
|---|---|
| OSS | AK/SK authentication. |
| HDFS | HDFS supports the following authentication methods: no authentication (simple mode), user authentication as a specified HDFS user, and Kerberos authentication (in both single-NameNode and HA modes). |
Configuration steps
1. Determine the Location authentication scenario. Based on the storage type and security requirements, select the appropriate authentication scenario.
2. Create a Location object. Use the `CREATE LOCATION` statement to create a Location object for the selected scenario. For more information, see CREATE LOCATION.
3. Associate the catalog. No additional configuration is required when creating the catalog. The system automatically matches the Location based on the storage path of the corresponding table.
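The steps above can be sketched end to end. The following example, assuming placeholder hostnames, user name, and catalog name, wires up a Kerberos-free HDFS deployment that requires a dedicated HDFS user:

```sql
-- Step 2 (sketch): create a Location that covers the HDFS paths used by
-- the HMS tables. Hostname and user name are placeholders.
CREATE LOCATION hdfs_user
URL = 'hdfs://namenode:8020/'
CREDENTIAL (
    USER = 'hdfs_user_name'   -- access HDFS as this user
);

-- Step 3 (sketch): create the catalog without referencing the Location;
-- tables stored under hdfs://namenode:8020/ are matched to it by path.
CREATE EXTERNAL CATALOG hms_over_hdfs PROPERTIES = (
    TYPE = 'HMS',
    URI  = "thrift://metastore.example.com:9083"
);
```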
Example of configuring Location authentication scenarios
OSS storage authentication
Create a Location object for accessing Alibaba Cloud OSS object storage.
```sql
CREATE LOCATION oss_credential
URL = 'oss://bucket-name/path'
CREDENTIAL = (
    ACCESSID = 'your-access-key-id',
    ACCESSKEY = 'your-access-key-secret',
    HOST = 'oss-region.aliyuncs.com'
);
```
HDFS storage authentication
Scenario 1: No authentication mode.

No Location needs to be created. This is suitable for development or testing environments where Kerberos authentication is not enabled in the HDFS cluster (that is, `hadoop.security.authentication=simple`).

Scenario 2: User authentication mode.

Kerberos is not required, but access must be made as a specific HDFS user.

Note: This is suitable for clusters where Kerberos authentication is not enabled, but specific HDFS users are required to access certain paths.

Create a Location that specifies the HDFS user:

```sql
CREATE LOCATION hdfs_user
URL = 'hdfs://namenode:8020/'
CREDENTIAL (
    USER = 'hdfs_user_name'
);
```

Scenario 3: Kerberos enabled, HDFS operating in single NameNode (non-HA) mode.

Note: The `dfs.data.transfer.protection` parameter in `CONFIGS` must match the configuration of the HDFS cluster.

Create a Location for Kerberos authentication with single-NameNode HDFS:

```sql
CREATE LOCATION hdfs_kerberos_single
URL = 'hdfs://namenode.example.com:8020/'
CREDENTIAL (
    PRINCIPAL = "hdfs/xxx@xxx.COM",
    KEYTAB = "/data/hdfs.keytab",
    KRB5CONF = "/data/krb5.conf",
    CONFIGS = 'dfs.data.transfer.protection=integrity'
);
```

Scenario 4: Kerberos enabled, HDFS operating in high availability (HA) mode.

```sql
CREATE LOCATION hdfs_kerberos_ha
URL = 'hdfs://${nameservice id}'
CREDENTIAL (
    PRINCIPAL = "hdfs/xxx@xxx.COM",
    KEYTAB = "/etc/ob/hdfs.keytab",
    KRB5CONF = "/etc/krb5.conf",
    CONFIGS = 'dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

Scenario 5: Kerberos not enabled, but HDFS operating in high availability (HA) mode.

Note: If Kerberos is not enabled but HDFS is operating in HA mode, the `PRINCIPAL`, `KEYTAB`, and `KRB5CONF` parameters do not need to be set.

```sql
CREATE LOCATION hdfs_location_ha
URL = 'hdfs://${nameservice id}'
CREDENTIAL (
    CONFIGS = 'dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```