This topic describes how to create an external catalog by using SQL statements.
Permissions
To create an external catalog, you must have the CREATE CATALOG privilege. For more information about the privileges of OceanBase Database, see Privilege types in MySQL mode.
Syntax
The SQL statement for creating an external catalog is as follows:
```sql
CREATE EXTERNAL CATALOG [IF NOT EXISTS] external_catalog_name
    PROPERTIES [=] (properties_type_options);

properties_type_options:
      odps_type_list
    | hms_type_list

odps_type_list:
    TYPE = 'ODPS',
    [ACCESSTYPE = 'accesstype_string',]
    ACCESSID = 'string',
    ACCESSKEY = 'string',
    STSTOKEN = 'string',
    ENDPOINT = 'string',
    TUNNEL_ENDPOINT = 'string',
    PROJECT_NAME = 'string',
    [QUOTA_NAME = 'string',]
    [COMPRESSION = 'compression_string',]
    REGION = 'string'

hms_type_list:
    TYPE = 'HMS',
    URI = "string"
    [, KRB5CONF = "string"]
    [, KEYTAB = "string"]
    [, PRINCIPAL = "string"]
    [, MAX_CLIENT_POOL_SIZE = 20]
    [, SOCKET_TIMEOUT = 10000000]

accesstype_string:
      aliyun
    | sts
    | app

compression_string:
      zlib
    | zstd
    | lz4
    | odps_lz4
```
Parameters:
- `IF NOT EXISTS`: Optional. If you specify this clause and an external catalog with the specified name already exists in the current tenant, the system returns a warning instead of an error. If you do not specify this clause, the system returns an error in that case.
- `external_catalog_name`: The name of the external catalog to be created.
- `PROPERTIES [=] (properties_type_options)`: Specifies the properties of the external catalog.
- `odps_type_list`: The properties of an ODPS external catalog.
- `hms_type_list`: The properties of an HMS external catalog.
For more information about the parameters of the CREATE EXTERNAL CATALOG statement, see CREATE EXTERNAL CATALOG.
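For instance, the effect of `IF NOT EXISTS` can be seen by running the same creation statement twice. The catalog name and URI below are placeholders, not values from a real deployment:

```sql
-- First execution creates the catalog.
CREATE EXTERNAL CATALOG IF NOT EXISTS demo_catalog
    PROPERTIES = (
        TYPE = 'HMS',
        URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
    );

-- Running the identical statement again returns a warning instead of an
-- error, because IF NOT EXISTS is specified. Without the clause, this
-- second execution would fail with an error.
CREATE EXTERNAL CATALOG IF NOT EXISTS demo_catalog
    PROPERTIES = (
        TYPE = 'HMS',
        URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
    );
```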
Create an ODPS external catalog
To read data from or write data to MaxCompute, set `TYPE` to `ODPS`.
You can use the CREATE EXTERNAL CATALOG statement to create a connection to MaxCompute (ODPS) and then directly query and write ODPS tables, without manually creating mapping tables.
Here is an example:
```sql
obclient> CREATE EXTERNAL CATALOG test_odps_catalog
    PROPERTIES = (
        TYPE = 'ODPS',
        ACCESSID = '$odps_accessid',
        ACCESSKEY = '$odps_accesskey',
        ENDPOINT = '$odps_endpoint',
        TUNNEL_ENDPOINT = 'http://xxx.maxcompute.aliyun.com',
        PROJECT_NAME = 'mysqltest_regression_sqlqa',
        QUOTA_NAME = '',
        COMPRESSION = ''
    );
```
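Once the catalog exists, you can query ODPS tables through it. The statements below are a hypothetical sketch: the `SET CATALOG` session switch and the table name are assumptions to verify against the catalog usage documentation for your OceanBase Database version:

```sql
-- Assumption: SET CATALOG switches the current session to the external catalog.
SET CATALOG test_odps_catalog;

-- List the databases visible through the catalog.
SHOW DATABASES;

-- Query an ODPS table directly; some_odps_table is a hypothetical name.
SELECT * FROM mysqltest_regression_sqlqa.some_odps_table LIMIT 10;
```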
Create an HMS external catalog
OceanBase Database uses the HMS catalog feature to centrally manage Hive Metastore (HMS) metadata. With an HMS catalog, OceanBase Database can directly access the data stored in the HDFS path of a Hive table, enabling efficient data lake queries and analysis.
Notice
Before you create an HMS catalog, if the catalog needs to access file data stored in HDFS, you must first deploy the Java SDK environment. For more information, see Deploy the OceanBase Database Java SDK environment.
HMS catalog support
The current HMS catalog supports Hive and Iceberg table formats and can read statistics from Hive and Iceberg tables to optimize SQL plans.
Hive format support
The current HMS catalog supports the Parquet, ORC, Textfile, and OpenCSV formats.
Note
For complex types, the current HMS catalog supports only the ARRAY type, and only in the Parquet format.
Hive version support
OceanBase Database supports the following Hive versions in HMS catalogs:
- Hive 1.2.x
- Hive 2.3.x
- Hive 3.1.x
- Hive 4.x
Iceberg support
The current HMS catalog supports Iceberg V1, V2, and V3.
OceanBase Database supports reading Iceberg tables that have undergone schema evolution and partition transformation.
The HMS catalog supports deletion vectors and the row-level deletion changes introduced in Iceberg V3:

- Deletion vectors are introduced in V3 and are stored using the Puffin `deletion-vector-v1` blob type.
- Manifests are updated to track deletion vectors:
  - The `referenced_data_file` field is added. This field is required for deletion vectors but optional for V2 position delete files that contain deletes for only one data file.
  - The `content_offset` field is added. This field must correspond to the offset of the deletion vector blob in the Puffin file.
  - The `content_size_in_bytes` field is added. This field must correspond to the length of the deletion vector blob in the Puffin file.
- Deletion vectors must be maintained synchronously: the write side must merge deletion vectors (DVs) with earlier position delete files to ensure that each data file corresponds to at most one DV.
- If a data file has a deletion vector, the read side can safely ignore position delete files for that file.
- The write side cannot add new position delete files to V3 tables.
- Existing position delete files are still valid in tables upgraded from V2:
  - When a deletion vector is created for a data file, the position delete records for that file must be merged into the DV.
  - If a position delete file contains deletions for multiple data files, the file must be retained in the table's metadata until all of its deletions are replaced by DVs.

After a table is upgraded from V2 to V3, the original position delete files remain valid and are still referenced during queries. However, when a deletion vector (DV) is first generated for a data file, all position delete records for that file are merged into the corresponding DV. From then on, the deletion information for that file is managed solely by the DV, and the relevant parts of the original position delete files can be cleaned up, provided they do not contain deletion records for other files.
Authentication type support
OceanBase Database supports the following two authentication methods for HMS catalogs:
| Authentication type | Description | Configuration requirements |
|---|---|---|
| SIMPLE | Simple authentication based on the client username (default). | The `PROPERTIES` clause does not need to specify the `KRB5CONF`, `KEYTAB`, or `PRINCIPAL` parameters. |
| KERBEROS | Strong authentication based on the Kerberos protocol (suitable for enterprise-grade security environments). | The `PROPERTIES` clause must specify the `KRB5CONF`, `KEYTAB`, and `PRINCIPAL` parameters. |
Here are some examples:
Create an HMS catalog with simple authentication, without location authentication.

```sql
obclient> CREATE EXTERNAL CATALOG test_hms_catalog
    PROPERTIES = (
        TYPE = 'HMS',
        URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
    );
```

Create an HMS catalog with Kerberos authentication.

```sql
obclient> CREATE EXTERNAL CATALOG test_hms_catalog_kerberos
    PROPERTIES = (
        TYPE = 'HMS',
        URI = "thrift://xxx.xxx.xxx.xxx:xxxx",
        PRINCIPAL = "hive/xxx@xxx.COM",
        KEYTAB = "/xxx/xxx/xxx/hadoop.keytab",
        KRB5CONF = "/etc/xxx.conf"
    );
```
Location authentication
The OceanBase Database HMS catalog supports credential management through external locations to enable secure access to external storage. When a table in HMS is accessed, the system automatically matches an external location against the table path. If a match is found, the associated credential is used for authentication.
Location authentication scenarios
Based on the storage type and security requirements, location authentication can be categorized into the following scenarios:
| Storage type | Authentication method |
|---|---|
| OSS | AK/SK (AccessKey ID and AccessKey secret) authentication. |
| HDFS | No authentication (simple mode), access as a specified HDFS user, or Kerberos authentication (for both single-NameNode and HA clusters). See the scenarios below. |
Configuration steps
1. Determine the location authentication scenario.

   Select the appropriate authentication scenario based on the storage type and security requirements.

2. Create a location object.

   Use the `CREATE LOCATION` statement to create a location object for the selected scenario. For more information about creating a location, see CREATE LOCATION.

3. Associate the catalog.

   No additional configuration is required when you create the catalog. The system automatically matches the location based on the storage path of the corresponding table.
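Putting the steps together, a minimal end-to-end sketch might look like the following. It uses the user-authentication HDFS scenario; all names, addresses, and paths are placeholders:

```sql
-- Step 2: create a location object whose URL prefix covers the Hive table paths.
CREATE LOCATION hdfs_user01
    URL = 'hdfs://namenode:8020/'
    CREDENTIAL (
        USER = 'hdfs_user_name'
    );

-- Step 3: create the HMS catalog with no location-specific options.
-- At query time, the system matches each table's storage path against the
-- registered locations and uses the matching credential for authentication.
CREATE EXTERNAL CATALOG test_hms_catalog
    PROPERTIES = (
        TYPE = 'HMS',
        URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
    );
```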
Example of configuring location authentication scenarios
OSS storage authentication
Create a location object for accessing Alibaba Cloud OSS object storage.
```sql
CREATE LOCATION oss_credential
    URL = 'oss://bucket-name/path'
    CREDENTIAL = (
        ACCESSID = 'your-access-key-id',
        ACCESSKEY = 'your-access-key-secret',
        HOST = 'oss-region.aliyuncs.com'
    );
```
HDFS storage authentication
Scenario 1: No authentication mode.

No location object needs to be created. This is suitable for development or testing environments where Kerberos authentication is not enabled for the HDFS cluster (that is, `hadoop.security.authentication=simple`).

Scenario 2: User authentication mode.

Kerberos is not required, but access must be made as a specific HDFS user.

Note

This applies to clusters where Kerberos authentication is not enabled, but specific HDFS users are required to access certain paths.

Create a location object that specifies the HDFS user:

```sql
CREATE LOCATION hdfs_user01
    URL = 'hdfs://namenode:8020/'
    CREDENTIAL (
        USER = 'hdfs_user_name'
    );
```

Scenario 3: Kerberos enabled, HDFS in single-NameNode (non-HA) mode.

Note

The `dfs.data.transfer.protection` parameter in `CONFIGS` must match the configuration of the HDFS cluster.

Create a location object for Kerberos authentication with a single-NameNode HDFS cluster:

```sql
CREATE LOCATION hdfs_kerberos_single
    URL = 'hdfs://namenode.example.com:8020/'
    CREDENTIAL (
        PRINCIPAL = "hdfs/xxx@xxx.COM",
        KEYTAB = "/data/hdfs.keytab",
        KRB5CONF = "/data/krb5.conf",
        CONFIGS = 'dfs.data.transfer.protection=integrity'
    );
```

Scenario 4: Kerberos enabled, HDFS in high availability (HA) mode.

```sql
CREATE LOCATION hdfs_kerberos_ha
    URL = 'hdfs://${nameservice id}'
    CREDENTIAL (
        PRINCIPAL = "hdfs/xxx@xxx.COM",
        KEYTAB = "/etc/ob/hdfs.keytab",
        KRB5CONF = "/etc/krb5.conf",
        CONFIGS = 'dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );
```

Scenario 5: Kerberos not enabled, but HDFS in high availability (HA) mode.

Note

Because Kerberos is not enabled, you do not need to set the `PRINCIPAL`, `KEYTAB`, or `KRB5CONF` parameters.

```sql
CREATE LOCATION hdfs_location_ha
    URL = 'hdfs://${nameservice id}'
    CREDENTIAL (
        CONFIGS = 'dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );
```
