OceanBase Database V4.4.1 and later allow you to connect to Hive Metastore (HMS) and create an HMS Catalog. This feature enables you to access tables managed by HMS, including traditional Hive tables and Iceberg tables whose metadata is registered in HMS. You can query data managed by HMS in the Hadoop ecosystem without data migration, making it ideal for accelerating data lake queries.
- Unified metadata management: Automatically synchronize table structures from HMS.
- Federated queries: Join and analyze data with internal OceanBase tables.
- Read-only security: Prevent accidental modifications to source data.
OceanBase Database V4.4.1 and later support the HMS Catalog feature. This topic provides usage instructions for the AP capabilities of OceanBase Database (as described in the V4.3.5 documentation set) to help you get started.
Supported features
Read-only access: All objects under an HMS Catalog are read-only. Currently, INSERT, UPDATE, and DROP TABLE operations are not supported.

Supported table types:
Hive tables: ORC, Parquet, TextFile, and CSV.
- In the current version, only the ARRAY type in Parquet format is supported for complex types.
- For more information about the ARRAY type, see Overview of array element types.
Iceberg tables: V1 and V2. We recommend Parquet format.
Advanced features:
- Iceberg schema evolution (column addition and deletion)
- Iceberg partition transformation (such as bucket() and year() partition expressions)
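To illustrate partition transformation support, the sketch below assumes a hypothetical Iceberg table partitioned by year(event_time); the catalog, database, and table names are placeholders. From OceanBase Database, you query such a table like any other table and filter on the source column, without referencing the transform:

```sql
-- Hypothetical Iceberg table created by Spark or Flink with a year(event_time)
-- partition transform and registered in HMS. Filter on the source column as usual:
SELECT COUNT(*)
FROM my_hms_catalog.iceberg_db.events
WHERE event_time >= '2025-01-01';
```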
Hive version compatibility
- Supported versions: Hive 1.2.x, Hive 2.3.x, Hive 3.1.x, and Hive 4.x.
- Compatibility with Iceberg tables:
Hive 4.x and later: Hive natively supports Iceberg tables. You can create and update metadata of Iceberg tables by using Hive. OceanBase Database can directly access Iceberg tables through HMS Catalog.
Hive 1.2.x, 2.3.x, and 3.1.x: Hive does not support Iceberg tables. In such environments, Iceberg tables are usually created and written to by computing engines such as Spark and Flink, and then registered in the Hive Metastore. OceanBase Database identifies and reads Iceberg tables based on the metadata information stored in HMS.
Therefore, as long as the metadata of Iceberg tables is correctly registered in the Hive Metastore (regardless of the engine used for writing), OceanBase Database can access them.
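For illustration, an Iceberg table could be registered in HMS by Spark along these lines. The catalog name hive_cat and the table schema are hypothetical; the spark.sql.catalog.* properties are standard Iceberg-on-Spark settings:

```sql
-- Spark SQL, assuming the Spark session was started with an Iceberg catalog
-- backed by the Hive Metastore, for example:
--   spark.sql.catalog.hive_cat      = org.apache.iceberg.spark.SparkCatalog
--   spark.sql.catalog.hive_cat.type = hive
--   spark.sql.catalog.hive_cat.uri  = thrift://hms.example.com:9083
CREATE TABLE hive_cat.iceberg_db.sales_iceberg (
    product_id BIGINT,
    sales      DOUBLE,
    event_time TIMESTAMP
) USING iceberg;
-- Once the table metadata is in HMS, OceanBase Database can read the table
-- through an HMS Catalog.
```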
Prerequisites
Environment dependencies: If the underlying storage is HDFS, you must deploy a Java SDK environment. For more information, see Deploy the OceanBase Java SDK environment.
Privilege requirements
HMS service: The OceanBase cluster must be able to access Hive Metastore (Thrift protocol). The address and port of the HMS service must be provided by the customer's O&M team. By default, open-source Hive uses port 9083, but this may vary in actual deployment. Please refer to the on-site configuration.
Storage system: All OBServer nodes must have read access to the underlying storage system (HDFS, S3, or OSS).
Syntax for creating an HMS Catalog
CREATE EXTERNAL CATALOG [IF NOT EXISTS] catalog_name
PROPERTIES (
TYPE = 'HMS',
URI = 'thrift://host:port',
[PRINCIPAL = '...'],
[KEYTAB = '...'],
[KRB5CONF = '...'],
[MAX_CLIENT_POOL_SIZE = 20],
[SOCKET_TIMEOUT = 10000000]
);
Parameters
| Parameter | Required | Description |
|---|---|---|
| TYPE | Yes | Fixed to 'HMS'. |
| URI | Yes | The Thrift address of HMS, in the format thrift://$host:$port. To configure multiple HMS addresses, separate them with commas, for example: "thrift://<HMS IP address 1>:<HMS port number 1>,thrift://<HMS IP address 2>:<HMS port number 2>,thrift://<HMS IP address 3>:<HMS port number 3>". |
| PRINCIPAL | No | The Kerberos principal, usually in the service/HOST@REGION.com format. Example: hive/hadoop@QA.COM. |
| KEYTAB | No | The path of the keytab file required to access the HMS service with Kerberos authentication. If OceanBase Database is deployed in a distributed manner, the file must exist at the same path on all OBServer nodes. Notice: This parameter is required only when Kerberos authentication is enabled for HMS. |
| KRB5CONF | No | The path of the Kerberos configuration file, such as /etc/krb5.conf. If OceanBase Database is deployed in a distributed manner, the file must exist at the same path on all OBServer nodes. |
| MAX_CLIENT_POOL_SIZE | No | The HMS client connection pool size (default is 20, meaning the current HMS Catalog can start up to 20 clients connecting to the HMS service). |
| SOCKET_TIMEOUT | No | The timeout for connecting to the Hive Metastore, in microseconds. Default value: 10,000,000 (10 seconds). |
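For example, based on the syntax above, a catalog that overrides the optional tuning parameters might look like this (the catalog name and address are placeholders):

```sql
CREATE EXTERNAL CATALOG hms_tuned
PROPERTIES (
    TYPE = 'HMS',
    URI = 'thrift://hms.example.com:9083',
    MAX_CLIENT_POOL_SIZE = 50,    -- allow up to 50 clients to connect to HMS
    SOCKET_TIMEOUT = 30000000     -- 30 seconds, in microseconds
);
```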
Create an HMS Catalog with SIMPLE authentication
In the SIMPLE authentication mode, no Location authentication is required.
obclient> CREATE EXTERNAL CATALOG test_hms_catalog
PROPERTIES (
    TYPE = 'HMS',
    URI = 'thrift://xxx.xxx.xxx.xxx:xxxx'
);
For more information about Hadoop authentication modes, see the "Authentication mechanism of HMS Catalog and HDFS storage" section below.
Create an HMS Catalog with Kerberos authentication
For more information about Hadoop authentication modes, see the section titled "Authentication mechanism of HMS Catalog and HDFS storage" below.
Step 1: Create an HMS Catalog with Kerberos authentication (metadata layer)
CREATE EXTERNAL CATALOG hms_kerberos
PROPERTIES (
TYPE = 'HMS',
URI = 'thrift://hms.example.com:9083',
PRINCIPAL = 'hive/hms.example.com@EXAMPLE.COM',
KEYTAB = '/etc/ob/hive.keytab',
KRB5CONF = '/etc/krb5.conf'
);
Step 2: Create an HDFS location with Kerberos authentication (data layer)
CREATE LOCATION hdfs_kerberos_ha
URL = 'hdfs://namenode:8020/'
CREDENTIAL (
PRINCIPAL = "ob_hdfs@EXAMPLE.COM",
KEYTAB = "/etc/ob/hdfs.keytab",
KRB5CONF = "/etc/krb5.conf",
CONFIGS = '...' -- HA configurations as described above
);
The HMS Catalog can discover the table structure, and the location can securely read HDFS data files.
Switch Catalog
SET CATALOG hms_catalog;
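Switching the catalog is optional: you can also reference tables with fully qualified catalog.database.table names. A sketch of both styles, using the example names from this topic:

```sql
-- Style 1: switch to the external catalog, then use database.table names
SET CATALOG hms_catalog;
SELECT COUNT(*) FROM hive_db.customer;

-- Style 2: keep the current catalog and use fully qualified names
SELECT COUNT(*) FROM hms_catalog.hive_db.customer;
```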
Query external data sources through Catalog
-- Hive table
SELECT city, COUNT(*) FROM hive_db.customer WHERE dt >= '2025-04-01' GROUP BY city;
-- Iceberg table
SELECT product_id, SUM(sales) FROM iceberg_db.sales_iceberg WHERE event_time >= '2025-04-01' GROUP BY product_id;
-- Federation query
SELECT o.order_id, h.city
FROM internal.test_db.orders o
JOIN hms_catalog.hive_db.customer h ON o.user_id = h.id;
Authentication mechanism of HMS Catalog and HDFS storage
In OceanBase Database, accessing Hive Metastore (HMS) involves two independent authentication layers:
Catalog layer: Used to connect to Hive Metastore (to retrieve table structure, partition, and other metadata).
Location layer: Used to read data files stored in file systems such as HDFS.
The authentication methods for these two layers must be configured separately and must align with the security policies of the Hadoop cluster.
The Hadoop authentication mode is determined by the server
The Hadoop cluster defines its security mode through the hadoop.security.authentication configuration in core-site.xml:
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value> <!-- or simple -->
</property>
- kerberos: Enables Kerberos authentication. All clients, including the HMS client and the HDFS client, must pass Kerberos authentication.
- simple: The Hadoop server trusts the client's declared operating system username and does not perform password or credential verification. Permission control is based solely on the owner, group, and POSIX permission bits (for example, drwxr-xr-x (755)) of HDFS files and directories, without Kerberos authentication.
Notice
This configuration is located on the Hadoop server ($HADOOP_HOME/etc/hadoop/core-site.xml). OceanBase Database does not directly read this file but adapts to this mode through its own Catalog and Location configurations.
Authentication configuration of OceanBase Database
| Hadoop mode | HMS Catalog configuration | HDFS Location configuration |
|---|---|---|
| SIMPLE | No authentication parameters are specified when creating the catalog (SIMPLE is the default). | No Kerberos credentials are required. For restricted paths, create a location that specifies an HDFS user with read permissions. |
| KERBEROS | Provide PRINCIPAL, KEYTAB, and KRB5CONF when creating the catalog. | Create an HDFS location with Kerberos authentication, and provide the principal, keytab, and krb5.conf. |
Important configuration rule: If Kerberos is enabled for either HMS or HDFS, Kerberos credentials must be configured for both layers; verify that your deployment meets this requirement.
Overview of two authentication modes
SIMPLE mode (common in development and testing environments)
Applicable scenarios:
Kerberos is not enabled for HDFS, and Linux file permissions are used for access control.
Tables are written by specific users (for example, the owner of an HDFS path written by Impala is impala, and the owner of an HDFS path written by Hive on Tez is the user who submitted the job, such as hive).
Configuration requirements:
Anonymous read path: HDFS directory permissions are open (for example, drwxr-xr-x (755)), and any user can read the path.
- Meaning: the owner has read, write, and execute permissions (rwx = 4+2+1 = 7); the group and other users have only read and execute permissions (r-x = 4+0+1 = 5).
- For such paths, OceanBase Database does not need to create a location and can directly access the path.
Restricted path: If the user of the OceanBase Database observer process (such as admin) is not in the authorized user or group of the directory, you must:
Create a location and specify a user with permissions:
CREATE LOCATION my_loc
    URL = 'hdfs://mycluster/'
    CREDENTIAL (
        USERNAME = 'username'
    );
Scenario 1: No Kerberos authentication and no need to specify an HDFS user (default anonymous access)
Applicable to: Development or testing environments where Kerberos authentication is not enabled for the HDFS cluster (that is, hadoop.security.authentication=simple).
Operation: No Location needs to be created.
Scenario 2: No Kerberos authentication is required, but an HDFS user must be specified
Applicable to: Development or testing environments where Kerberos authentication is not enabled for the HDFS cluster (that is, hadoop.security.authentication=simple), but the data files of the target Hive table are stored in a restricted HDFS path (for example, written by Impala, Hive, or another engine, with the path owner being a specific user and without global read permissions).

Typical cases:
- The path owner is impala when Impala writes to the HDFS path.
- The path owner is hive or etl_user when a Hive job writes to the HDFS path.
- By default, the OceanBase Database observer process uses the operating system user (such as admin) to access HDFS. If the user does not have read permissions, the query fails.
Example: Assume that a Hive table is written by Impala, and the HDFS path is hdfs://namenode:8020/warehouse/sales.db/click_log.
CREATE LOCATION hdfs_impala_data
URL = 'hdfs://namenode:8020/' -- For how to determine the correct URL, see the description below
CREDENTIAL (
USERNAME = 'username' -- Specify the username for accessing HDFS
);
How to determine the correct URL:

1. Execute the following command in Hive CLI or Beeline:
   SHOW CREATE TABLE your_db.your_table;
2. View the LOCATION field in the output, for example:
   LOCATION 'hdfs://namenode:8020/warehouse/your_db.db/your_table'
3. Extract the protocol, host name (or service name), and port as the URL. That is, in the CREATE LOCATION statement, set the URL to hdfs://namenode:8020/.
Scenario 3: Kerberos is not enabled, and HDFS is highly available (HA)
Operation: No need to set the PRINCIPAL, KEYTAB, or KRB5CONF parameters.
CREATE LOCATION hdfs_location_ha
URL = 'hdfs://${nameservice_id}' -- We recommend that you use the logical service name.
CREDENTIAL (
CONFIGS = 'dfs.nameservices=${nameservice_id}#dfs.ha.namenodes.${nameservice_id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice_id}.${namenode1}=${namenode1_address}#dfs.namenode.rpc-address.${nameservice_id}.${namenode2}=${namenode2_address}#dfs.ha.automatic-failover.enabled.${nameservice_id}=true#dfs.client.failover.proxy.provider.${nameservice_id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Kerberos mode (standard for production environments)
Applicable scenarios:
- Enterprise-level Hadoop clusters where Kerberos authentication is enabled for Hive Metastore Service (HMS) or Hadoop Distributed File System (HDFS).
Configuration requirements:
HMS Catalog:

CREATE EXTERNAL CATALOG hms_krb
PROPERTIES (
    TYPE = 'HMS',
    URI = 'thrift://metastore:9083',
    PRINCIPAL = 'ob@REALM',
    KEYTAB = '/path/to/ob.keytab',
    KRB5CONF = '/etc/krb5.conf'
);

HDFS Location:

CREATE LOCATION hdfs_krb_loc
    URL = 'hdfs://nn:8020'
    CREDENTIAL (
        PRINCIPAL = 'ob@REALM',
        KEYTAB = '/path/to/ob.keytab',
        KRB5CONF = '/etc/krb5.conf'
    );
Scenario 4: Kerberos authentication is enabled, and HDFS is a single NameNode (non-HA mode)
-- Create a LOCATION: Kerberos authentication + single HDFS
CREATE LOCATION hdfs_kerberos_single
URL = 'hdfs://namenode.example.com:8020/' -- Obtain the complete HDFS path of the table by executing SHOW CREATE TABLE, and extract the root URL.
CREDENTIAL (
PRINCIPAL = "hdfs/TEST@EXAMPLE.COM",
KEYTAB = "/data/hdfs.keytab",
KRB5CONF = "/data/krb5.conf",
CONFIGS = 'dfs.data.transfer.protection=integrity'
);
Note
- The dfs.data.transfer.protection parameter in CONFIGS specifies the security level of the HDFS data transmission channel: authentication, integrity, privacy, or null.
- This value must be consistent with the dfs.data.transfer.protection parameter configured in the hdfs-site.xml file of the Hadoop cluster. Otherwise, data read operations may fail due to mismatched security policies.
- We recommend that you confirm the actual value of this parameter with the Hadoop administrator or directly check the hdfs-site.xml file of the HDFS cluster.
Scenario 5: Kerberos authentication is enabled, and HDFS is highly available (HA)
CREATE LOCATION hdfs_kerberos_ha
URL = 'hdfs://${nameservice id}' -- We recommend that you use the logical service name.
CREDENTIAL (
PRINCIPAL = "ob_hdfs@EXAMPLE.COM",
KEYTAB = "/etc/ob/hdfs.keytab",
KRB5CONF = "/etc/krb5.conf",
CONFIGS = 'dfs.data.transfer.protection=integrity#dfs.nameservices=mycluster#dfs.ha.namenodes.mycluster=nn1,nn2#dfs.namenode.rpc-address.mycluster.nn1=nn1:8020#dfs.namenode.rpc-address.mycluster.nn2=nn2:8020#dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Note
- Deploy the same keytab and krb5.conf files on all OBServer nodes.
- If the "Unknown Host" error occurs, add the HDFS node mapping to the /etc/hosts file.
For more information about the Location parameters, see Create Location.
Configuration notes
- Authentication separation: The authentication for the Catalog (metadata) and the Location (data) is configured separately.
- Mode alignment: The authentication mode of OceanBase Database must be consistent with the hadoop.security.authentication setting of the Hadoop cluster.
- Kerberos consistency: If Kerberos is enabled for either HMS or HDFS, Kerberos credentials must be configured for both.
- SIMPLE mode requires permission checks: Ensure that the observer process user or the specified HDFS user has read permissions on the HDFS path.
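As a final sanity check, a minimal verification sequence might look like the following. This is a sketch assuming a catalog named hms_catalog and the example table from this topic; SHOW DATABASES is assumed to list the databases visible through the current catalog:

```sql
-- 1. Metadata layer: can we reach HMS and see its databases?
SET CATALOG hms_catalog;
SHOW DATABASES;

-- 2. Data layer: can we read data files from the storage system?
SELECT * FROM hive_db.customer LIMIT 5;
```

If step 1 fails, check the URI and Kerberos settings of the catalog; if step 2 fails, check the location credentials and HDFS path permissions.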