Overview of data catalogs|V4.6.0|OceanBase Database| docs|Distributed Database

Overview of data catalogs

Last Updated：2026-05-07 11:26:24 Updated

OceanBase Database supports the ODPS Catalog since V4.3.5 BP2 and introduces the HMS Catalog (Hive Metastore Catalog) mechanism in V4.4.1. This allows seamless access to data lakes and offline data warehouses. With this feature, users can directly query Hive tables, Iceberg tables, and offline tables in MaxCompute (ODPS) using standard SQL, all managed by Hive Metastore (HMS).

These tables are typically stored in efficient columnar formats such as Parquet or ORC in distributed storage systems like HDFS, OSS, or S3, forming a crucial part of the enterprise's core data lake.

Unlike simple metadata integration, OceanBase Database is not just capable of reading data lakes but also serves as a high-performance computing engine, significantly accelerating queries on lake data. This is made possible by its built-in AP (Analytical Processing) optimization capabilities:

Predicate Pushdown and Column Pruning: Pushes WHERE conditions and SELECT column information down to the data source (such as ODPS Storage API or HDFS file reading layer).
Parallel Execution and Vectorized Computing: Automatically parallelizes large table scan tasks and efficiently processes columnar data using a vectorized engine.
External Table Data Caching (Optional): Supports caching data files for frequently accessed static lake tables, reducing query latency.

With these capabilities, users can achieve query acceleration in OceanBase Database without migrating raw data from data lakes or data warehouses into the database:

Real-time business tables (such as orders and user profiles) within OceanBase Database.
Historical logs, wide tables, or Iceberg tables (such as user behavior and IoT time-series data) in data lakes, as well as offline tables in cloud-native data warehouses like ODPS.

This makes OceanBase Database the entry point for querying data lakes, enabling the federated analytics experience of "querying data from anywhere". With just one SQL statement, users can access data from both the database and the data lake, greatly simplifying cross-source analytics and making data lake usage more straightforward and efficient.

Deployment and environment dependencies

Cross-IDC deployment recommendation	HDFS storage dependency
If HMS and OBServer are deployed across IDCs, network latency may cause metadata retrieval to take up to seconds, significantly increasing query latency. It is recommended to deploy HMS and OBServer within the same low-latency network.	When the underlying storage is HDFS, Java runtime environment (JDK 8/11) must be deployed on OBServer nodes in advance, and Java support must be enabled (`enable_java = true`). For more information, see Deploy the OceanBase Java SDK environment.

Relationship between catalog and external tables

OceanBase provides two modes for accessing external data:

Mode	Applicable scenarios	Metadata management	Typical usage
Catalog mode (such as HMS Catalog and ODPS Catalog)	Predefined, structured data sources (such as Hive data warehouses and ODPS projects)	Automatically synchronized metadata	`SET CATALOG my_hms;` `SELECT * FROM db1.sales;`
Explicit external tables (file/JDBC/ODPS external tables)	Temporary files, custom paths, or non-Catalog data sources	Manually create tables to define structures	`CREATE EXTERNAL TABLE t1 (...) LOCATION = 'oss://...';`

Notice

A catalog is a metadata abstraction of the underlying data (such as Parquet/ORC files and ODPS tables).
- Currently, HMS Catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered with HMS, use the CREATE EXTERNAL TABLE ... LOCATION method.

Catalog types

In OceanBase, a catalog is a top-level namespace for organizing and isolating metadata from different data sources. OceanBase supports the following catalog types:

Catalog type	Data source	Supported table type	Typical scenarios
Internal catalog	OceanBase local storage	OceanBase tables	Core business data
ODPS catalog	Alibaba Cloud MaxCompute (ODPS)	ODPS tables	Big data offline analysis
HMS catalog	Hive Metastore (HMS)	Hive tables, Iceberg tables	Hadoop ecosystem integration

Create a catalog

The built-in internal catalog in OceanBase is used to manage the metadata of the database itself and does not need to be manually created. To use external data sources, you can execute the CREATE EXTERNAL CATALOG statement to create an ODPS catalog or an HMS catalog.

Access a catalog

Note

Currently, only OceanBase Database in MySQL mode supports the catalog feature.

View catalogs in the current tenant

You can run the SHOW CATALOGS; statement to view all catalogs created in the current tenant.

Example:

SHOW CATALOGS;

Query data in a catalog

In a catalog, you can execute any query operation as in a database. You can also perform a JOIN operation across catalogs.

Example:

SELECT * FROM internal.ap_db.ap_table, odps_catalog.odps_db.odps_table;

Query data across catalogs

OceanBase Database allows you to access data from multiple catalogs (such as internal, hive_catalog, and odps_catalog) in a single SQL statement to perform a JOIN operation. You can use the following two methods to reference tables in a different context:

Core principles:

If you do not switch the session context, you must use the full three-part identifier: catalog_name.database_name.table_name;
If you have switched to the target catalog and database, you can directly use the abbreviated table name.

Method 1: Use the three-part identifier (no context switch required)

You can use the catalog.db.table three-part identifier to directly access tables in any catalog without switching the session context.

Example 1: Query a table in the default internal catalog

-- View the current session context
SELECT CURRENT_CATALOG();   -- Returns 'internal'
SELECT DATABASE();  -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'

-- Query a table in a different catalog
SELECT * FROM hive_catalog.hive_db.hive_table;

You can also use the catalog.db.table three-part identifier to directly access tables in the HMS catalog.

Example 2: Perform a JOIN operation across catalogs (mixed sources)

-- Assume the current context is internal.ap_db
SELECT CURRENT_CATALOG();   -- Returns 'internal'
SELECT DATABASE();  -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'

-- Query tables in both the HMS catalog and the local catalog
SELECT h.*, o.*
FROM hive_catalog.hive_db.hive_table h
JOIN internal.ap_db.ap_table o ON h.id = o.id;

Method 2: Switch the session context and use the abbreviated table name

You can explicitly switch the current session's catalog and database using SET CATALOG and USE, and then directly use the table name without any prefix.

Example: Perform a JOIN operation in the hive_catalog.hive_db catalog

-- Switch the catalog and database
SET CATALOG hive_catalog;
USE hive_db;

-- Verify the current context
SELECT CURRENT_CATALOG();   -- Returns 'hive_catalog'
SELECT DATABASE();  -- Returns 'hive_db'

-- You can now directly reference tables in hive_db (such as hive_table)
-- You can still use the three-part identifier to access tables in other catalogs (such as internal)
SELECT h.*, o.*
FROM hive_table h                      -- Abbreviated: current catalog + DB
JOIN internal.ap_db.ap_table o ON h.id = o.id;

Note

If you do not execute SET CATALOG and USE, and directly write SELECT * FROM hive_table;, an error will occur because the system will search for the table in the current catalog (such as internal).

Switching the context only affects subsequent statements and does not impact the connection or transaction status.

For more information, see View a catalog.

Switch a catalog

You can run the SET CATALOG catalog_name; statement to switch to a catalog, or explicitly specify it in a query:

SELECT * FROM odps_catalog.database_name.table_name;

For more information about the SQL syntax, see SET CATALOG.

HMS/ODPS Catalog support details

HMS Catalog support details

Access mode

Read-only access: All objects in the HMS Catalog are read-only. You cannot execute DML/DDL operations such as INSERT, UPDATE, and DROP TABLE.

Supported table types

Table type	Format support	Description
Hive table	ORC, Parquet, TextFile, and CSV	- Complex types are supported only for the ARRAY type in Parquet format. For more information, see Overview of array element types
Iceberg table	V1, V2, or V3 (Parquet is recommended)	Supports advanced features such as schema evolution (column addition and deletion) and partition transformation (such as `bucket()` and `year()`).

Hive version compatibility

Hive version	Iceberg support
Hive 4.x	Hive natively supports Iceberg tables. You can create or update Iceberg tables in Hive. OceanBase Database can directly access Iceberg tables through the HMS Catalog.
Hive 1.2.x, 2.3.x, or 3.1.x	Hive does not support Iceberg tables. However, if an Iceberg table is created by an engine such as Spark or Flink and its metadata is registered in the Hive Metastore, OceanBase Database can still read the table.

Key principle: As long as the metadata of an Iceberg table is correctly registered in the HMS Catalog (regardless of the engine used to write the table), OceanBase Database can access the table.

ODPS Catalog support details

Supported operations: Analytical queries such as SELECT, JOIN, and GROUP BY;
Unsupported operations: Write operations such as INSERT, UPDATE, and DROP TABLE;
Optimization capabilities: Automatic partition pruning and column pruning to reduce the amount of data transmitted from ODPS.

Manage catalog privileges

OceanBase Database supports fine-grained privilege management at the catalog level. Users must have the appropriate privileges to create, switch, or query external catalogs (such as ODPS and HMS).

Catalog privileges are divided into two levels:
- User Level (global level): Applies to all catalogs (corresponding to . in MySQL mode)
- Catalog Level (object level): Applies only to the specified catalog

Privilege system in MySQL mode

Privilege	Scope	Description
`CREATE CATALOG`	User Level	Grants the privilege to execute `CREATE` and `DROP EXTERNAL CATALOG`
`USE CATALOG`	User Level or Catalog Level	Grants the privilege to execute `SET CATALOG`, `SHOW CATALOGS`, and `SELECT FROM catalog.db.table`

Syntax

Grant global privileges

This is applicable to administrators who need to operate on any catalog:

GRANT CREATE CATALOG ON *.* TO 'user1';
GRANT USE CATALOG ON *.* TO 'user1' WITH GRANT OPTION;

REVOKE CREATE CATALOG ON *.* FROM 'user1';
REVOKE USE CATALOG ON *.* FROM 'user1';

Grant specific catalog privileges

This is applicable to regular users who only need to access a specific catalog (recommended approach):

-- Grant the USE and SELECT privileges on the catalog 'odps_prod'
GRANT SELECT, USE CATALOG ON CATALOG odps_prod TO 'user1' WITH GRANT OPTION;

-- Revoke the privileges
REVOKE USE CATALOG ON CATALOG odps_prod FROM 'user1';

Important notes:

The creator automatically has privileges

After a user successfully executes CREATE EXTERNAL CATALOG ..., they automatically gain the USE CATALOG and SELECT privileges on that catalog, without needing additional authorization.

Example:

-- user1 executes
CREATE EXTERNAL CATALOG odps_dev TYPE = 'ODPS' ...;

-- At this point, user1 can directly use the catalog
SET CATALOG odps_dev;
SELECT * FROM project_db.table1 LIMIT 1;

Catalog privileges ≠ Data access privileges

Even if a user has the USE CATALOG and SELECT privileges, querying tables within the catalog may still fail. This is because:

Catalog level: Used to connect to Hive Metastore (to retrieve table structure, partition, etc., metadata)
Location level: Used to read data files stored in file systems like HDFS

Authentication for both levels must be configured separately and must align with the security policies of the Hadoop cluster.

Example:

-- user1 has the USE CATALOG privilege on odps_prod
SET CATALOG hms_prod;
USE sales_db;

-- Querying the table fails
SELECT * FROM sales_log LIMIT 1;
-- An error is generated

-- Reason: No location was created, or user1 does not have access to the location
-- Solution: Create a location with authentication information
CREATE LOCATION hms_sales URL = 'hdfs://nn:8020/sales/'
  CREDENTIAL (USER = 'username');

For more information about the authentication mechanism for HMS catalogs and HDFS storage, see HMS Catalog.

Privilege verification example

-- Grant privileges
GRANT SELECT, USE CATALOG ON CATALOG ca1 TO zhangsan WITH GRANT OPTION;

-- View privileges
SHOW GRANTS FOR zhangsan;

Output:

+-------------------------------------------------------------------------+
| Grants for zhangsan@%                                                      |
+-------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'zhangsan'                                           |
| GRANT SELECT, USE CATALOG ON CATALOG `ca1` TO 'zhangsan' WITH GRANT OPTION |
+-------------------------------------------------------------------------+

Privilege query views

You can use the following system views to query privileges:

information_schema.USER_PRIVILEGES: User global privileges
oceanbase.DBA_OB_USERS / CDB_OB_USERS: User-related metadata information

References

Overview of Catalog
Create a Catalog
View a Catalog
Drop a Catalog
SET CATALOG
CREATE EXTERNAL CATALOG
DROP CATALOG

OceanBase

Customer Stories

Documentation

Overview of data catalogs

Deployment and environment dependencies

Relationship between catalog and external tables

Notice

Catalog types

Create a catalog

Access a catalog

Note

View catalogs in the current tenant

Query data in a catalog

Query data across catalogs

Method 1: Use the three-part identifier (no context switch required)

Method 2: Switch the session context and use the abbreviated table name

Note

Switch a catalog

HMS/ODPS Catalog support details

HMS Catalog support details

ODPS Catalog support details

Manage catalog privileges

Privilege system in MySQL mode

Syntax

Privilege verification example

Privilege query views

References