Overview of data catalogs|V4.4.2|OceanBase Database| docs|Distributed Database

Overview of data catalogs

Last Updated：2026-04-02 06:23:56 Updated

Starting from V4.3.5 BP2, OceanBase Database supports the ODPS catalog. In V4.4.1, it further introduces the HMS catalog (Hive Metastore catalog) mechanism to seamlessly access data lakes and offline data warehouses. With this capability, users can directly query Hive tables, Iceberg tables, and offline tables in MaxCompute (ODPS) using standard SQL, all managed by the Hive Metastore (HMS).

These tables are typically stored in efficient columnar formats like Parquet or ORC in distributed storage systems such as HDFS, OSS, or S3, forming a crucial part of enterprise data lakes.

Unlike simple metadata integration, OceanBase Database is not just capable of reading data lakes, but serves as a high-performance computing engine that significantly accelerates queries on lake data. This is made possible by its built-in AP (Analytical Processing) optimization capabilities:

Predicate Pushdown and Column Pruning: Pushes WHERE conditions and SELECT column information down to the data source (e.g., ODPS Storage API, HDFS file reading layer).
Parallel Execution and Vectorized Computing: Automatically parallelizes large table scan tasks and efficiently processes columnar data using a vectorized engine.
External Table Data Caching (Optional): Supports caching data files for frequently accessed static lake tables, reducing query latency from repeated queries.

With these capabilities, users can accelerate queries without migrating raw data from data lakes or data warehouses into the database:

Real-time business tables (e.g., orders, user profiles) within OceanBase Database.
Historical logs, wide tables, or Iceberg tables (e.g., user behavior, IoT time-series data) in data lakes, and offline tables in cloud-native data warehouses like ODPS.

This makes OceanBase Database the entry point for querying data lakes, enabling a true "query once, access all data" federated analytics experience. With just one SQL statement, users can simultaneously access data from both the database and the data lake, greatly simplifying cross-source analytics workflows and making data lake usage more straightforward and efficient.

Deployment and environment dependencies

Cross IDC deployment recommendation	HDFS storage dependency
If HMS and OBServer are deployed across IDCs, network latency can cause metadata retrieval to take up to seconds, significantly increasing query latency. It is recommended to deploy HMS and OBServer within the same low-latency network.	When the underlying storage is HDFS, Java runtime environment (JDK 8/11) must be deployed on OBServer nodes, and Java support must be enabled (`enable_java = true`). For more information, see Deploy OceanBase Java SDK environment.

Relationship between catalog and external tables

OceanBase provides two modes for accessing external data:

Mode	Applicable scenario	Metadata management	Typical usage
Catalog mode (such as HMS catalog and ODPS catalog)	Predefined and structured data sources (such as Hive data warehouse and ODPS projects)	Automatic metadata synchronization	`SET CATALOG my_hms;` `SELECT * FROM db1.sales;`
Explicit external tables (file/JDBC/ODPS external tables)	Temporary files, custom paths, or non-catalog data sources	Manual table creation and structure definition	`CREATE EXTERNAL TABLE t1 (...) LOCATION = 'oss://...';`

Notice

Catalog is a metadata abstraction of the underlying data (such as Parquet/ORC files and ODPS tables).
- Currently, HMS catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered in HMS, use the CREATE EXTERNAL TABLE ... LOCATION method.

Catalog types

In OceanBase, a catalog is a top-level namespace for database objects, used to organize and isolate metadata from different data sources. OceanBase supports the following Catalog types:

Catalog type	Data source	Supported table types	Typical scenarios
Internal catalog	OceanBase local storage	OceanBase tables	Core business data
ODPS catalog	Alibaba Cloud MaxCompute (ODPS)	ODPS tables	Big data offline analysis
HMS catalog	Hive Metastore (HMS)	Hive tables, Iceberg tables	Hadoop ecosystem integration

Create a catalog

The built-in internal catalog in OceanBase is used to manage the metadata of the database itself and does not require manual creation. To use external data sources, you can execute the CREATE EXTERNAL CATALOG statement to create an ODPS catalog or HMS catalog.

Access a catalog

Note

At present, only OceanBase Database in MySQL-compatible mode supports the data catalog feature.

View the catalogs in the current tenant

You can execute the SHOW CATALOGS; statement to view all catalogs created in the current tenant.

Example:

SHOW CATALOGS;

View data in a catalog

In a catalog, you can execute any query on tables in the catalog and perform cross-catalog joins.

Example:

SELECT * FROM internal.ap_db.ap_table, odps_catalog.odps_db.odps_table;

Query data across catalogs

OceanBase Database allows you to access data from multiple catalogs (such as internal, hive_catalog, and odps_catalog) in a single SQL statement to perform federated queries. You can use the following two methods to reference tables in a different context.

Core principle:

If you do not change the session context, you must use the full three-part identifier: catalog_name.database_name.table_name;
If you have changed the session context to the target catalog and database, you can directly use the abbreviated table name.

Method 1: Use the three-part identifier (no context switch required)

You can use the catalog.db.table three-part identifier to directly access tables in any catalog without switching the context.

Example 1: Query a table in the HMS catalog

-- View the current session context
SELECT CURRENT_CATALOG();   -- Returns 'internal'
SELECT DATABASE();  -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'

-- Query a table in the Hive catalog
SELECT * FROM hive_catalog.hive_db.hive_table;

You can also use the catalog.db.table three-part identifier to directly access tables in the HMS catalog.

Example 2: Federated query (mixed sources)

-- Assume the current context is internal.ap_db
SELECT CURRENT_CATALOG();   -- Returns 'internal'
SELECT DATABASE();  -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'

-- Query tables in both the HMS catalog and the local catalog
SELECT h.*, o.*
FROM hive_catalog.hive_db.hive_table h
JOIN internal.ap_db.ap_table o ON h.id = o.id;

Method 2: Switch the session context and use the abbreviated table name

You can explicitly switch the current session's catalog and database using SET CATALOG and USE, and then directly use the table name without any prefix.

Example: Federated query in hive_catalog.hive_db

-- Switch the catalog and database
SET CATALOG hive_catalog;
USE hive_db;

-- Verify the current context
SELECT CURRENT_CATALOG();   -- Returns 'hive_catalog'
SELECT DATABASE();  -- Returns 'hive_db'

-- You can now directly reference tables in hive_db (such as hive_table)
-- You can still use the three-part identifier to access tables in other catalogs (such as internal)
SELECT h.*, o.*
FROM hive_table h                      -- Abbreviated: current catalog + DB
JOIN internal.ap_db.ap_table o ON h.id = o.id;

Note

If you do not execute SET CATALOG and USE, directly writing SELECT * FROM hive_table; will result in an error, as the system will search for the table in the current catalog (such as internal).

Switching the context only affects subsequent statements and does not affect the connection or transaction state.

For more information, see View CATALOG.

Switch catalogs

You can execute the SET CATALOG catalog_name; statement to switch the catalog, or explicitly specify the catalog in a query:

SELECT * FROM odps_catalog.database_name.table_name;

For more information about the SQL syntax, see SET CATALOG

Supported features of HMS/ODPS catalog

Supported features of HMS catalog

Access mode

Read-only access: All objects under HMS catalog are read-only. You cannot execute DML/DDL operations such as INSERT, UPDATE, and DROP TABLE.

Supported table types

Table type	Format support	Description
Hive table	ORC, Parquet, TextFile, and CSV	- Only the ARRAY type of Parquet format is supported for complex types. For more information, see Overview of array element types.
Iceberg table	V1/V2 (Parquet is recommended)	Supports advanced features such as schema evolution (adding or dropping columns) and partition transform (such as `bucket()` and `year()`).

Hive version compatibility

Hive version	Iceberg support
Hive 4.x	Hive engine natively supports Iceberg tables. You can create or update Iceberg tables in Hive. OceanBase Database can directly access Iceberg tables through HMS catalog.
Hive 1.2.x / 2.3.x / 3.1.x	Hive engine does not support Iceberg tables. However, if Iceberg tables are created by Spark or Flink and the metadata is registered in Hive Metastore, OceanBase Database can still read the tables.

Key principle: As long as the metadata of Iceberg tables is correctly registered in HMS (regardless of the engine used to write the tables), OceanBase Database can access the tables.

Supported features of ODPS catalog

Supported operations: Analytical queries such as SELECT, JOIN, and GROUP BY;
Unsupported operations: Write operations such as INSERT, UPDATE, and DROP TABLE;
Optimization capabilities: Automatically push down partition pruning and column pruning conditions to reduce the amount of data transmitted from ODPS.

Catalog privileges

OceanBase Database supports fine-grained privilege control at the catalog level. Users must have the appropriate privileges to create, switch to, or query external catalogs (such as ODPS or HMS).

Catalog privileges are divided into two levels:
- User level (global): Applies to all catalogs (corresponding to . in MySQL-compatible mode)
- Catalog level (object): Applies only to the specified catalog

MySQL-compatible mode privilege system

Privilege	Scope	Description
`CREATE CATALOG`	User level	Allows the execution of `CREATE` and `DROP EXTERNAL CATALOG`
`USE CATALOG`	User level or catalog level	Allows `SET CATALOG`, `SHOW CATALOGS`, and `SELECT FROM catalog.db.table`

Syntax

Grant global privileges

For administrators who need to operate on any catalog:

GRANT CREATE CATALOG ON *.* TO 'user1';
GRANT USE CATALOG ON *.* TO 'user1' WITH GRANT OPTION;

REVOKE CREATE CATALOG ON *.* FROM 'user1';
REVOKE USE CATALOG ON *.* FROM 'user1';

Grant specific catalog privileges

For regular users who only need to access a specific catalog (recommended):

-- Grant the USE and SELECT privileges on the 'odps_prod' catalog
GRANT SELECT, USE CATALOG ON CATALOG odps_prod TO 'user1' WITH GRANT OPTION;

-- Revoke privileges
REVOKE USE CATALOG ON CATALOG odps_prod FROM 'user1';

Important notes:

The creator automatically has privileges

After a user successfully executes CREATE EXTERNAL CATALOG ..., they automatically gain the USE CATALOG and SELECT privileges for that catalog, without needing additional authorization.

Example:

-- user1 executes
CREATE EXTERNAL CATALOG odps_dev TYPE = 'ODPS' ...;

-- At this point, user1 can directly use the catalog
SET CATALOG odps_dev;
SELECT * FROM project_db.table1 LIMIT 1;

Catalog privileges ≠ Data access privileges

Even if a user has the USE CATALOG and SELECT privileges, querying tables within the catalog may still fail. This is because:

Catalog level: Used to connect to the Hive Metastore (to retrieve metadata such as table structure and partitions)
Location level: Used to read data files stored in file systems like HDFS

Authentication for both levels must be configured separately and must align with the security policies of the Hadoop cluster.

Example:

-- user1 has the USE CATALOG privilege on odps_prod
SET CATALOG hms_prod;
USE sales_db;

-- Querying the table fails
SELECT * FROM sales_log LIMIT 1;
-- An error is generated

-- Reason: No location was created, or user1 does not have access privileges for the location
-- Solution: Create a location with authentication information
CREATE LOCATION hms_sales URL = 'hdfs://nn:8020/sales/'
  CREDENTIAL (USER = 'username');

For more information about the authentication mechanism for HMS catalogs and HDFS storage, see HMS catalog.

Privilege verification example

-- Grant privileges
GRANT SELECT, USE CATALOG ON CATALOG ca1 TO zhangsan WITH GRANT OPTION;

-- View privileges
SHOW GRANTS FOR zhangsan;

Output:

+-------------------------------------------------------------------------+
| Grants for zhangsan@%                                                      |
+-------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'zhangsan'                                           |
| GRANT SELECT, USE CATALOG ON CATALOG `ca1` TO 'zhangsan' WITH GRANT OPTION |
+-------------------------------------------------------------------------+

Privilege query views

You can view privileges using the following system views:

information_schema.USER_PRIVILEGES: User global privileges
oceanbase.DBA_OB_USERS / CDB_OB_USERS: User-related metadata information

OceanBase

Customer Stories

Documentation

Overview of data catalogs

Deployment and environment dependencies

Relationship between catalog and external tables

Notice

Catalog types

Create a catalog

Access a catalog

Note

View the catalogs in the current tenant

View data in a catalog

Query data across catalogs

Method 1: Use the three-part identifier (no context switch required)

Method 2: Switch the session context and use the abbreviated table name

Note

Switch catalogs

Supported features of HMS/ODPS catalog

Supported features of HMS catalog

Supported features of ODPS catalog

Catalog privileges

MySQL-compatible mode privilege system

Syntax

Privilege verification example

Privilege query views

References