OceanBase Database supports the ODPS Catalog since V4.3.5 BP2 and further introduces the HMS Catalog (Hive Metastore Catalog) mechanism in V4.4.1. This allows seamless access to data lakes and offline data warehouses. With this capability, users can directly query Hive tables managed by the Hive Metastore (HMS), Iceberg tables, and offline tables in MaxCompute (ODPS) using standard SQL.
These tables are typically stored in efficient columnar formats like Parquet or ORC in distributed storage systems such as HDFS, OSS, or S3, forming a crucial part of enterprise data lakes.
Unlike simple metadata integration, OceanBase Database does more than just "read" data lakes. It serves as a high-performance computing engine, significantly accelerating queries on lake data. This is made possible by several built-in AP optimizations:
- Predicate Pushdown and Column Pruning: Pushes WHERE conditions and SELECT column information down to data sources (e.g., ODPS Storage API, HDFS file reading layer).
- Parallel Execution and Vectorized Computing: Automatically parallelizes large table scans and efficiently processes columnar data using vectorized engines.
- External Table Data Caching (Optional): Supports caching data files for frequently accessed static lake tables, reducing query latency.
With these capabilities, users can accelerate queries without migrating raw data from data lakes or data warehouses into the database:
- Real-time business tables within OceanBase Database (e.g., orders, user profiles).
- Historical logs, wide tables, or Iceberg tables in data lakes (e.g., user behavior, IoT time-series data), as well as offline tables in cloud-native data warehouses like ODPS.
This makes OceanBase Database the entry point for querying data lakes, delivering a true federated analytics experience with "query once, access all data." A single SQL statement can access data from both the database and the data lake, greatly simplifying cross-source analytics and making data lake usage more straightforward and efficient.
Deployment and environment dependencies
Cross-IDC deployment recommendation |
HDFS storage dependency |
|---|---|
| If HMS and OBServer are deployed across IDCs, network latency may cause metadata retrieval to take several seconds, significantly increasing query latency. It is recommended to deploy HMS and OBServer within the same low-latency network. | When the underlying storage is HDFS, you must deploy the Java runtime environment (JDK 8/11) on OBServer nodes and enable Java support (enable_java = true). For more information, see Deploy OceanBase Java SDK environment. |
Relationship between catalog and external tables
OceanBase Database provides two modes for accessing external data:
Mode |
Applicable scenarios |
Metadata management |
Typical usage |
|---|---|---|---|
| Catalog mode (such as HMS catalog or ODPS catalog) |
Predefined and structured data sources, such as Hive data warehouses and ODPS projects | Automatically synchronize metadata | SET CATALOG my_hms;SELECT * FROM db1.sales; |
| Explicit external tables (file/JDBC/ODPS external tables) |
Temporary files, custom paths, or non-catalog data sources | Manually create tables and define structures | CREATE EXTERNAL TABLE t1 (...) LOCATION = 'oss://...'; |
Notice
- A catalog is a metadata abstraction of the underlying data (such as Parquet/ORC files and ODPS tables).
- Currently, HMS catalogs can only be connected to Hive Metastores and cannot directly point to arbitrary HDFS paths. If you need to access files not registered with HMS, use the
CREATE EXTERNAL TABLE ... LOCATIONstatement.
- Currently, HMS catalogs can only be connected to Hive Metastores and cannot directly point to arbitrary HDFS paths. If you need to access files not registered with HMS, use the
- If you do not switch the session context, you must use the full three-part identifier:
catalog_name.database_name.table_name; - If you switch to the target catalog and database, you can directly use the shortened table name.
- If you do not execute
SET CATALOGandUSE, and directly writeSELECT * FROM hive_table;, an error will occur because the system will search for the table in the current catalog, such as internal. - Switching the context only affects subsequent statements and does not affect the connection or transaction status.
- Read-only access: All objects in the HMS Catalog are read-only. You cannot execute DML/DDL operations such as
INSERT,UPDATE, andDROP TABLE. - Supported operations: Analytical queries such as
SELECT,JOIN, andGROUP BY; - Unsupported operations: Write operations and DDL operations such as
INSERT,UPDATE, andDROP TABLE; - Optimization capabilities: Automatically push down partition pruning and column pruning conditions to reduce the amount of data transmitted from ODPS.
Catalog permissions are divided into two levels:
- User Level (global level): Applies to all catalogs (corresponding to . in MySQL mode)
- Catalog Level (object level): Applies only to the specified catalog
- The creator automatically has permissions
- Catalog permissions ≠ Data access permissions
- Catalog level: Used to connect to the Hive Metastore (to retrieve metadata such as table structure and partitions)
- Location level: Used to read data files stored in file systems like HDFS
information_schema.USER_PRIVILEGES: User global permissionsoceanbase.DBA_OB_USERS/CDB_OB_USERS: User-related metadata information
Catalog types
In OceanBase Database, a catalog is a top-level namespace for database objects. It is used to organize and isolate metadata from different data sources. OceanBase Database supports the following catalog types:
Catalog type |
Data source |
Supported table type |
Typical scenarios |
|---|---|---|---|
| Internal catalog | OceanBase Database local storage | OceanBase Database tables | Core business data |
| ODPS catalog | Alibaba Cloud MaxCompute (ODPS) | ODPS tables | Big data offline analysis |
| HMS catalog | Hive Metastore (HMS) | Hive tables and Iceberg tables | Hadoop ecosystem integration |
Create a catalog
The internal catalog built in OceanBase Database is used to manage the metadata of the database itself and does not need to be manually created. If you want to use external data sources, you can use the CREATE EXTERNAL CATALOG statement to create an ODPS catalog or HMS catalog.
Access a catalog
Note
Currently, only OceanBase Database in MySQL mode supports the catalog feature.
View catalogs in the current tenant
You can run the SHOW CATALOGS; statement to view all catalogs created in the current tenant.
Example:
SHOW CATALOGS;
Query data in a catalog
In a catalog, you can execute any query operation on tables in the catalog, and you can also join tables in different catalogs.
Here is an example:
SELECT * FROM internal.ap_db.ap_table, odps_catalog.odps_db.odps_table;
Query data across catalogs
OceanBase Database allows you to access data in multiple catalogs (such as internal, hive_catalog, and odps_catalog) in a single SQL statement, enabling federated queries. You can use the following two methods to reference tables in a different context:
Core principles:
Method 1: Use the three-part identifier (without switching the context)
Regardless of the current catalog, you can use the catalog.db.table three-part identifier to directly access tables in any catalog.
Example 1: Query a table in the HMS catalog
-- View the current session context
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query a table in the Hive catalog
SELECT * FROM hive_catalog.hive_db.hive_table;
You can also use the catalog.db.table three-part identifier to directly access local tables in the HMS catalog.
Example 2: Federated query (mixed sources)
-- Assume the current context is internal.ap_db
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query tables in the HMS catalog and the local catalog
SELECT h.*, o.*
FROM hive_catalog.hive_db.hive_table h
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Method 2: Switch the session context and use the shortened table name
You can explicitly switch the current session's catalog and database using SET CATALOG and USE, and then directly use the table name without prefixes.
Example: Federated query in hive_catalog.hive_db
-- Switch the catalog and database
SET CATALOG hive_catalog;
USE hive_db;
-- Verify the current context
SELECT CURRENT_CATALOG(); -- Returns 'hive_catalog'
SELECT DATABASE(); -- Returns 'hive_db'
-- You can now directly reference tables in hive_db, such as hive_table.
-- You can still use the three-part identifier to access tables in other catalogs, such as internal.
SELECT h.*, o.*
FROM hive_table h -- Shortened: current catalog + DB
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Note
For more information, see View a catalog
Switch catalogs
You can run the SET CATALOG catalog_name; statement to switch to a catalog, or explicitly specify the catalog in a query:
SELECT * FROM odps_catalog.database_name.table_name;
For more information about the SQL syntax, see SET CATALOG
HMS/ODPS Catalog support details
HMS Catalog support details
Access mode
Supported table types
Table type |
Format support |
Description |
|---|---|---|
| Hive table | ORC, Parquet, TextFile, and CSV | - Complex types are supported only for the ARRAY type in Parquet format. For more information, see Overview of array element types |
| Iceberg table | V1 / V2 (Parquet is recommended) | Supports advanced features such as schema evolution (adding or dropping columns) and partition transformation (such as bucket() and year()). |
Hive version compatibility
Hive version |
Iceberg support |
|---|---|
| Hive 4.x | Hive engines natively support Iceberg tables. You can create or update Iceberg tables in Hive. OceanBase Database can directly access Iceberg tables through the HMS Catalog. |
| Hive 1.2.x / 2.3.x / 3.1.x | Hive engines do not support Iceberg tables. However, if Iceberg tables are created by engines such as Spark or Flink and the metadata is registered in the Hive Metastore, OceanBase Database can still read the tables. |
Key principle: As long as the metadata of Iceberg tables is correctly registered in HMS (regardless of the engine used to write the tables), OceanBase Database can access the tables.
ODPS Catalog support details
Catalog permission management
OceanBase Database supports fine-grained permission control at the catalog level. Users must have the appropriate permissions to create, switch to, or query external catalogs (such as ODPS or HMS).
MySQL mode permission system
Permission |
Scope |
Purpose |
|---|---|---|
CREATE CATALOG |
User Level | Allows the execution of CREATE and DROP EXTERNAL CATALOG |
USE CATALOG |
User Level or Catalog Level | Allows SET CATALOG, SHOW CATALOGS, and SELECT FROM catalog.db.table |
Syntax
Grant global permissions
This is for administrators who need to operate on any catalog:
GRANT CREATE CATALOG ON *.* TO 'user1';
GRANT USE CATALOG ON *.* TO 'user1' WITH GRANT OPTION;
REVOKE CREATE CATALOG ON *.* FROM 'user1';
REVOKE USE CATALOG ON *.* FROM 'user1';
Grant specific catalog permissions
This is for regular users who only need to access a specific catalog (recommended approach):
-- Grant the USE and SELECT permissions on the 'odps_prod' catalog
GRANT SELECT, USE CATALOG ON CATALOG odps_prod TO 'user1' WITH GRANT OPTION;
-- Revoke the permissions
REVOKE USE CATALOG ON CATALOG odps_prod FROM 'user1';
Important notes:
After a user successfully executes CREATE EXTERNAL CATALOG ..., they automatically gain the USE CATALOG and SELECT permissions for that catalog, without needing additional authorization.
Example:
-- user1 executes
CREATE EXTERNAL CATALOG odps_dev TYPE = 'ODPS' ...;
-- At this point, user1 can directly use this catalog
SET CATALOG odps_dev;
SELECT * FROM project_db.table1 LIMIT 1;
Even if a user has the USE CATALOG and SELECT permissions, querying tables within the catalog may still fail. This is because:
Authentication for both levels must be configured separately and must align with the security policies of the Hadoop cluster.
Example:
-- user1 has the USE CATALOG permission on odps_prod
SET CATALOG hms_prod;
USE sales_db;
-- Querying the table fails
SELECT * FROM sales_log LIMIT 1;
-- An error is generated
-- Reason: No location was created, or user1 does not have access to this location
-- Solution: Create a location with authentication information
CREATE LOCATION hms_sales URL = 'hdfs://nn:8020/sales/'
CREDENTIAL (USER = 'username');
For more information about the authentication mechanism for HMS catalogs and HDFS storage, see HMS Catalog.
Permission verification example
-- Grant permissions
GRANT SELECT, USE CATALOG ON CATALOG ca1 TO zhangsan WITH GRANT OPTION;
-- View permissions
SHOW GRANTS FOR zhangsan;
Output:
+-------------------------------------------------------------------------+
| Grants for zhangsan@% |
+-------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'zhangsan' |
| GRANT SELECT, USE CATALOG ON CATALOG `ca1` TO 'zhangsan' WITH GRANT OPTION |
+-------------------------------------------------------------------------+
Permission query views
You can use the following system views to view permissions:
