Starting from V4.3.5 BP2, OceanBase Database supports the ODPS catalog. In V4.4.1, it further introduces the HMS catalog (Hive Metastore catalog) mechanism to seamlessly access data lakes and offline data warehouses. With this capability, users can directly query Hive tables, Iceberg tables, and offline tables in MaxCompute (ODPS) using standard SQL, all managed by the Hive Metastore (HMS).
These tables are typically stored in efficient columnar formats like Parquet or ORC in distributed storage systems such as HDFS, OSS, or S3, forming a crucial part of enterprise data lakes.
Unlike simple metadata integration, OceanBase Database is not just capable of reading data lakes, but serves as a high-performance computing engine that significantly accelerates queries on lake data. This is made possible by its built-in AP (Analytical Processing) optimization capabilities:
- Predicate Pushdown and Column Pruning: Pushes WHERE conditions and SELECT column information down to the data source (e.g., ODPS Storage API, HDFS file reading layer).
- Parallel Execution and Vectorized Computing: Automatically parallelizes large table scan tasks and efficiently processes columnar data using a vectorized engine.
- External Table Data Caching (Optional): Supports caching data files for frequently accessed static lake tables, reducing query latency from repeated queries.
With these capabilities, users can accelerate queries without migrating raw data from data lakes or data warehouses into the database:
- Real-time business tables (e.g., orders, user profiles) within OceanBase Database.
- Historical logs, wide tables, or Iceberg tables (e.g., user behavior, IoT time-series data) in data lakes, and offline tables in cloud-native data warehouses like ODPS.
This makes OceanBase Database the entry point for querying data lakes, enabling a true "query once, access all data" federated analytics experience. With just one SQL statement, users can simultaneously access data from both the database and the data lake, greatly simplifying cross-source analytics workflows and making data lake usage more straightforward and efficient.
Deployment and environment dependencies
| Cross IDC deployment recommendation | HDFS storage dependency |
|---|---|
| If HMS and OBServer are deployed across IDCs, network latency can cause metadata retrieval to take up to seconds, significantly increasing query latency. It is recommended to deploy HMS and OBServer within the same low-latency network. | When the underlying storage is HDFS, Java runtime environment (JDK 8/11) must be deployed on OBServer nodes, and Java support must be enabled (enable_java = true). For more information, see Deploy OceanBase Java SDK environment. |
Relationship between catalog and external tables
OceanBase provides two modes for accessing external data:
| Mode | Applicable scenario | Metadata management | Typical usage |
|---|---|---|---|
| Catalog mode (such as HMS catalog and ODPS catalog) |
Predefined and structured data sources (such as Hive data warehouse and ODPS projects) | Automatic metadata synchronization | SET CATALOG my_hms;SELECT * FROM db1.sales; |
| Explicit external tables (file/JDBC/ODPS external tables) |
Temporary files, custom paths, or non-catalog data sources | Manual table creation and structure definition | CREATE EXTERNAL TABLE t1 (...) LOCATION = 'oss://...'; |
Notice
- Catalog is a metadata abstraction of the underlying data (such as Parquet/ORC files and ODPS tables).
- Currently, HMS catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered in HMS, use the
CREATE EXTERNAL TABLE ... LOCATIONmethod.
- Currently, HMS catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered in HMS, use the
Catalog types
In OceanBase, a catalog is a top-level namespace for database objects, used to organize and isolate metadata from different data sources. OceanBase supports the following Catalog types:
| Catalog type | Data source | Supported table types | Typical scenarios |
|---|---|---|---|
| Internal catalog | OceanBase local storage | OceanBase tables | Core business data |
| ODPS catalog | Alibaba Cloud MaxCompute (ODPS) | ODPS tables | Big data offline analysis |
| HMS catalog | Hive Metastore (HMS) | Hive tables, Iceberg tables | Hadoop ecosystem integration |
Create a catalog
The built-in internal catalog in OceanBase is used to manage the metadata of the database itself and does not require manual creation. To use external data sources, you can execute the CREATE EXTERNAL CATALOG statement to create an ODPS catalog or HMS catalog.
Access a catalog
Note
At present, only OceanBase Database in MySQL-compatible mode supports the data catalog feature.
View the catalogs in the current tenant
You can execute the SHOW CATALOGS; statement to view all catalogs created in the current tenant.
Example:
SHOW CATALOGS;
View data in a catalog
In a catalog, you can execute any query on tables in the catalog and perform cross-catalog joins.
Example:
SELECT * FROM internal.ap_db.ap_table, odps_catalog.odps_db.odps_table;
Query data across catalogs
OceanBase Database allows you to access data from multiple catalogs (such as internal, hive_catalog, and odps_catalog) in a single SQL statement to perform federated queries. You can use the following two methods to reference tables in a different context.
Core principle:
- If you do not change the session context, you must use the full three-part identifier:
catalog_name.database_name.table_name; - If you have changed the session context to the target catalog and database, you can directly use the abbreviated table name.
Method 1: Use the three-part identifier (no context switch required)
You can use the catalog.db.table three-part identifier to directly access tables in any catalog without switching the context.
Example 1: Query a table in the HMS catalog
-- View the current session context
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query a table in the Hive catalog
SELECT * FROM hive_catalog.hive_db.hive_table;
You can also use the catalog.db.table three-part identifier to directly access tables in the HMS catalog.
Example 2: Federated query (mixed sources)
-- Assume the current context is internal.ap_db
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query tables in both the HMS catalog and the local catalog
SELECT h.*, o.*
FROM hive_catalog.hive_db.hive_table h
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Method 2: Switch the session context and use the abbreviated table name
You can explicitly switch the current session's catalog and database using SET CATALOG and USE, and then directly use the table name without any prefix.
Example: Federated query in hive_catalog.hive_db
-- Switch the catalog and database
SET CATALOG hive_catalog;
USE hive_db;
-- Verify the current context
SELECT CURRENT_CATALOG(); -- Returns 'hive_catalog'
SELECT DATABASE(); -- Returns 'hive_db'
-- You can now directly reference tables in hive_db (such as hive_table)
-- You can still use the three-part identifier to access tables in other catalogs (such as internal)
SELECT h.*, o.*
FROM hive_table h -- Abbreviated: current catalog + DB
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Note
- If you do not execute
SET CATALOGandUSE, directly writingSELECT * FROM hive_table;will result in an error, as the system will search for the table in the current catalog (such as internal).
- Switching the context only affects subsequent statements and does not affect the connection or transaction state.
For more information, see View CATALOG.
Switch catalogs
You can execute the SET CATALOG catalog_name; statement to switch the catalog, or explicitly specify the catalog in a query:
SELECT * FROM odps_catalog.database_name.table_name;
For more information about the SQL syntax, see SET CATALOG
Supported features of HMS/ODPS catalog
Supported features of HMS catalog
Access mode
- Read-only access: All objects under HMS catalog are read-only. You cannot execute DML/DDL operations such as
INSERT,UPDATE, andDROP TABLE.
Supported table types
| Table type | Format support | Description |
|---|---|---|
| Hive table | ORC, Parquet, TextFile, and CSV | - Only the ARRAY type of Parquet format is supported for complex types. For more information, see Overview of array element types. |
| Iceberg table | V1/V2 (Parquet is recommended) | Supports advanced features such as schema evolution (adding or dropping columns) and partition transform (such as bucket() and year()). |
Hive version compatibility
| Hive version | Iceberg support |
|---|---|
| Hive 4.x | Hive engine natively supports Iceberg tables. You can create or update Iceberg tables in Hive. OceanBase Database can directly access Iceberg tables through HMS catalog. |
| Hive 1.2.x / 2.3.x / 3.1.x | Hive engine does not support Iceberg tables. However, if Iceberg tables are created by Spark or Flink and the metadata is registered in Hive Metastore, OceanBase Database can still read the tables. |
Key principle: As long as the metadata of Iceberg tables is correctly registered in HMS (regardless of the engine used to write the tables), OceanBase Database can access the tables.
Supported features of ODPS catalog
- Supported operations: Analytical queries such as
SELECT,JOIN, andGROUP BY; - Unsupported operations: Write operations such as
INSERT,UPDATE, andDROP TABLE; - Optimization capabilities: Automatically push down partition pruning and column pruning conditions to reduce the amount of data transmitted from ODPS.
Catalog privileges
OceanBase Database supports fine-grained privilege control at the catalog level. Users must have the appropriate privileges to create, switch to, or query external catalogs (such as ODPS or HMS).
Catalog privileges are divided into two levels:
- User level (global): Applies to all catalogs (corresponding to . in MySQL-compatible mode)
- Catalog level (object): Applies only to the specified catalog
MySQL-compatible mode privilege system
| Privilege | Scope | Description |
|---|---|---|
CREATE CATALOG |
User level | Allows the execution of CREATE and DROP EXTERNAL CATALOG |
USE CATALOG |
User level or catalog level | Allows SET CATALOG, SHOW CATALOGS, and SELECT FROM catalog.db.table |
Syntax
Grant global privileges
For administrators who need to operate on any catalog:
GRANT CREATE CATALOG ON *.* TO 'user1';
GRANT USE CATALOG ON *.* TO 'user1' WITH GRANT OPTION;
REVOKE CREATE CATALOG ON *.* FROM 'user1';
REVOKE USE CATALOG ON *.* FROM 'user1';
Grant specific catalog privileges
For regular users who only need to access a specific catalog (recommended):
-- Grant the USE and SELECT privileges on the 'odps_prod' catalog
GRANT SELECT, USE CATALOG ON CATALOG odps_prod TO 'user1' WITH GRANT OPTION;
-- Revoke privileges
REVOKE USE CATALOG ON CATALOG odps_prod FROM 'user1';
Important notes:
- The creator automatically has privileges
After a user successfully executes CREATE EXTERNAL CATALOG ..., they automatically gain the USE CATALOG and SELECT privileges for that catalog, without needing additional authorization.
Example:
-- user1 executes
CREATE EXTERNAL CATALOG odps_dev TYPE = 'ODPS' ...;
-- At this point, user1 can directly use the catalog
SET CATALOG odps_dev;
SELECT * FROM project_db.table1 LIMIT 1;
- Catalog privileges ≠ Data access privileges
Even if a user has the USE CATALOG and SELECT privileges, querying tables within the catalog may still fail. This is because:
- Catalog level: Used to connect to the Hive Metastore (to retrieve metadata such as table structure and partitions)
- Location level: Used to read data files stored in file systems like HDFS
Authentication for both levels must be configured separately and must align with the security policies of the Hadoop cluster.
Example:
-- user1 has the USE CATALOG privilege on odps_prod
SET CATALOG hms_prod;
USE sales_db;
-- Querying the table fails
SELECT * FROM sales_log LIMIT 1;
-- An error is generated
-- Reason: No location was created, or user1 does not have access privileges for the location
-- Solution: Create a location with authentication information
CREATE LOCATION hms_sales URL = 'hdfs://nn:8020/sales/'
CREDENTIAL (USER = 'username');
For more information about the authentication mechanism for HMS catalogs and HDFS storage, see HMS catalog.
Privilege verification example
-- Grant privileges
GRANT SELECT, USE CATALOG ON CATALOG ca1 TO zhangsan WITH GRANT OPTION;
-- View privileges
SHOW GRANTS FOR zhangsan;
Output:
+-------------------------------------------------------------------------+
| Grants for zhangsan@% |
+-------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'zhangsan' |
| GRANT SELECT, USE CATALOG ON CATALOG `ca1` TO 'zhangsan' WITH GRANT OPTION |
+-------------------------------------------------------------------------+
Privilege query views
You can view privileges using the following system views:
information_schema.USER_PRIVILEGES: User global privilegesoceanbase.DBA_OB_USERS/CDB_OB_USERS: User-related metadata information