OceanBase Database supports the ODPS Catalog since V4.3.5 BP2 and introduces the HMS Catalog (Hive Metastore Catalog) mechanism in V4.4.1. This allows seamless access to data lakes and offline data warehouses. With this feature, users can directly query Hive tables, Iceberg tables, and offline tables in MaxCompute (ODPS) using standard SQL, all managed by Hive Metastore (HMS).
These tables are typically stored in efficient columnar formats such as Parquet or ORC in distributed storage systems like HDFS, OSS, or S3, forming a crucial part of the enterprise's core data lake.
Unlike simple metadata integration, OceanBase Database is not just capable of reading data lakes but also serves as a high-performance computing engine, significantly accelerating queries on lake data. This is made possible by its built-in AP (Analytical Processing) optimization capabilities:
- Predicate Pushdown and Column Pruning: Pushes WHERE conditions and SELECT column information down to the data source (such as ODPS Storage API or HDFS file reading layer).
- Parallel Execution and Vectorized Computing: Automatically parallelizes large table scan tasks and efficiently processes columnar data using a vectorized engine.
- External Table Data Caching (Optional): Supports caching data files for frequently accessed static lake tables, reducing query latency.
With these capabilities, users can achieve query acceleration in OceanBase Database without migrating raw data from data lakes or data warehouses into the database:
- Real-time business tables (such as orders and user profiles) within OceanBase Database.
- Historical logs, wide tables, or Iceberg tables (such as user behavior and IoT time-series data) in data lakes, as well as offline tables in cloud-native data warehouses like ODPS.
This makes OceanBase Database the entry point for querying data lakes, enabling the federated analytics experience of "querying data from anywhere". With just one SQL statement, users can access data from both the database and the data lake, greatly simplifying cross-source analytics and making data lake usage more straightforward and efficient.
Deployment and environment dependencies
Cross-IDC deployment recommendation |
HDFS storage dependency |
|---|---|
| If HMS and OBServer are deployed across IDCs, network latency may cause metadata retrieval to take up to seconds, significantly increasing query latency. It is recommended to deploy HMS and OBServer within the same low-latency network. | When the underlying storage is HDFS, Java runtime environment (JDK 8/11) must be deployed on OBServer nodes in advance, and Java support must be enabled (enable_java = true). For more information, see Deploy the OceanBase Java SDK environment. |
Relationship between catalog and external tables
OceanBase provides two modes for accessing external data:
Mode |
Applicable scenarios |
Metadata management |
Typical usage |
|---|---|---|---|
| Catalog mode (such as HMS Catalog and ODPS Catalog) |
Predefined, structured data sources (such as Hive data warehouses and ODPS projects) | Automatically synchronized metadata | SET CATALOG my_hms;SELECT * FROM db1.sales; |
| Explicit external tables (file/JDBC/ODPS external tables) |
Temporary files, custom paths, or non-Catalog data sources | Manually create tables to define structures | CREATE EXTERNAL TABLE t1 (...) LOCATION = 'oss://...'; |
Notice
- A catalog is a metadata abstraction of the underlying data (such as Parquet/ORC files and ODPS tables).
- Currently, HMS Catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered with HMS, use the
CREATE EXTERNAL TABLE ... LOCATIONmethod.
- Currently, HMS Catalog only supports integration with Hive Metastore and cannot directly point to any HDFS path. To access files not registered with HMS, use the
Catalog types
In OceanBase, a catalog is a top-level namespace for organizing and isolating metadata from different data sources. OceanBase supports the following catalog types:
Catalog type |
Data source |
Supported table type |
Typical scenarios |
|---|---|---|---|
| Internal catalog | OceanBase local storage | OceanBase tables | Core business data |
| ODPS catalog | Alibaba Cloud MaxCompute (ODPS) | ODPS tables | Big data offline analysis |
| HMS catalog | Hive Metastore (HMS) | Hive tables, Iceberg tables | Hadoop ecosystem integration |
Create a catalog
The built-in internal catalog in OceanBase is used to manage the metadata of the database itself and does not need to be manually created. To use external data sources, you can execute the CREATE EXTERNAL CATALOG statement to create an ODPS catalog or an HMS catalog.
Access a catalog
Note
Currently, only OceanBase Database in MySQL mode supports the catalog feature.
View catalogs in the current tenant
You can run the SHOW CATALOGS; statement to view all catalogs created in the current tenant.
Example:
SHOW CATALOGS;
Query data in a catalog
In a catalog, you can execute any query operation as in a database. You can also perform a JOIN operation across catalogs.
Example:
SELECT * FROM internal.ap_db.ap_table, odps_catalog.odps_db.odps_table;
Query data across catalogs
OceanBase Database allows you to access data from multiple catalogs (such as internal, hive_catalog, and odps_catalog) in a single SQL statement to perform a JOIN operation. You can use the following two methods to reference tables in a different context:
Core principles:
- If you do not switch the session context, you must use the full three-part identifier:
catalog_name.database_name.table_name; - If you have switched to the target catalog and database, you can directly use the abbreviated table name.
Method 1: Use the three-part identifier (no context switch required)
You can use the catalog.db.table three-part identifier to directly access tables in any catalog without switching the session context.
Example 1: Query a table in the default internal catalog
-- View the current session context
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query a table in a different catalog
SELECT * FROM hive_catalog.hive_db.hive_table;
You can also use the catalog.db.table three-part identifier to directly access tables in the HMS catalog.
Example 2: Perform a JOIN operation across catalogs (mixed sources)
-- Assume the current context is internal.ap_db
SELECT CURRENT_CATALOG(); -- Returns 'internal'
SELECT DATABASE(); -- Or use SELECT SCHEMA(); to return the current database, such as 'ap_db'
-- Query tables in both the HMS catalog and the local catalog
SELECT h.*, o.*
FROM hive_catalog.hive_db.hive_table h
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Method 2: Switch the session context and use the abbreviated table name
You can explicitly switch the current session's catalog and database using SET CATALOG and USE, and then directly use the table name without any prefix.
Example: Perform a JOIN operation in the hive_catalog.hive_db catalog
-- Switch the catalog and database
SET CATALOG hive_catalog;
USE hive_db;
-- Verify the current context
SELECT CURRENT_CATALOG(); -- Returns 'hive_catalog'
SELECT DATABASE(); -- Returns 'hive_db'
-- You can now directly reference tables in hive_db (such as hive_table)
-- You can still use the three-part identifier to access tables in other catalogs (such as internal)
SELECT h.*, o.*
FROM hive_table h -- Abbreviated: current catalog + DB
JOIN internal.ap_db.ap_table o ON h.id = o.id;
Note
- If you do not execute
SET CATALOGandUSE, and directly writeSELECT * FROM hive_table;, an error will occur because the system will search for the table in the current catalog (such as internal).
- Switching the context only affects subsequent statements and does not impact the connection or transaction status.
For more information, see View a catalog.
Switch a catalog
You can run the SET CATALOG catalog_name; statement to switch to a catalog, or explicitly specify it in a query:
SELECT * FROM odps_catalog.database_name.table_name;
For more information about the SQL syntax, see SET CATALOG.
HMS/ODPS Catalog support details
HMS Catalog support details
Access mode
- Read-only access: All objects in the HMS Catalog are read-only. You cannot execute DML/DDL operations such as
INSERT,UPDATE, andDROP TABLE.
Supported table types
Table type |
Format support |
Description |
|---|---|---|
| Hive table | ORC, Parquet, TextFile, and CSV | - Complex types are supported only for the ARRAY type in Parquet format. For more information, see Overview of array element types |
| Iceberg table | V1, V2, or V3 (Parquet is recommended) | Supports advanced features such as schema evolution (column addition and deletion) and partition transformation (such as bucket() and year()). |
Hive version compatibility
Hive version |
Iceberg support |
|---|---|
| Hive 4.x | Hive natively supports Iceberg tables. You can create or update Iceberg tables in Hive. OceanBase Database can directly access Iceberg tables through the HMS Catalog. |
| Hive 1.2.x, 2.3.x, or 3.1.x | Hive does not support Iceberg tables. However, if an Iceberg table is created by an engine such as Spark or Flink and its metadata is registered in the Hive Metastore, OceanBase Database can still read the table. |
Key principle: As long as the metadata of an Iceberg table is correctly registered in the HMS Catalog (regardless of the engine used to write the table), OceanBase Database can access the table.
ODPS Catalog support details
- Supported operations: Analytical queries such as
SELECT,JOIN, andGROUP BY; - Unsupported operations: Write operations such as
INSERT,UPDATE, andDROP TABLE; - Optimization capabilities: Automatic partition pruning and column pruning to reduce the amount of data transmitted from ODPS.
Manage catalog privileges
OceanBase Database supports fine-grained privilege management at the catalog level. Users must have the appropriate privileges to create, switch, or query external catalogs (such as ODPS and HMS).
Catalog privileges are divided into two levels:
- User Level (global level): Applies to all catalogs (corresponding to . in MySQL mode)
- Catalog Level (object level): Applies only to the specified catalog
Privilege system in MySQL mode
Privilege |
Scope |
Description |
|---|---|---|
CREATE CATALOG |
User Level | Grants the privilege to execute CREATE and DROP EXTERNAL CATALOG |
USE CATALOG |
User Level or Catalog Level | Grants the privilege to execute SET CATALOG, SHOW CATALOGS, and SELECT FROM catalog.db.table |
Syntax
Grant global privileges
This is applicable to administrators who need to operate on any catalog:
GRANT CREATE CATALOG ON *.* TO 'user1';
GRANT USE CATALOG ON *.* TO 'user1' WITH GRANT OPTION;
REVOKE CREATE CATALOG ON *.* FROM 'user1';
REVOKE USE CATALOG ON *.* FROM 'user1';
Grant specific catalog privileges
This is applicable to regular users who only need to access a specific catalog (recommended approach):
-- Grant the USE and SELECT privileges on the catalog 'odps_prod'
GRANT SELECT, USE CATALOG ON CATALOG odps_prod TO 'user1' WITH GRANT OPTION;
-- Revoke the privileges
REVOKE USE CATALOG ON CATALOG odps_prod FROM 'user1';
Important notes:
- The creator automatically has privileges
After a user successfully executes CREATE EXTERNAL CATALOG ..., they automatically gain the USE CATALOG and SELECT privileges on that catalog, without needing additional authorization.
Example:
-- user1 executes
CREATE EXTERNAL CATALOG odps_dev TYPE = 'ODPS' ...;
-- At this point, user1 can directly use the catalog
SET CATALOG odps_dev;
SELECT * FROM project_db.table1 LIMIT 1;
- Catalog privileges ≠ Data access privileges
Even if a user has the USE CATALOG and SELECT privileges, querying tables within the catalog may still fail. This is because:
- Catalog level: Used to connect to Hive Metastore (to retrieve table structure, partition, etc., metadata)
- Location level: Used to read data files stored in file systems like HDFS
Authentication for both levels must be configured separately and must align with the security policies of the Hadoop cluster.
Example:
-- user1 has the USE CATALOG privilege on odps_prod
SET CATALOG hms_prod;
USE sales_db;
-- Querying the table fails
SELECT * FROM sales_log LIMIT 1;
-- An error is generated
-- Reason: No location was created, or user1 does not have access to the location
-- Solution: Create a location with authentication information
CREATE LOCATION hms_sales URL = 'hdfs://nn:8020/sales/'
CREDENTIAL (USER = 'username');
For more information about the authentication mechanism for HMS catalogs and HDFS storage, see HMS Catalog.
Privilege verification example
-- Grant privileges
GRANT SELECT, USE CATALOG ON CATALOG ca1 TO zhangsan WITH GRANT OPTION;
-- View privileges
SHOW GRANTS FOR zhangsan;
Output:
+-------------------------------------------------------------------------+
| Grants for zhangsan@% |
+-------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'zhangsan' |
| GRANT SELECT, USE CATALOG ON CATALOG `ca1` TO 'zhangsan' WITH GRANT OPTION |
+-------------------------------------------------------------------------+
Privilege query views
You can use the following system views to query privileges:
information_schema.USER_PRIVILEGES: User global privilegesoceanbase.DBA_OB_USERS/CDB_OB_USERS: User-related metadata information
References
- Overview of Catalog
- Create a Catalog
- View a Catalog
- Drop a Catalog
- SET CATALOG
- CREATE EXTERNAL CATALOG
- DROP CATALOG
