Before OceanBase Database V4.3.5 BP2, users needed to manually create external tables using CREATE EXTERNAL TABLE to access external data sources. When external data sources contain a large number of tables, this manual mapping process lacks flexibility.
To facilitate querying data stored in various external data sources, OceanBase Database introduced the catalog (data directory) feature, which adds a new data layer to enable unified management of external data sources.
Catalog types
Catalogs are divided into the following two types:
Internal catalog (internal data directory)
Used to manage data stored within OceanBase Database. Each tenant has exactly one internal data directory named internal. All databases and objects (such as tables) created before V4.3.5 BP2 are considered stored in internal. Internal catalogs cannot be created, deleted, or modified.
External catalog (external data directory)
Used to connect to external data sources and retrieve metadata about external data. You can directly query external data using an external catalog without needing to import or migrate the data.
Advantages of catalogs
Multi-layered data management architecture
In addition to the existing Database-Table hierarchy, a new catalog layer is added, forming a three-layer structure: Catalog-Database-Table. This allows direct access to the directory tree of external data sources.
Automatic mapping and on-demand querying
By simply declaring the connection to an external data source using CREATE EXTERNAL CATALOG, you can directly query external tables without the need to manually create mapping tables.
Unified query experience
When querying external data sources, the syntax is consistent with local tables, and cross-source join analysis is supported.
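The three-layer naming and cross-source join described above can be sketched as follows. This is only an illustration: the catalog name ext_sales, the databases crm and shop, and all table and column names are hypothetical, and the sketch assumes that an external catalog has already been created with CREATE EXTERNAL CATALOG and that tables are addressed as catalog.database.table.

```sql
-- Hypothetical names: ext_sales is an external catalog that already exists;
-- crm and shop are databases, customers and orders are tables.

-- Three-part name: catalog.database.table
SELECT order_id, amount
FROM ext_sales.shop.orders
WHERE amount > 100;

-- Cross-source join: a table in the internal catalog joined with a table
-- that lives in the external data source
SELECT c.customer_name, SUM(o.amount) AS total_amount
FROM internal.crm.customers AS c
JOIN ext_sales.shop.orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```

Neither query requires a CREATE EXTERNAL TABLE statement for ext_sales.shop.orders; the table is resolved through the catalog when the query runs.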
Features of catalogs
Support for multiple data sources
Supports external data sources such as ODPS (MaxCompute) and HMS (Hive, Iceberg).
Permission isolation
Controls access permissions to external data sources at the catalog level, ensuring data security.
Performance optimization
Query plan optimization based on catalogs reduces intermediate data transmission, improving cross-source query efficiency.
Applicable scenarios
Joint analysis
Scenarios requiring joint analysis of data from OceanBase Database and external data sources.
Data lake/warehouse integration
Building a unified data lake architecture to directly query large volumes of data stored in external systems such as ODPS and Hive.
ETL pipeline optimization
Reduces data migration steps by allowing direct access to source system data through catalogs, lowering ETL development complexity.
Limitations
In the current version, data accessed through external catalogs is read-only and cannot be modified. Any statement that modifies objects under such a catalog, such as DROP TABLE or INSERT INTO, is rejected.
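As a rough illustration of this limitation, the statements below contrast operations that would be rejected with one that only reads through the catalog. All names (ext_sales, shop, shop_local, orders, orders_copy) are hypothetical, and the last statement assumes that the target table already exists in the internal catalog and that it can be addressed with the internal prefix.

```sql
-- Hypothetical names throughout; ext_sales is an external catalog.

-- Rejected: statements that modify objects under the external catalog
INSERT INTO ext_sales.shop.orders (order_id, amount) VALUES (1, 9.99);
DROP TABLE ext_sales.shop.orders;

-- Still a read on the catalog: copy external rows into an existing table
-- in the internal catalog (assumed to exist)
INSERT INTO internal.shop_local.orders_copy (order_id, amount)
SELECT order_id, amount
FROM ext_sales.shop.orders;
```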