Prior to OceanBase Database V4.3.5 BP2, accessing external data sources required users to manually create mapping tables using the CREATE EXTERNAL TABLE statement. If the external data source contained many tables, each one had to be mapped individually, which was inconvenient and inflexible.
To simplify querying data stored in external sources, OceanBase Database now provides the catalog feature. A catalog adds a new layer above databases in the data hierarchy, so that external data sources can be managed in a unified way.
Catalog types
Catalogs are divided into the following two types:
Internal catalog
It is used to manage data stored in OceanBase Database. Each tenant has exactly one internal catalog, which is named internal. All databases and database objects (such as tables) created before V4.3.5 BP2 belong to the internal catalog. The internal catalog cannot be created, deleted, or modified.
External catalog
It is used to connect to an external data source and retrieve the metadata of its data. Through an external catalog, you can query external data directly without importing or migrating it.
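The rules for the two catalog types can be modeled as a small per-tenant registry. This is a hypothetical Python sketch, not OceanBase's implementation: the names TenantCatalogs and Catalog and the free-form properties dictionary are illustrative assumptions. It only shows that every tenant starts with one fixed catalog named internal, while external catalogs are the ones users add and remove.

```python
# Hypothetical model of per-tenant catalogs; not OceanBase's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Catalog:
    name: str
    kind: str  # "internal" or "external"
    properties: dict = field(default_factory=dict)  # assumed connection settings


class TenantCatalogs:
    """Per-tenant catalog registry (illustrative sketch)."""

    INTERNAL = "internal"

    def __init__(self) -> None:
        # Every tenant starts with exactly one built-in internal catalog.
        self._catalogs = {self.INTERNAL: Catalog(self.INTERNAL, "internal")}

    def create_external(self, name: str, **properties) -> Catalog:
        # The internal catalog is built in and can never be created by a user.
        if name == self.INTERNAL:
            raise PermissionError("the internal catalog cannot be created")
        if name in self._catalogs:
            raise ValueError(f"catalog {name!r} already exists")
        catalog = Catalog(name, "external", dict(properties))
        self._catalogs[name] = catalog
        return catalog

    def drop(self, name: str) -> None:
        # Only external catalogs may be removed.
        if name == self.INTERNAL:
            raise PermissionError("the internal catalog cannot be dropped")
        del self._catalogs[name]

    def names(self) -> list:
        return sorted(self._catalogs)
```

The registry exposes no method for altering the internal catalog, which mirrors the rule that it cannot be modified.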
Advantages of catalogs
Multi-layered data management architecture
A catalog layer is added on top of the existing database-table hierarchy, forming a three-layer catalog-database-table architecture. This lets you browse the database and table hierarchy of an external data source directly.
Automatic mapping and on-demand query
You only need to declare a connection to an external data source by using the CREATE EXTERNAL CATALOG statement. Then, you can query external tables without manually creating mapping tables.
Unified query experience
Queries against an external data source use the same syntax as queries against local tables, and you can join tables from different sources in cross-source analysis.
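To illustrate the three-layer hierarchy and the on-demand lookup described above, here is a hypothetical Python sketch. It assumes that a two-part db.table name resolves against a current catalog that defaults to internal, and it uses a caller-supplied fetch_schema function as a stand-in for the external source's metadata API; neither detail is taken from OceanBase documentation.

```python
# Hypothetical sketch of catalog.db.table resolution with lazy metadata lookup.
from typing import Callable, Dict, Tuple

Schema = Tuple[str, ...]  # column names only, for illustration


class ExternalCatalog:
    """Looks up table metadata on first use instead of requiring a mapping table."""

    def __init__(self, fetch_schema: Callable[[str, str], Schema]) -> None:
        # fetch_schema stands in for a call to the remote source's metadata API.
        self._fetch_schema = fetch_schema
        self._cache: Dict[Tuple[str, str], Schema] = {}

    def table_schema(self, db: str, table: str) -> Schema:
        key = (db, table)
        if key not in self._cache:
            # First reference: ask the external source, then remember the answer.
            self._cache[key] = self._fetch_schema(db, table)
        return self._cache[key]


def resolve(name: str, current_catalog: str = "internal") -> Tuple[str, str, str]:
    """Split a table reference into (catalog, database, table).

    Three-part names are used as-is; two-part names fall back to the
    current catalog (an assumption made for this sketch).
    """
    parts = name.split(".")
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    if len(parts) == 2:
        return current_catalog, parts[0], parts[1]
    raise ValueError(f"expected db.table or catalog.db.table, got {name!r}")
```

Because schemas are fetched on first reference and then cached, no per-table mapping step is needed. A real system would also have to decide when to refresh cached metadata, which this sketch ignores.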
Features of catalogs
Support for multiple data sources
Catalogs support external data sources such as ODPS (MaxCompute).
Permission isolation
You can control the access permissions to external data sources at the catalog layer to ensure data security.
Performance optimization
Query plans based on catalogs are optimized to reduce the transmission of intermediate data and improve cross-source query efficiency.
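One way to picture catalog-level permission isolation is a grant check that walks the catalog, database, and table layers. The sketch below is hypothetical: the (catalog, database, table) tuple format and the "*" wildcard are assumptions for illustration and do not represent OceanBase's GRANT syntax or privilege model.

```python
# Hypothetical grant check scoped by catalog; not OceanBase's privilege model.
from typing import Set, Tuple

Grant = Tuple[str, str, str]  # (catalog, database, table); "*" is a wildcard


def can_read(grants: Set[Grant], catalog: str, db: str, table: str) -> bool:
    """Return True if any grant covers catalog.db.table.

    A grant of (catalog, "*", "*") covers every object in that catalog,
    which is how access can be scoped at the catalog layer.
    """
    for g_cat, g_db, g_table in grants:
        if g_cat not in (catalog, "*"):
            continue
        if g_db not in (db, "*"):
            continue
        if g_table in (table, "*"):
            return True
    return False
```

A user holding only a grant on one external catalog can read everything under it but nothing in other catalogs, which is the isolation the text describes.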
Applicable scenarios
Joint analysis
Analyzing data from OceanBase Database together with data from external data sources.
Data lake/data warehouse integration
Building a unified data lake architecture to directly query massive amounts of data stored in external systems such as ODPS.
ETL pipeline optimization
Reducing data migration steps by accessing source system data directly through a catalog, thereby lowering the complexity of ETL development.
Limitations
In the current version, data in an external catalog is read-only. Statements that modify objects in the catalog, such as DROP TABLE or INSERT INTO, are not supported and are rejected before execution.
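The read-only rule can be sketched as a guard that runs before a statement executes. In this hypothetical Python example, statement types are passed in as plain keywords, and treating only SELECT as read-only is an assumption; OceanBase's actual checks are not shown here.

```python
# Hypothetical pre-execution guard for external catalogs.
READ_ONLY_KINDS = {"SELECT"}  # assumed allow-list for this sketch


def check_statement(kind: str, target_catalog: str) -> None:
    """Raise if a statement would modify an object in an external catalog.

    Objects in the internal catalog are not restricted by this check.
    """
    if target_catalog == "internal":
        return
    if kind.upper() not in READ_ONLY_KINDS:
        raise PermissionError(
            f"{kind.upper()} is not allowed on objects in external catalog {target_catalog!r}"
        )
```

The guard uses an allow-list, so any statement type not explicitly listed as read-only, including DROP TABLE and INSERT INTO, is rejected by default.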