This topic describes how to create an external catalog by using SQL statements.
Privilege requirements
To create an external catalog, the current user must have the CREATE CATALOG privilege. For more information about the privileges of OceanBase Database, see Privilege types in MySQL-compatible mode.
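As a sketch, granting this privilege might look like the following. The user name `test_user` is a placeholder, and the exact privilege name and scope should be verified against your OceanBase Database version:

```sql
-- Hypothetical example: grant the CREATE CATALOG privilege to a user.
-- Confirm the privilege name and ON clause for your version before use.
GRANT CREATE CATALOG ON *.* TO test_user;
```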
Syntax
The syntax for creating an external catalog is as follows:
CREATE EXTERNAL CATALOG [IF NOT EXISTS] external_catalog_name
PROPERTIES [=] (properties_type_options);
properties_type_options:
odps_type_list
| hms_type_list
odps_type_list:
TYPE = 'ODPS',
[ACCESSTYPE = 'accesstype_string',]
ACCESSID = 'string',
ACCESSKEY = 'string',
STSTOKEN = 'string',
ENDPOINT = 'string',
TUNNEL_ENDPOINT = 'string',
PROJECT_NAME = 'string',
[QUOTA_NAME = 'string',]
[COMPRESSION = 'compression_string',]
REGION = 'string'
hms_type_list:
TYPE = 'HMS',
URI = "string"
[, KRB5CONF = "string"]
[, KEYTAB = "string"]
[, PRINCIPAL = "string"]
[, MAX_CLIENT_POOL_SIZE = 20]
[, SOCKET_TIMEOUT = 10000000]
accesstype_string:
aliyun
| sts
| app
compression_string:
zlib
| zstd
| lz4
| odps_lz4
Parameters:
- `IF NOT EXISTS`: Optional. If you specify this clause, the system does not return an error when an external catalog with the same name already exists in the current tenant; instead, it displays a warning message. If you do not specify this clause and the catalog already exists, an error is returned.
- `external_catalog_name`: The name of the external catalog to be created.
- `PROPERTIES [=] (properties_type_options)`: Specifies the properties of the external catalog.
- `odps_type_list`: Specifies the properties of an ODPS-type external catalog (ODPS catalog).
- `hms_type_list`: Specifies the properties of an HMS-type external catalog (HMS catalog).
For more information about the parameters of the CREATE EXTERNAL CATALOG statement, see CREATE EXTERNAL CATALOG.
Create an external catalog of the ODPS type
When you read data from or write data to a MaxCompute table, set the TYPE parameter to 'ODPS'.
You can run the CREATE EXTERNAL CATALOG statement to create a connection to MaxCompute (ODPS) and directly query and write data in an ODPS table without manually creating a mapping table.
Here is an example:
obclient> CREATE EXTERNAL CATALOG test_odps_catalog
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '$odps_accessid',
ACCESSKEY = '$odps_accesskey',
ENDPOINT = '$odps_endpoint',
TUNNEL_ENDPOINT = 'http://xxx.maxcompute.aliyun.com',
PROJECT_NAME = 'mysqltest_regression_sqlqa',
QUOTA_NAME = '',
COMPRESSION = ''
);
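After the ODPS catalog is created, its tables can be queried directly. The following is a sketch with placeholder catalog, database, and table names, assuming the MySQL-mode catalog switching syntax:

```sql
-- Switch the session to the newly created ODPS catalog (placeholder names).
SET CATALOG test_odps_catalog;
SHOW DATABASES;
-- Query an ODPS table directly, without creating a mapping table.
SELECT * FROM mysqltest_regression_sqlqa.some_odps_table LIMIT 10;
```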
Create an external catalog of the HMS type
OceanBase Database allows you to use the HMS catalog feature to centrally manage Hive metadata (Metastore). With HMS catalog, OceanBase Database can directly access data stored in the HDFS storage path corresponding to a Hive table, enabling efficient data lake queries and analysis.
Notice
Before you create an HMS catalog, if the catalog needs to access file data stored in HDFS, you must deploy the Java SDK environment. For more information, see Deploy the Java SDK environment for OceanBase Database.
HMS catalog support
The current version of HMS catalog supports two table formats: Hive and Iceberg. It can read statistics from Hive and Iceberg tables and use them to optimize SQL execution plans.
Hive Format Support
The current version supports the following four formats: Parquet, ORC, Textfile, and OpenCSV.
Note
Among complex types, only the ARRAY type is supported, and only in the Parquet format.
Iceberg Support
The current version supports Iceberg V1 and V2.
OceanBase Database supports reading Iceberg tables that have undergone schema evolution and partition transformation.
Hive Version Support
OceanBase Database supports the following Hive versions in HMS catalog:
- Hive 1.2.x
- Hive 2.3.x
- Hive 3.1.x
- Hive 4.x
Authentication Type Support
OceanBase Database supports the following two authentication methods for HMS catalog:
| Authentication Type | Description | Configuration Requirements |
|---|---|---|
| SIMPLE | Simple authentication based on username and password (default). | The PROPERTIES clause does not require the KRB5CONF, KEYTAB, and PRINCIPAL parameters. |
| KERBEROS | Strong authentication based on the Kerberos protocol (suitable for enterprise-level security environments). | The PROPERTIES clause requires the KRB5CONF, KEYTAB, and PRINCIPAL parameters. |
Here are some examples:
Create an HMS catalog with the default SIMPLE authentication, without configuring location authentication.
obclient> CREATE EXTERNAL CATALOG test_hms_catalog
PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
);

Create an HMS catalog with Kerberos authentication.
obclient> CREATE EXTERNAL CATALOG test_hms_catalog_kerberos
PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx",
    PRINCIPAL = "hive/xxx@xxx.COM",
    KEYTAB = "/xxx/xxx/xxx/hadoop.keytab",
    KRB5CONF = "/etc/xxx.conf"
);
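Once an HMS catalog is created, Hive tables can be queried through it. The following is a sketch with placeholder names, assuming the MySQL-mode catalog switching syntax:

```sql
-- Switch to the HMS catalog and query a Hive table (placeholder names).
SET CATALOG test_hms_catalog;
SHOW DATABASES;
SELECT COUNT(*) FROM hive_db.hive_table;
```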
Location authentication
OceanBase Database HMS catalog supports key management through External Location to ensure secure access to external storage. When accessing a table in HMS, the system automatically matches the corresponding External Location based on the table path. If a match is found, it uses the associated key for authentication.
Location authentication scenarios
Based on the storage type and security requirements, location authentication can be categorized into the following scenarios:
| Storage type | Authentication method |
|---|---|
| OSS | AK/SK authentication. |
| HDFS | Supports no authentication, user-based authentication, and Kerberos authentication, in both single-NameNode and high-availability (HA) modes. See the HDFS authentication scenarios below. |
Procedure
1. Determine the authentication scenario for the location.
   Choose the appropriate authentication scenario based on the storage type and security requirements.
2. Create a location object.
   Use the CREATE LOCATION statement to create a location object for the chosen scenario. For more information on creating a location, see CREATE LOCATION.
3. Associate the catalog.
   When you create a catalog, no additional configuration is required. The system automatically matches the location based on the storage path of the corresponding table.
Location authentication scenario configuration example
Object Storage Service (OSS) authentication
Create a Location object for accessing Alibaba Cloud OSS.
CREATE LOCATION oss_credential
URL = 'oss://bucket-name/path'
CREDENTIAL = (
ACCESSID = 'your-access-key-id'
ACCESSKEY = 'your-access-key-secret'
HOST = 'oss-region.aliyuncs.com'
);
HDFS authentication
Scenario 1: No authentication.

No Location object needs to be created. This applies to development or testing environments where Kerberos authentication is not enabled on the HDFS cluster (hadoop.security.authentication=simple).

Scenario 2: User authentication.
Kerberos is not required, but HDFS must be accessed as a specific HDFS user.
Note
This applies to clusters without Kerberos authentication, but where specific HDFS users are required to access certain paths.
Create a Location that specifies the HDFS user.

CREATE LOCATION hdfs_user01
URL = 'hdfs://namenode:8020/'
CREDENTIAL (
    USER = 'hdfs_user_name'
);

Scenario 3: Kerberos enabled, HDFS in single-NameNode (non-HA) mode.
Note
The dfs.data.transfer.protection parameter in CONFIGS must match the HDFS cluster configuration.

Create a Location with Kerberos authentication for a single-NameNode HDFS cluster.

CREATE LOCATION hdfs_kerberos_single
URL = 'hdfs://namenode.example.com:8020/'
CREDENTIAL (
    PRINCIPAL = "hdfs/xxx@xxx.COM",
    KEYTAB = "/data/hdfs.keytab",
    KRB5CONF = "/data/krb5.conf",
    CONFIGS = 'dfs.data.transfer.protection=integrity'
);

Scenario 4: Kerberos enabled, HDFS in high availability (HA) mode.
CREATE LOCATION hdfs_kerberos_ha
URL = 'hdfs://${nameservice id}'
CREDENTIAL (
    PRINCIPAL = "hdfs/xxx@xxx.COM",
    KEYTAB = "/etc/ob/hdfs.keytab",
    KRB5CONF = "/etc/krb5.conf",
    CONFIGS = 'dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

Scenario 5: Kerberos not enabled, but HDFS in high availability (HA) mode.
Note
If Kerberos is not enabled but HDFS is in high availability (HA) mode, you do not need to set the PRINCIPAL, KEYTAB, and KRB5CONF parameters.

CREATE LOCATION hdfs_location_ha
URL = 'hdfs://${nameservice id}'
CREDENTIAL (
    CONFIGS = 'dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
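Putting the pieces together, an end-to-end flow for an HMS catalog over a Kerberos-secured HDFS path might look like the following sketch. All names, paths, and addresses are placeholders:

```sql
-- 1. Create a Location object; it is matched automatically by table path.
CREATE LOCATION hdfs_kerberos_loc
URL = 'hdfs://namenode.example.com:8020/'
CREDENTIAL (
    PRINCIPAL = "hdfs/xxx@xxx.COM",
    KEYTAB = "/data/hdfs.keytab",
    KRB5CONF = "/data/krb5.conf",
    CONFIGS = 'dfs.data.transfer.protection=integrity'
);

-- 2. Create the HMS catalog; no location configuration is needed here.
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx",
    PRINCIPAL = "hive/xxx@xxx.COM",
    KEYTAB = "/data/hadoop.keytab",
    KRB5CONF = "/data/krb5.conf"
);

-- 3. Query; the system matches hdfs_kerberos_loc by the table's storage path.
SET CATALOG hive_catalog;
SELECT * FROM hive_db.hive_table LIMIT 10;
```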