Purpose
The CREATE EXTERNAL CATALOG statement creates an external catalog in the database. An external catalog connects to an external data source and retrieves the metadata of external data, so that you can query the external data directly without importing or migrating it.
Limitations and considerations
- Currently, you can create external catalogs of the ODPS and HMS types.
- Before you create an HMS catalog, if the catalog is expected to access file data stored in HDFS, you must deploy the Java SDK environment. For more information about how to configure the Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
Privilege requirements
To execute the CREATE EXTERNAL CATALOG statement, the current user must have the CREATE CATALOG privilege. For more information about the privileges of OceanBase Database, see Privilege classification in MySQL mode.
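For example, assuming that CREATE CATALOG is grantable at the global level like other MySQL-mode privileges, an administrator could grant it to a hypothetical user named alice as follows (a sketch; check the privilege documentation for your version for the exact grant syntax):

obclient> GRANT CREATE CATALOG ON *.* TO alice;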
Syntax
CREATE EXTERNAL CATALOG [IF NOT EXISTS] external_catalog_name
PROPERTIES [=] (properties_type_options);
properties_type_options:
odps_type_list
| hms_type_list
odps_type_list:
TYPE = 'ODPS',
[ACCESSTYPE = 'accesstype_string',]
ACCESSID = 'string',
ACCESSKEY = 'string',
STSTOKEN = 'string',
ENDPOINT = 'string',
TUNNEL_ENDPOINT = 'string',
PROJECT_NAME = 'string',
[QUOTA_NAME = 'string',]
[COMPRESSION = 'compression_string',]
REGION = 'string'
hms_type_list:
TYPE = 'HMS',
URI = "string"
[, KRB5CONF = "string"]
[, KEYTAB = "string"]
[, PRINCIPAL = "string"]
[, MAX_CLIENT_POOL_SIZE = 20]
[, SOCKET_TIMEOUT = 10000000]
accesstype_string:
aliyun
| sts
| app
compression_string:
zlib
| zstd
| lz4
| odps_lz4
Parameters
| Parameter | Description |
|---|---|
| IF NOT EXISTS | Optional. If this clause is specified and an external data catalog with the specified name already exists in the current tenant, no error is returned; a warning message is returned instead. If this clause is not specified and the name already exists, an error is returned. |
| external_catalog_name | The name of the external data catalog to be created. |
| PROPERTIES [=] (properties_type_options) | The properties of the external data catalog. For more information, see properties_type_options. |
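For example, the following statement (with a placeholder HMS address) completes with a warning instead of an error if test_catalog already exists in the current tenant:

obclient> CREATE EXTERNAL CATALOG IF NOT EXISTS test_catalog PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:9083"
);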
properties_type_options
- odps_type_list: The properties of an ODPS catalog. For more information, see odps_type_list.
- hms_type_list: The properties of an HMS catalog. For more information, see hms_type_list.
odps_type_list
- TYPE: The type of the external data catalog. Valid value: ODPS. This value indicates that the external data catalog reads and writes data from and to MaxCompute.
- ACCESSTYPE: Optional. The type of the MaxCompute account. Default value: aliyun. Valid values (case-insensitive): aliyun, sts, and app.
- ACCESSID: The AccessKey ID of the account. If the value of ACCESSTYPE is aliyun or app, this parameter specifies the AccessKey ID of the aliyun or app account or the RAM user with access permissions to MaxCompute.
- ACCESSKEY: The AccessKey secret of the account. If the value of ACCESSTYPE is aliyun or app, this parameter specifies the AccessKey secret of the aliyun or app account or the RAM user with access permissions to MaxCompute.
- STSTOKEN: The token. If the value of ACCESSTYPE is sts, this parameter specifies the token for accessing MaxCompute.
- ENDPOINT: The endpoint of MaxCompute.
- TUNNEL_ENDPOINT: The endpoint of the Tunnel service. MaxCompute uses the Tunnel SDK to obtain data.
- PROJECT_NAME: The name of the project space in MaxCompute. A project space is the basic organizational unit of MaxCompute, similar to a database or schema in a traditional database.
- QUOTA_NAME: Optional. The name of the quota. In MaxCompute, a quota represents a pool of computing, access, and write resources. If you have configured a quota, you can specify it by using this parameter.
- COMPRESSION: Optional. The compression format of the data source. If this parameter is not specified, compression is not enabled. Valid values (case-insensitive): zlib, zstd, lz4, and odps_lz4.
- REGION: The region where MaxCompute is enabled.
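As a sketch of how ACCESSTYPE and STSTOKEN fit together, the following statement creates an ODPS catalog with an STS account; all values are placeholders:

obclient> CREATE EXTERNAL CATALOG odps_sts_catalog PROPERTIES = (
    TYPE = 'ODPS',
    ACCESSTYPE = 'sts',
    ACCESSID = '$sts_accessid',
    ACCESSKEY = '$sts_accesskey',
    STSTOKEN = '$sts_token',
    ENDPOINT = '$odps_endpoint',
    TUNNEL_ENDPOINT = '$tunnel_endpoint',
    PROJECT_NAME = '$project_name',
    REGION = '$region'
);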
hms_type_list
- TYPE: The type of the external data catalog. Valid value: HMS. This value indicates that the external data catalog reads data from Hive-managed data.
- URI: The Thrift URI of the HMS service, in the thrift://$host:$port format, where $host is the IP address of the Thrift server and $port is the port of the Thrift server. The default port of HMS is 9083.
- KRB5CONF: Optional. The path of the Kerberos configuration file required to access an HMS service that uses Kerberos authentication. Notice: Set the KRB5CONF parameter only when Kerberos authentication is enabled for HMS.
- KEYTAB: Optional. The path of the keytab key file required to access an HMS service that uses Kerberos authentication. If OceanBase Database is deployed in a distributed manner, the corresponding OBServer node must have this file. Notice: Set the KEYTAB parameter only when Kerberos authentication is enabled for HMS.
- PRINCIPAL: Optional. The name of the Kerberos principal required to access an HMS service that uses Kerberos authentication. The name is usually in the service/HOST@REALM format. Example: hive/hadoop@QA.COM. Notice: Set the PRINCIPAL parameter only when Kerberos authentication is enabled for HMS.
- MAX_CLIENT_POOL_SIZE: Optional. The maximum size of the client pool available for the current catalog. Default value: 20, which means that the current HMS catalog can start up to 20 clients to connect to the HMS service.
- SOCKET_TIMEOUT: Optional. The timeout period for client access to the current catalog, in microseconds. Default value: 10000000 (10 seconds).
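The Examples section below covers authentication; as a sketch of the two tuning parameters, the following statement (placeholder address, hypothetical values) enlarges the client pool and raises the timeout to 30 seconds:

obclient> CREATE EXTERNAL CATALOG hms_tuned_catalog PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:9083",
    MAX_CLIENT_POOL_SIZE = 50,
    SOCKET_TIMEOUT = 30000000
);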
Examples
Create an external data catalog of the ODPS type.

obclient> CREATE EXTERNAL CATALOG test_odps_catalog PROPERTIES = (
    TYPE = 'ODPS',
    ACCESSID = '$odps_accessid',
    ACCESSKEY = '$odps_accesskey',
    ENDPOINT = '$odps_endpoint',
    TUNNEL_ENDPOINT = 'http://xxx.maxcompute.aliyun.com',
    PROJECT_NAME = 'mysqltest_regression_sqlqa',
    QUOTA_NAME = '',
    COMPRESSION = ''
);

Create an external data catalog of the HMS type.

Create an HMS catalog that uses simple authentication.

obclient> CREATE EXTERNAL CATALOG test_hms_catalog PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx"
);

Create an HMS catalog that uses Kerberos authentication.

obclient> CREATE EXTERNAL CATALOG test_hms_catalog_kerberos PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://xxx.xxx.xxx.xxx:xxxx",
    PRINCIPAL = "hive/xxx@xxx.COM",
    KEYTAB = "/xxx/xxx/xxx/hadoop.keytab",
    KRB5CONF = "/etc/xxx.conf"
);
Location authentication
OceanBase Database HMS Catalog supports key management through External Location to ensure secure access to external storage. When accessing tables in HMS, the system automatically matches the corresponding External Location based on the table path. If a match is found, it uses the associated key for authentication.
Location authentication scenarios
Based on storage type and security requirements, location authentication is categorized into the following scenarios:
| Storage type | Authentication method |
|---|---|
| OSS | AK/SK authentication. |
| HDFS | HDFS supports the following authentication methods: no authentication (simple mode), access as a specified HDFS user, and Kerberos authentication (for both single-NameNode and HA clusters). |
Configuration steps
1. Determine the location authentication scenario.

   Select the appropriate authentication scenario based on the storage type and security requirements.

2. Create a location object.

   Use the CREATE LOCATION statement to create a location object for the selected scenario. For more information about creating a location, see CREATE LOCATION.

3. Associate the catalog.

   When you create a catalog, no additional configuration is required. The system automatically matches the location based on the storage path of the corresponding table, as shown in the sketch after this list.
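A minimal sketch of steps 2 and 3 together, with placeholder host names, paths, and object names:

-- Step 2: create a location whose URL prefix covers the table paths in HDFS.
CREATE LOCATION demo_hdfs_loc
    URL = 'hdfs://namenode:8020/warehouse'
    CREDENTIAL (
      USER = 'hive'
    );

-- Step 3: the catalog itself carries no location-specific properties; tables whose
-- paths fall under hdfs://namenode:8020/warehouse are matched to demo_hdfs_loc automatically.
CREATE EXTERNAL CATALOG demo_hms PROPERTIES = (
    TYPE = 'HMS',
    URI = "thrift://namenode:9083"
);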
Example of configuring location authentication scenarios
OSS storage authentication
Create a location object for accessing Alibaba Cloud OSS object storage.
CREATE LOCATION oss_credential
URL = 'oss://bucket-name/path'
CREDENTIAL = (
ACCESSID = 'your-access-key-id'
ACCESSKEY = 'your-access-key-secret'
HOST = 'oss-region.aliyuncs.com'
);
HDFS storage authentication
Scenario 1: No authentication mode.

No location object needs to be created. This scenario applies to HDFS clusters in development or testing environments that do not use Kerberos authentication (that is, hadoop.security.authentication=simple).

Scenario 2: User authentication mode.

No Kerberos authentication is required, but access must be performed as a specific HDFS user.

Note: This scenario applies to clusters that do not use Kerberos authentication but in which specific HDFS paths require access as a particular user.

Create a location object that specifies the HDFS user.

CREATE LOCATION hdfs_user
    URL = 'hdfs://namenode:8020/'
    CREDENTIAL (
      USER = 'hdfs_user_name'
    );

Scenario 3: Kerberos authentication with HDFS operating in single-NameNode (non-HA) mode.

Note: The dfs.data.transfer.protection parameter in CONFIGS must match the configuration of the HDFS cluster.

Create a location object for Kerberos authentication with a single-NameNode HDFS cluster.

CREATE LOCATION hdfs_kerberos_single
    URL = 'hdfs://namenode.example.com:8020/'
    CREDENTIAL (
      PRINCIPAL = "hdfs/xxx@xxx.COM",
      KEYTAB = "/data/hdfs.keytab",
      KRB5CONF = "/data/krb5.conf",
      CONFIGS = 'dfs.data.transfer.protection=integrity'
    );

Scenario 4: Kerberos authentication with HDFS operating in high availability (HA) mode.

CREATE LOCATION hdfs_kerberos_ha
    URL = 'hdfs://${nameservice id}'
    CREDENTIAL (
      PRINCIPAL = "hdfs/xxx@xxx.COM",
      KEYTAB = "/etc/ob/hdfs.keytab",
      KRB5CONF = "/etc/krb5.conf",
      CONFIGS = 'dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );

Scenario 5: No Kerberos authentication, but HDFS operating in high availability (HA) mode.

Note: Because Kerberos authentication is not used, you do not need to set the PRINCIPAL, KEYTAB, and KRB5CONF parameters, even though HDFS operates in HA mode.

CREATE LOCATION hdfs_location_ha
    URL = 'hdfs://${nameservice id}'
    CREDENTIAL (
      CONFIGS = 'dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1}, ${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );
