MaxCompute (ODPS) provides two data access interfaces:
- Storage API: a data service interface that provides efficient, low-latency, and secure data reading capabilities.
- Tunnel API: a data upload/download interface that is mainly used for batch operations on table data (for example, full table import and export).
Since OceanBase Database V4.4.0 (MySQL mode), you can access MaxCompute tables by using external tables by adapting the ODPS API.
By creating an external table in OceanBase Database, you can transparently read and write data from an Alibaba Cloud MaxCompute (formerly ODPS) table without migrating the data. This enables cross-system joint analysis, real-time reporting, and data reflow.
When you create an external table in OceanBase Database, you can configure parameters for the Storage API and Tunnel API. For more information, see CREATE EXTERNAL TABLE. The following table describes the recommended use cases for the two APIs.
Dimension |
Storage API |
Tunnel API |
Use case |
|---|---|---|---|
| Feature | Supports fine-grained data access (for example, partition filtering and predicate pushing down). | Focuses on efficient full-table import and export without conditional filtering. | HTAP hybrid workloads, conditional queries on partitioned tables, and deep integration with computing engines (such as Spark). |
| Sharding strategy | Automatic sharding: dynamically splits tasks by byte or row count to improve parallel efficiency. | Manual sharding: requires developers to calculate partition size or row count, which is complex. Performance is lower than that of the Storage API, and no special resource configuration is required for ODPS. It is compatible with all ODPS configurations. |
|
| Performance optimization | Low resource consumption: predicate pushing down reduces data transmission, and computation is pushed down to the database side, resulting in faster query speed. | High resource consumption: full transmission may occupy a large amount of bandwidth and storage. | Select the Storage API when you need to reduce data transmission and improve HTAP efficiency. Select the Tunnel API for simple ETL or full backup. |
| Data filtering capability | Supported: data can be filtered by SQL conditions (such as WHERE), and only the required data is transmitted. | Not supported: data must be fully transmitted and then filtered locally. | Select the Storage API when you need to filter data by conditions (such as analyzing specific user behavior). |
Overview
Feature |
Description |
|---|---|
| Read-only query | SELECT * FROM odps_external_table |
| Partition support | Supports both non-partitioned and partitioned tables (including dynamic partitioning) |
| Column mapping | Automatically or explicitly binds MaxCompute columns |
| Automatic partition refresh | Supports IMMEDIATE / INTERVAL / OFF synchronization strategies |
| Limitations |
|
Create an ODPS external table
Non-partitioned tables
Basic syntax (automatic column mapping)
-- step1: MaxCompute has a table named t1.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table named t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
Note
If no generated columns are specified, OceanBase Database automatically maps them to external$tablecol1, external$tablecol2, and so on, in the order defined.
Explicit column mapping (recommended)
-- step1: MaxCompute has a table named t1.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table named t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (
c1 INT AS (external$tablecol1),
c2 INT AS (external$tablecol2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = 'lz4'
API_MODE = {"tunnel_api"},
);
Note
- Use
AS (external$tablecolx)to specify a column to be mapped to the xth column of the t1 table in ODPS. The xth column must be a regular column and cannot be a partitioning column. The starting number is 1.
- If no generated columns are specified, OceanBase Database generates them by default. The numbers of the generated columns are incremented from 1 in the order defined.
Partitioned tables
Key rules:
- The partitioning column must be explicitly declared using
metadata$partition_list_colX. - The
PARTITION BYclause must match the partitioning structure of the MaxCompute table.
-- step1: MaxCompute has a partitioned table t2 with c1 and c2 as regular columns, and c3 and c4 as partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Create an external partitioned table t2 in OceanBase Database.
CREATE EXTERNAL TABLE t2 (
c1 INT,
c2 INT,
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
Note
- The partitioning column
AS (metadata$partition_list_colx)specifies the x-th partitioning column of the t2 table in ODPS, with the starting number being 1.
- The partitioning column
AS (metadata$partition_list_colx)cannot be omitted. Otherwise, it will be treated as a regular column, leading to unexpected errors.
- The number of partitioning columns in the MaxCompute table must match the number of partitioning columns in the external table in OceanBase Database.
Error example:
If you omit AS (metadata$partition_list_colX), the partitioning column will be treated as a regular column, causing the query to fail.
-- step1: MaxCompute has a partitioned table t2 with c1 and c2 as regular columns, and c3 and c4 as partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Incorrect approach: Partitioning columns are not mapped using metadata$partition_list_colX
CREATE EXTERNAL TABLE t2 (
c1 INT,
c2 INT,
c3 VARCHAR(20), -- ⚠️ Missing AS (metadata$partition_list_col1)
c4 VARCHAR(20) -- ⚠️ Missing AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4); -- Although partitioning is declared, no partitioning metadata mapping is provided.
Since c3 and c4 are not mapped to actual partitioning values in MaxCompute, OceanBase Database cannot obtain partitioning information. Therefore, partitioning columns must be explicitly mapped using metadata$partition_list_colX.
Note
Generation columns of regular columns can be explicitly specified or not, just like in non-partitioned external tables.
Key parameters (PROPERTIES)
Parameter |
Required |
Description |
|---|---|---|
| TYPE | Yes | Fixed to 'ODPS' |
| ACCESSID / ACCESSKEY | Yes | The AccessKey of the RAM user (it is recommended to use the least privilege principle). |
| ENDPOINT | Yes | The endpoint of MaxCompute, which contains the region. |
| PROJECT_NAME | Yes | The name of the MaxCompute project. |
| TABLE_NAME | Yes | The name of the table. |
| SCHEMA_NAME | No | The name of the schema. This parameter is required if the table is located in a schema. |
| ACCESSTYPE | No | The account type. Valid values: aliyun (default), sts, and app. |
| STSTOKEN | No | Required only when ACCESSTYPE = 'sts'. |
| QUOTA_NAME | No | The name of the computing resource quota. |
| COMPRESSION_CODE | No | The compression format. Valid values: zlib, zstd, lz4, and odps_lz4. |
Query data from MaxCompute
You can query data from a MaxCompute table by using an external table in the same way as querying data from a regular table.
-- step1: MaxCompute has a table named t1.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table named t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
-- step3: Query data from the external table t1.
SELECT * FROM t1;
-- To speed up the query, set the parallelism to N.
SELECT /*+ PARALLEL(N) */ * FROM t1;
Export data to MaxCompute
OceanBase Database supports exporting data from internal tables to external tables of MaxCompute (ODPS) by using standard SQL statements.
-- Full update.
INSERT OVERWRITE external_table_name
SELECT column_list FROM source_table [WHERE ...];
-- Incremental import.
INSERT INTO external_table_name
SELECT column_list FROM source_table [WHERE ...];
external_table_name: the name of an external table in MaxCompute.source_table: the name of an internal table in OceanBase Database or another queryable object.
For more information, see Insert data in MySQL mode and Insert data in Oracle mode.
Export data from a non-partitioned table
-- step1: Create a table t1 in MaxCompute.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
-- step3: Create a regular table t1_ in OceanBase Database.
CREATE TABLE t1_ (c1 INT, c2 INT);
-- step4: Export data from the regular table to the external table.
INSERT INTO t1 SELECT * FROM t1_;
-- Increase the export speed by setting the parallelism to N.
INSERT /*+ PARALLEL(N) */ INTO t1 SELECT * FROM t1_;
-- Specify OVERWRITE to overwrite the data in the external table. If the t1_ table is empty, the t1 table will also be overwritten to be empty.
INSERT OVERWRITE t1 SELECT * FROM t1_;
-- Specify the columns to be inserted.
INSERT INTO t1 (c1) SELECT c1 FROM t1_; -- Insert NULL into the c2 column.
Export data from a partitioned table
-- step1: Create a partitioned table t2 in MaxCompute. c1 and c2 are regular columns, and c3 and c4 are partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Create a partitioned external table t2 in OceanBase Database.
CREATE EXTERNAL TABLE t2 (
c1 INT AS (metadata$odpscol1),
c2 INT AS (metadata$odpscol2),
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
-- step3: Create a regular table t2_ in OceanBase Database.
CREATE TABLE t2_ (c1 INT, c2 INT);
-- step4: Export data to a specified partition.
-- Note: Starting from V4.3.5 BP2, OceanBase Database automatically creates partitions in MaxCompute that do not exist in the MaxCompute table. You do not need to manually create the partitions in advance. Earlier versions of OceanBase Database do not support this feature. You must manually create the target partitions in the MaxCompute table.
INSERT INTO t2 PARTITION (c3 = 'abc', c4 = 'def') SELECT * FROM t2_;
INSERT OVERWRITE t2 PARTITION (c3 = 'abc', c4 = 'def') SELECT * FROM t2_;
Partition metadata refresh strategy
Use the AUTO_REFRESH option to specify the refresh method for MaxCompute partition metadata: AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }
Strategy |
Description |
Applicable scenarios |
|---|---|---|
IMMEDIATE |
Automatically refreshes partition metadata on each query. | Partitions change frequently, such as hourly. |
OFF |
Manually refresh partition metadata by using the ALTER EXTERNAL TABLE <table_name> REFRESH; statement. |
Static partitioned tables. |
INTERVAL |
Specifies the interval in seconds for automatic refreshes by using the DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x) function. |
Partitions change at a moderate frequency. |
Example of enabling immediate refresh
CREATE EXTERNAL TABLE t2 (
c1 INT AS (metadata$odpscol1),
c2 INT AS (metadata$odpscol2),
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
AUTO_REFRESH = IMMEDIATE
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
Manual refresh command (used when AUTO_REFRESH = OFF)
ALTER EXTERNAL TABLE t2 REFRESH;
Data types and time zones
Type mapping
- For more information, see the data type mapping between OceanBase Database and MaxCompute.
Time zone handling
- MaxCompute does not support time zones for time types such as
DATETIME. - By default, OceanBase Database assumes that the time zone of MaxCompute is the same as its own.
- Recommendation: Make sure that the time zones of OceanBase Database and MaxCompute are the same, such as
Asia/Shanghai.
