MaxCompute (ODPS) provides two data access interfaces:
- Storage API: A data service interface that provides efficient, low-latency, and secure data reading capabilities.
- Tunnel API: A data upload/download interface, primarily used for batch operations on table data (such as full table import and export).
Starting from OceanBase Database V4.4.0 (MySQL-compatible mode) and later, OceanBase Database supports accessing MaxCompute tables through external tables by adapting to the ODPS API.
By creating an external table of MaxCompute, you can transparently read and write data from an Alibaba Cloud MaxCompute (formerly ODPS) table without the need for data migration. This enables cross-system joint analysis, real-time reporting, and data reflow.
When creating an external table of MaxCompute, OceanBase Database provides parameter configuration options for both the Storage API and Tunnel API. For more information, see CREATE EXTERNAL TABLE. The following table describes the recommended use cases for each API:
| Dimension | Storage API | Tunnel API | Use case |
|---|---|---|---|
| Feature | Supports fine-grained data access (such as partition filtering and predicate pushdown). | Focuses on efficient full-table import/export without conditional filtering. | HTAP hybrid workloads, partitioned table conditional queries, and deep integration with computing engines (such as Spark). |
| Sharding strategy | Automatic sharding: Dynamically splits tasks by byte or row count to improve parallel efficiency. | Manual sharding: Requires developers to calculate partition size or row count, which is complex. Performance is lower than the Storage API, and no special ODPS resource configuration is required. |
|
| Performance optimization | Low resource consumption: Predicate pushdown reduces data transmission volume, and computation is pushed to the database side for faster query speed. | High resource consumption: Full data transmission may consume significant bandwidth and storage. | Select the Storage API to reduce data transmission volume and improve HTAP efficiency; select the Tunnel API for simple ETL or full backup. |
| Data filtering capability | Supported: Data can be filtered using SQL conditions (such as WHERE) to transmit only the required portion. | Not supported: Full data transmission is required, followed by local filtering. | Select the Storage API when you need to filter data based on specific conditions (such as analyzing user behavior). |
Overview
| Feature | Description |
|---|---|
| Read-only queries | SELECT * FROM odps_external_table |
| Partition support | Supports non-partitioned and partitioned tables (including dynamic partitioning) |
| Column mapping | Automatically or explicitly maps MaxCompute columns |
| Automatic partition refresh | Supports IMMEDIATE, INTERVAL, and OFF synchronization strategies |
| Limitations |
|
Create an ODPS external table
Non-partitioned table
Basic syntax (automatic column mapping)
-- step1: MaxCompute has a table t1
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: OB creates an external table t1
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
Note
If you do not specify generated columns, OceanBase Database automatically maps the columns in the order they are defined to external$tablecol1, external$tablecol2, and so on.
Explicit column mapping (Recommended)
-- step1: MaxCompute has a table t1
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: OB creates an external table t1
CREATE EXTERNAL TABLE t1 (
c1 INT AS (external$tablecol1),
c2 INT AS (external$tablecol2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = 'lz4'
API_MODE = {"tunnel_api"},
);
Note
- Use
AS (external$tablecolx)for a regular column to specify the x-th column in the t1 table of ODPS to which the column should be mapped. The x-th column is a regular column and cannot be a partition column. The starting number is 1.
- If you do not specify generated columns as in Example 1, OceanBase Database generates default columns. The numbers of the columns start from 1 and increase in the order of column definitions.
Partitioned table
Key rules:
- The partitioning columns must be explicitly declared using
metadata$partition_list_colX. - The
PARTITION BYclause must align with the partitioning structure of the MaxCompute table.
-- step1: Create a partitioned table t2 in MaxCompute, where c1 and c2 are regular columns, and c3 and c4 are partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Create an external partitioned table t2 in OceanBase Database.
CREATE EXTERNAL TABLE t2 (
c1 INT,
c2 INT,
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
Note
- The partitioning column
AS (metadata$partition_list_colx)specifies the x-th partitioning column of the ODPS table t2, where x starts from 1.
- The partitioning column
AS (metadata$partition_list_colx)cannot be omitted. Otherwise, it will be treated as a regular column and cause unexpected errors.
- The number of partitioning columns in the MaxCompute table must match the number of partitioning columns in the external table.
Error example:
If you omit AS (metadata$partition_list_colX), the partitioning columns will be treated as regular columns, causing the query to fail.
-- step1: Create a partitioned table t2 in MaxCompute, where c1 and c2 are regular columns, and c3 and c4 are partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Incorrect syntax: Partitioning columns are not mapped using metadata$partition_list_colX.
CREATE EXTERNAL TABLE t2 (
c1 INT,
c2 INT,
c3 VARCHAR(20), -- ⚠️ Missing AS (metadata$partition_list_col1)
c4 VARCHAR(20) -- ⚠️ Missing AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4); -- Although partitioning is declared, no partitioning metadata is mapped.
Since c3 and c4 are not bound to actual partition values in MaxCompute, OceanBase Database cannot obtain partitioning information. Therefore, partitioning columns must be explicitly mapped using metadata$partition_list_colX.
Note
For regular columns, you can explicitly specify or omit the generation column, just like in non-partitioned external tables.
Key parameters (PROPERTIES)
| Parameter | Required | Description |
|---|---|---|
| TYPE | Yes | Fixed to 'ODPS' |
| ACCESSID / ACCESSKEY | Yes | RAM user AccessKey (recommended to use least privilege) |
| ENDPOINT | Yes | MaxCompute service endpoint (including region) |
| PROJECT_NAME | Yes | MaxCompute project name |
| TABLE_NAME | Yes | Table name |
| SCHEMA_NAME | No | Required if the table is in a schema |
| ACCESSTYPE | No | Account type: aliyun (default) / sts / app |
| STSTOKEN | Conditional | Required only when ACCESSTYPE = 'sts' |
| QUOTA_NAME | No | Specifies the compute resource quota |
| COMPRESSION_CODE | No | Compression format: zlib / zstd / lz4 / odps_lz4 |
Query data from MaxCompute
You can query data from a MaxCompute table by using an external table in the same way as you query data from a regular table.
-- step1: MaxCompute has a table named t1.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table named t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
-- step3: Query the data in the external table t1, which corresponds to the MaxCompute table t1.
SELECT * FROM t1;
-- Increase the read speed by using a parallelism of N.
SELECT /*+ PARALLEL(N) */ * FROM t1;
Export data to MaxCompute
OceanBase Database allows you to write data from internal tables to external tables in MaxCompute (ODPS) by using standard SQL statements. This way, you can export data.
-- Full update.
INSERT OVERWRITE external_table_name
SELECT column_list FROM source_table [WHERE ...];
-- Incremental import.
INSERT INTO external_table_name
SELECT column_list FROM source_table [WHERE ...];
external_table_name: the name of an external table in MaxCompute.source_table: the name of an internal table in OceanBase Database or another queryable object.
For more information, see Insert data in MySQL-compatible mode and Insert data in Oracle-compatible mode.
Export data from a non-partitioned table
-- step1: A table named t1 exists in MaxCompute.
-- CREATE TABLE IF NOT EXISTS odps_project.t1 (c1 INT, c2 INT);
-- step2: Create an external table named t1 in OceanBase Database.
CREATE EXTERNAL TABLE t1 (c1 INT, c2 INT)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't1',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
);
-- step3: Create a regular table named t1_ in OceanBase Database.
CREATE TABLE t1_ (c1 INT, c2 INT);
-- step4: Export data from the regular table to the external table.
INSERT INTO t1 SELECT * FROM t1_;
-- Increase the export speed by setting the parallelism to N.
INSERT /*+ PARALLEL(N) */ INTO t1 SELECT * FROM t1_;
-- Specify OVERWRITE to overwrite the data in the external table. If the t1_ table is empty, the t1 table will also be overwritten.
INSERT OVERWRITE t1 SELECT * FROM t1_;
-- Specify the columns to be inserted.
INSERT INTO t1 (c1) SELECT c1 FROM t1_; -- The c2 column is inserted with NULL values.
Export data from a partitioned table
-- step1: A partitioned table named t2 exists in MaxCompute. c1 and c2 are ordinary columns, and c3 and c4 are partitioning columns.
-- CREATE TABLE IF NOT EXISTS odps_project.t2 (c1 INT, c2 INT) PARTITIONED BY (c3 VARCHAR(20), c4 VARCHAR(20));
-- step2: Create a partitioned external table named t2 in OceanBase Database.
CREATE EXTERNAL TABLE t2 (
c1 INT AS (metadata$odpscol1),
c2 INT AS (metadata$odpscol2),
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
-- step3: Create a regular table named t2_ in OceanBase Database.
CREATE TABLE t2_ (c1 INT, c2 INT);
-- step4: Export data to a specified partition.
-- Note: V4.3.5 BP2 and later versions support automatic creation of partitions that do not exist in MaxCompute tables. You do not need to manually create these partitions in advance. Earlier versions of OceanBase Database do not support this feature, so you must explicitly create the target partitions in MaxCompute tables before exporting data.
INSERT INTO t2 PARTITION (c3 = 'abc', c4 = 'def') SELECT * FROM t2_;
INSERT OVERWRITE t2 PARTITION (c3 = 'abc', c4 = 'def') SELECT * FROM t2_;
Partition metadata refresh strategy
You can use the AUTO_REFRESH option to specify the refresh mode of MaxCompute partition metadata: AUTO_REFRESH = { IMMEDIATE | OFF | INTERVAL }
| Strategy | Description | Applicable scenarios |
|---|---|---|
IMMEDIATE |
Automatically refreshes the partition metadata when a query is executed. | Partitions change frequently, for example, hourly. |
OFF |
Manually refreshes the partition metadata by using the ALTER EXTERNAL TABLE <table_name> REFRESH; statement. |
Static partition tables. |
INTERVAL |
Specifies a refresh interval in seconds by using the DBMS_EXTERNAL_TABLE.AUTO_REFRESH_EXTERNAL_TABLE(x) function. |
Partitions change at a moderate frequency. |
Example of enabling immediate refresh
CREATE EXTERNAL TABLE t2 (
c1 INT AS (metadata$odpscol1),
c2 INT AS (metadata$odpscol2),
c3 VARCHAR(20) AS (metadata$partition_list_col1),
c4 VARCHAR(20) AS (metadata$partition_list_col2)
)
AUTO_REFRESH = IMMEDIATE
PROPERTIES = (
TYPE = 'ODPS',
ACCESSID = '*****',
ACCESSKEY = '*****',
ENDPOINT = 'http://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'odps_project',
SCHEMA_NAME = '',
TABLE_NAME = 't2',
QUOTA_NAME = '',
COMPRESSION_CODE = ''
API_MODE = {"tunnel_api"},
)
PARTITION BY (c3, c4);
Manual refresh statement (used when AUTO_REFRESH = OFF)
ALTER EXTERNAL TABLE t2 REFRESH;
Data types and time zones
Type mapping
- For more information, see the data type mapping between OceanBase Database and MaxCompute.
Time zone handling
- MaxCompute does not support time zones for time types such as
DATETIME. - By default, OceanBase Database treats the time as being in the same time zone as its own.
- Recommendation: Ensure that OceanBase Database and MaxCompute use the same time zone, such as
Asia/Shanghai.