Scenarios
| Scenario | Description |
|---|---|
| ODPS Catalog | Fast data exploration, no need to create tables, and automatic metadata synchronization |
| ODPS External Table + Tunnel API | Public network access, prioritizing compatibility |
| ODPS External Table + Storage API | High-performance analysis within VPC, hybrid HTAP workloads |
Prerequisites
This tutorial assumes that a standard TPC-H lineitem table exists in MaxCompute (ODPS), with the following structure and sample data:
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
| 1 | 1551894 | 76910 | 1 | 17.00 | 33078.94 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | egular courts above the |
| 1 | 673091 | 73092 | 2 | 36.00 | 38306.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | MAIL | ly final dependencies: slyly bold|
| 1 | 636998 | 36999 | 3 | 8.00 | 15479.68 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | riously. regular, express dep |
| 1 | 21315 | 46316 | 4 | 28.00 | 34616.68 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | lites. fluffily even de |
| 1 | 240267 | 15274 | 5 | 24.00 | 28974.00 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | pending foxes. slyly re |
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
Environment preparation
Before you start, make sure that the following conditions are met:
- OceanBase Database version is ≥ V4.3.5 BP2 (ODPS Catalog is supported).
- JDK 8 or 11 is deployed and Java support is enabled (
ob_enable_java_env = true). - OceanBase Database JAVA SDK is deployed. For more information, see Deploy OceanBase Database JAVA SDK.
Method 1: Use ODPS Catalog for fast data exploration (Recommended)
OceanBase Database supports ODPS Catalog starting from V4.3.5 BP2, allowing direct access to MaxCompute tables without manually creating external tables.
Create an ODPS Catalog
CREATE EXTERNAL CATALOG odps_lineitem_test
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
TUNNEL_ENDPOINT = '',
PROJECT_NAME = 'bigdata_public_dataset'
);
Parameter description:
- TUNNEL_ENDPOINT can be left empty, as the system will automatically derive it.
- This method defaults to using Tunnel API, suitable for public network environments or when OBServer and ODPS are in different VPCs.
bigdata_public_dataset: For more information, see Public datasets.
Query data for verification
SET CATALOG odps_lineitem_test;
USE odps_lineitem_test.test_db;
SELECT * FROM lineitem LIMIT 5;
Expected result: 5 rows of data consistent with the source table.
Advantages:
- Automatic synchronization of table structure and partition information.
- No need to define column types, avoiding manual mapping errors.
- Suitable for quick data exploration and metadata verification.
Method 2: Create an ODPS external table (for fine-grained control scenarios)
When explicit column mapping, enabling Storage API, or compatibility with older versions is required, create an ODPS external table.
Switch to the OceanBase database
If you have already tried Method 1, you need to switch back to the OceanBase database.
USE internal.test;
Create an external table
CREATE EXTERNAL TABLE external_lineitem (
l_orderkey BIGINT,
l_partkey BIGINT,
l_suppkey BIGINT,
l_linenumber BIGINT,
l_quantity DECIMAL(15,2),
l_extendedprice DECIMAL(15,2),
l_discount DECIMAL(15,2),
l_tax DECIMAL(15,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44)
)
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
TUNNEL_ENDPOINT = '',
PROJECT_NAME = 'bigdata_public_dataset',
SCHEMA_NAME = 'tpch_10g',
TABLE_NAME = 'lineitem',
QUOTA_NAME = '',
COMPRESSION_CODE = 'zstd'
);
Key configuration description:
COMPRESSION_CODE = 'zstd': Enable compression to improve network transmission efficiency.- By default,
Tunnel APIis used, suitable for general network environments. - Column definitions must match those of the MaxCompute table.
bigdata_public_dataset: For more information, see Public datasets.
Verify and advanced operations
Verify external table reading
SELECT * FROM external_lineitem LIMIT 5;
Expected output: 5 rows of data consistent with the source table.
(Optional) Import data to an internal table for faster analysis
If you need frequent queries or complex aggregations, you can import the data to an internal table in OceanBase Database:
-- Create an internal table
CREATE TABLE ob_lineitem LIKE external_lineitem;
-- Batch import
INSERT INTO ob_lineitem SELECT * FROM external_lineitem;
-- Verify
SELECT COUNT(*) FROM ob_lineitem;
(Optional) Export data back to MaxCompute
Use the INSERT OVERWRITE statement to write data back to MaxCompute:
-- Assume that the target external table lineitem_target has been created
INSERT OVERWRITE lineitem_target
SELECT * FROM ob_lineitem
WHERE l_shipdate >= '1996-01-01';
Limitations:
- The number, order, and types of columns in the target table must be identical to those in the source table.
- Only the INSERT INTO / OVERWRITE ... SELECT syntax is supported.
Additional information: API mode selection guide
OceanBase Database supports two MaxCompute access modes. Choose the appropriate mode based on your deployment environment:
Scenario 1: High-performance access within VPC (Recommended)
Applicable conditions: OceanBase Database is deployed on an Alibaba Cloud ECS instance in the same region as MaxCompute in a VPC.
CREATE EXTERNAL TABLE lineitem1 ( /* Fields are the same as before */ )
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api',
PROJECT_NAME = 'test_obqa',
TABLE_NAME = 'lineitem1',
QUOTA_NAME = 'pay-as-you-go', -- Must be on-demand billing
API_MODE = 'storage_api', -- Enable Storage API
SPLIT = 'byte', -- Shard by byte (recommended for uneven row sizes)
COMPRESSION_CODE = 'zstd'
);
Advantages:
- Supports predicate pushdown and column pruning.
- Low latency and high throughput.
- Automatic parallel sharding.
Scenario 2: Public network or general environment (compatibility priority)
CREATE EXTERNAL TABLE lineitem ( /* Fields are the same as before */ )
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'bigdata_public_dataset',
SCHEMA_NAME = 'tpch_10g',
TABLE_NAME = 'lineitem',
COMPRESSION_CODE = 'zstd'
-- Tunnel API is used by default
);
Advantages:
- No VPC network configuration is required.
- Compatible with all deployment environments.
- Simple configuration for quick setup.
For more information, see ODPS external tables.