Scenarios
Scenario |
Description |
|---|---|
| ODPS Catalog | Quick exploration, no need to create tables, and automatic metadata synchronization |
| ODPS External Table + Tunnel API | Public network access, prioritizing compatibility |
| ODPS External Table + Storage API | High-performance analysis within VPC, and hybrid HTAP workloads |
Prerequisites
This tutorial assumes that a standard TPC-H lineitem table exists in MaxCompute (ODPS), with the following structure and sample data:
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
| 1 | 1551894 | 76910 | 1 | 17.00 | 33078.94 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | egular courts above the |
| 1 | 673091 | 73092 | 2 | 36.00 | 38306.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | MAIL | ly final dependencies: slyly bold|
| 1 | 636998 | 36999 | 3 | 8.00 | 15479.68 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | riously. regular, express dep |
| 1 | 21315 | 46316 | 4 | 28.00 | 34616.68 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | lites. fluffily even de |
| 1 | 240267 | 15274 | 5 | 24.00 | 28974.00 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | pending foxes. slyly re |
+------------+-----------+-----------+--------------+------------+------------------+------------+-------+--------------+--------------+------------+------------+-------------+--------------------+------------+----------------------------------+
Environment preparation
Before you start, make sure that the following conditions are met:
- OceanBase Database version is ≥ V4.3.5 BP2 (which supports ODPS Catalog)
- JDK 8 or 11 is deployed and Java support is enabled (
ob_enable_java_env = true) - OceanBase JAVA SDK is deployed. For more information, see Deploy OceanBase Database JAVA SDK.
Method 1: Use ODPS Catalog for quick data exploration (Recommended)
OceanBase Database supports ODPS Catalog starting from V4.3.5 BP2, allowing direct access to MaxCompute tables without manually creating external tables.
Create an ODPS Catalog
CREATE EXTERNAL CATALOG odps_lineitem_test
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
TUNNEL_ENDPOINT = '',
PROJECT_NAME = 'bigdata_public_dataset'
);
Parameter description:
- TUNNEL_ENDPOINT can be left empty, as the system will automatically infer it.
- This method uses Tunnel API by default, suitable for public network environments or when OBServer and ODPS are in different VPCs.
bigdata_public_dataset: For more information, see Public datasets.
Query data for verification
SET CATALOG odps_lineitem_test;
USE odps_lineitem_test.test_db;
SELECT * FROM lineitem LIMIT 5;
Expected result: 5 rows of data consistent with the source table.
Advantages:
- Automatically synchronizes table structure and partition information.
- No need to define column types, avoiding manual mapping errors.
- Suitable for quick data exploration and metadata verification.
Method 2: Create an ODPS External Table (for fine-grained control)
When you need explicit column mapping, enable Storage API, or ensure compatibility with older versions, you can create an ODPS external table.
Switch back to OceanBase Database
If you have already tried Method 1, switch back to OceanBase Database.
USE internal.test;
Create an external table
CREATE EXTERNAL TABLE external_lineitem (
l_orderkey BIGINT,
l_partkey BIGINT,
l_suppkey BIGINT,
l_linenumber BIGINT,
l_quantity DECIMAL(15,2),
l_extendedprice DECIMAL(15,2),
l_discount DECIMAL(15,2),
l_tax DECIMAL(15,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44)
)
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
TUNNEL_ENDPOINT = '',
PROJECT_NAME = 'bigdata_public_dataset',
SCHEMA_NAME = 'tpch_10g',
TABLE_NAME = 'lineitem',
QUOTA_NAME = '',
COMPRESSION_CODE = 'zstd'
);
Key configuration description:
COMPRESSION_CODE = 'zstd': Enables compression to improve network transmission efficiency.- By default,
Tunnel APIis used, suitable for general network environments. - Column definitions must match those of the MaxCompute table.
bigdata_public_dataset: For more information, see Public datasets.
Verify and perform advanced operations
Verify external table reading
SELECT * FROM external_lineitem LIMIT 5;
Expected output: 5 rows of data consistent with the source table.
(Optional) Import data into an internal table for faster analysis
If you need frequent queries or complex aggregations, you can import the data into an internal table in OceanBase Database:
-- Create an internal table
CREATE TABLE ob_lineitem LIKE external_lineitem;
-- Batch import
INSERT INTO ob_lineitem SELECT * FROM external_lineitem;
-- Verify
SELECT COUNT(*) FROM ob_lineitem;
(Optional) Export data back to MaxCompute
Use the INSERT OVERWRITE statement to write data back to MaxCompute:
-- Assume that the target external table lineitem_target has been created
INSERT OVERWRITE lineitem_target
SELECT * FROM ob_lineitem
WHERE l_shipdate >= '1996-01-01';
Limitations:
- The target table must have the same number of columns, in the same order, and with the same data types as the source table.
- Only the INSERT INTO / OVERWRITE ... SELECT syntax is supported.
Additional information: API mode selection guide
OceanBase Database supports two MaxCompute access modes. Choose the appropriate mode based on your deployment environment:
Scenario 1: High-performance access within VPC (Recommended)
Applicable conditions: OceanBase Database is deployed on Alibaba Cloud ECS and is in the same region as MaxCompute within a VPC.
CREATE EXTERNAL TABLE lineitem1 ( /* Fields are the same as above */ )
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api',
PROJECT_NAME = 'test_obqa',
TABLE_NAME = 'lineitem1',
QUOTA_NAME = 'pay-as-you-go', -- Must be on-demand billing
API_MODE = 'storage_api', -- Enable Storage API
SPLIT = 'byte', -- Shard by byte (recommended for uneven row sizes)
COMPRESSION_CODE = 'zstd'
);
Advantages:
- Supports predicate pushdown and column pruning.
- Low latency and high throughput.
- Automatic parallel sharding.
Scenario 2: Public network or general environment (Prioritizing compatibility)
CREATE EXTERNAL TABLE lineitem ( /* Fields are the same as above */ )
PROPERTIES (
TYPE = 'ODPS',
ACCESSID = 'XXXXXXXXXXXX',
ACCESSKEY = 'xxxxxxxxxxxxxx',
ENDPOINT = 'https://service.cn-hangzhou.maxcompute.aliyun.com/api',
PROJECT_NAME = 'bigdata_public_dataset',
SCHEMA_NAME = 'tpch_10g',
TABLE_NAME = 'lineitem',
COMPRESSION_CODE = 'zstd'
-- Tunnel API is used by default
);
Advantages:
- No VPC network configuration is required.
- Compatible with all deployment environments.
- Simple configuration for quick setup.
For more information, see ODPS External Table.
