Overview
Dataphin is a cloud-based implementation of the OneData data governance methodology used internally by Alibaba Group. It provides a comprehensive set of big data capabilities that cover the entire data lifecycle: collection, construction, management, and utilization. These capabilities help enterprises significantly improve their level of data governance and build enterprise-grade data middleware platforms that are reliable, convenient, and secure.
This topic describes how to configure an OceanBase Database data source in Dataphin to integrate offline and real-time data.
Version compatibility
- OceanBase Database version: ≥ V4.2.1.
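If you script environment checks before configuring Dataphin, you can express the minimum version requirement as a numeric comparison of version components. The following is a minimal sketch. The helper names `parse_version` and `meets_min_version` are hypothetical. The sketch also assumes that version strings are dot-separated integers such as `4.2.1.0`, optionally prefixed with `V`.

```python
# Minimal sketch: check that an OceanBase version string meets the V4.2.1 minimum.
# Assumption: versions are dot-separated integers, optionally prefixed with "V"/"v".
from typing import Tuple

MIN_VERSION = (4, 2, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'V4.2.1.0' or '4.2.2' into a tuple of ints such as (4, 2, 1, 0)."""
    cleaned = text.strip().lstrip("Vv")
    return tuple(int(part) for part in cleaned.split("."))


def meets_min_version(text: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if the version is greater than or equal to the minimum."""
    version = parse_version(text)
    # Pad with zeros so that (4, 2, 1) and (4, 2, 1, 0) compare as equal.
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum
```

The comparison is numeric rather than string-based, so a version such as `4.10` correctly counts as newer than `4.2.1`.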
Prerequisites
Before you use Dataphin, make sure that:
- You have a Dataphin account.
- You have deployed OceanBase Database and created a MySQL-compatible user tenant. For more information about how to create a user tenant, see Create a tenant.
- You have enabled the Binlog service in the MySQL-compatible user tenant. For more information, see OceanBase Binlog service.
Procedure
Step 1: Obtain the database connection string
Contact the person who deployed OceanBase Database to obtain the connection string. For example:
```shell
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```
Parameter description:
- $host: the IP address used for the connection. For an ODP connection, use the ODP IP address. For a direct connection, use the OBServer IP address.
- $port: the connection port. For an ODP connection, the default value is 2883. For a direct connection, the default value is 2881.
- $user_name: the connection account. For an ODP connection, the format is user@tenant#cluster or cluster:tenant:user. For a direct connection, the format is user@tenant.
- $password: the account password.
- $database_name: the database name.

Notice
The user used to connect to the tenant must have the CREATE, INSERT, DROP, and SELECT privileges on the database. For more information about user privileges, see Privilege types in MySQL mode.
For more information about the connection string, see Connect to an OceanBase tenant by using OBClient.
Here is an example:
```shell
obclient -hxxx.xxx.xxx.xxx -P2881 -utest_user001@mysql001 -p****** -Dtest
```
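To make the ODP and direct connection rules concrete, the following minimal sketch assembles the obclient command from its parts. The helpers `format_user` and `build_obclient_command` are hypothetical. For ODP connections, the sketch only produces the user@tenant#cluster form and does not produce the alternative cluster:tenant:user form. The default ports are the ones given in the parameter description.

```python
# Hypothetical helper: assemble the Step 1 obclient command for ODP or direct connections.
import shlex
from typing import Optional

# Default ports from the parameter description.
DEFAULT_PORTS = {"odp": 2883, "direct": 2881}


def format_user(user: str, tenant: str, cluster: Optional[str], mode: str) -> str:
    """Return $user_name: user@tenant#cluster for ODP, user@tenant for direct."""
    if mode == "odp":
        if not cluster:
            raise ValueError("an ODP connection needs a cluster name")
        return f"{user}@{tenant}#{cluster}"
    if mode == "direct":
        return f"{user}@{tenant}"
    raise ValueError(f"unknown connection mode: {mode!r}")


def build_obclient_command(host: str, user: str, tenant: str, password: str,
                           database: str, mode: str = "direct",
                           cluster: Optional[str] = None,
                           port: Optional[int] = None) -> str:
    """Return the obclient command line as a shell-safe string."""
    user_name = format_user(user, tenant, cluster, mode)  # validates mode first
    if port is None:
        port = DEFAULT_PORTS[mode]
    # Option values are attached directly to their flags, as in -h$host -P$port.
    args = ["obclient", f"-h{host}", f"-P{port}", f"-u{user_name}",
            f"-p{password}", f"-D{database}"]
    # shlex.quote wraps arguments containing characters such as '#' or '*' in quotes.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, `build_obclient_command("10.0.0.1", "test_user001", "mysql001", "secret", "test")` returns `obclient -h10.0.0.1 -P2881 -utest_user001@mysql001 -psecret -Dtest`, which has the same shape as the direct-connection example above.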
Step 2: Configure a Dataphin compute task
Dataphin can synchronize offline data directly from an OceanBase data source, and it can process real-time data by connecting to OceanBase through a MySQL data source. These capabilities support scenarios such as enterprise data warehouse construction, data lake building, and real-time data analysis.
Offline compute task
You can directly connect to an OceanBase data source to create an offline compute task.
- In the Dataphin management center, click Data Source Management and then click Create Data Source. Select OceanBase.
- Fill in the OceanBase database information obtained in Step 1, and click Test Connection to verify the connection.
- In the management center, click Data Integration and select Offline Pipeline to create a task.
- Select Manual Node (for testing) or Periodic Node (for regular execution).
- In the component library, you can select an existing OceanBase data source as the input or output.
- After the configuration is complete, click Submit and then click Go to O&M to go to the development page.
- Click Run to execute the data migration task.
Notice
Each run of an offline compute task reads the full data set. Rows that do not yet exist in the target table are inserted. Rows that already exist are not inserted again.
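The following sketch models the write behavior described in the notice above in memory. It is illustrative only and is not Dataphin's implementation. It assumes that a row "exists" when its key value (a hypothetical `id` column) is already present in the target, and that existing rows are left unchanged.

```python
# Illustrative model of the offline-task notice; not Dataphin's implementation.
# Assumptions: a row "exists" when its key value is already in the target,
# and existing rows are left unchanged.
from typing import Any, Dict, Iterable


def run_offline_sync(source_rows: Iterable[Dict[str, Any]],
                     target: Dict[Any, Dict[str, Any]],
                     key: str = "id") -> int:
    """Read every source row and insert only rows whose key is not in target.

    Returns the number of rows inserted during this run.
    """
    inserted = 0
    for row in source_rows:  # full read on every run
        row_key = row[key]
        if row_key in target:  # already present: skip, do not overwrite
            continue
        target[row_key] = dict(row)  # store a copy of the source row
        inserted += 1
    return inserted


if __name__ == "__main__":
    source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    target: Dict[Any, Dict[str, Any]] = {}
    print(run_offline_sync(source, target))  # first run inserts both rows
    source[0]["v"] = "changed"
    source.append({"id": 3, "v": "c"})
    print(run_offline_sync(source, target))  # second run inserts only id 3
    print(target[1]["v"])  # still "a": the changed row was not re-inserted
```

Under these assumptions, a change to a row that was already synchronized does not reach the target on later runs. Only new keys are added.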
Real-time compute task
To create a real-time compute task, you must connect to OceanBase through a MySQL data source. This connection supports both full and incremental data collection.
- In the Dataphin management center, click System Settings and then click Real-time Compute Engine. Select Flink.
- Click Data Source Management and then click Create Data Source. Select MySQL.
- Fill in the OceanBase connection information:
| Parameter | Required | Default Value | Type | Description |
|---|---|---|---|---|
| JDBC URL | Yes | N/A | String | jdbc:mysql://$host:$port/$database_name |
| Username | Yes | N/A | String | $user_name |
| Password | Yes | N/A | String | $password |
| Version | Yes | N/A | String | MySQL 5.6/5.7 |
- Click Test Connection to verify the connection.
- On the development page, click + and select Metadata Table. Fill in the metadata table name and select the data source.
- Click Submit and then select Compute Task.
- Create a Flink SQL task and fill in the relevant information.
- Click Submit and then click Go to O&M to go to the real-time compute page.
- On the real-time compute page, you can edit and execute Flink SQL statements to process data in real time.
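The MySQL data source fields in the table above are derived from the Step 1 connection parameters. The following minimal sketch shows that mapping. The helper `to_mysql_source` and the `MySQLSourceFields` class are hypothetical. Defaulting Version to 5.7 is an assumption; the table allows either 5.6 or 5.7.

```python
# Hypothetical helper mapping Step 1 values onto the MySQL data source fields
# (JDBC URL, Username, Password, Version) listed in the table.
from dataclasses import dataclass

SUPPORTED_VERSIONS = ("5.6", "5.7")  # the Version options listed in the table


@dataclass(frozen=True)
class MySQLSourceFields:
    jdbc_url: str
    username: str
    password: str
    version: str


def to_mysql_source(host: str, port: int, database_name: str,
                    user_name: str, password: str,
                    version: str = "5.7") -> MySQLSourceFields:
    """Build the field values to enter in the Create Data Source form."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Version must be one of {SUPPORTED_VERSIONS}")
    return MySQLSourceFields(
        jdbc_url=f"jdbc:mysql://{host}:{port}/{database_name}",
        username=user_name,  # same $user_name format as in Step 1
        password=password,
        version=version,
    )
```

For example, `to_mysql_source("10.0.0.1", 2881, "test", "test_user001@mysql001", "secret").jdbc_url` is `jdbc:mysql://10.0.0.1:2881/test`.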
Notice
Dataphin does not support directly selecting OceanBase as a data source for real-time compute. In Flink, the OceanBase connector can only be used as a destination table or dimension table, not as a source table.
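If you keep job definitions in code before entering them in Dataphin, the connector restriction in the notice above can be checked mechanically. The following sketch is illustrative only. The `TableSpec` structure, the connector labels `oceanbase` and `mysql`, and the `validate_tables` function are hypothetical and are not Dataphin or Flink APIs.

```python
# Illustrative check for the notice: the OceanBase connector may be used as a
# destination (sink) or dimension table, but not as a source table.
# TableSpec, the connector labels, and validate_tables are hypothetical.
from typing import List, NamedTuple


class TableSpec(NamedTuple):
    name: str
    connector: str  # e.g. "oceanbase" or "mysql"
    role: str       # "source", "sink", or "dimension"


OCEANBASE_ALLOWED_ROLES = frozenset({"sink", "dimension"})


def validate_tables(tables: List[TableSpec]) -> List[str]:
    """Return one message per OceanBase table used in an unsupported role."""
    problems = []
    for table in tables:
        if table.connector == "oceanbase" and table.role not in OCEANBASE_ALLOWED_ROLES:
            problems.append(
                f"{table.name}: the OceanBase connector cannot be a {table.role} table; "
                "read this data through a MySQL data source instead"
            )
    return problems
```

A plan that reads through a MySQL data source and writes to OceanBase passes with no messages. Declaring an OceanBase table as a source produces one message.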
