Migrate data from HBase to OBKV (OceanBase Migration Service V4.2.6)

This topic describes how to use OceanBase Migration Service (OMS) Community Edition to migrate data from an HBase database to OBKV.

Background information

You can create a data migration task in the console of OMS Community Edition to seamlessly migrate the existing business data and incremental data from an HBase database to OBKV through schema migration, full migration, and incremental synchronization.

Prerequisites

You have created a corresponding schema in OBKV. OMS Community Edition migrates tables and columns but not the schema itself, so the schema must exist in the destination database before the migration starts.
To migrate data in Flink mode, you need to create a Flink cluster of version 1.14.0.
To use the incremental synchronization feature, you need to enable synchronous replication for HBase tables by setting the REPLICATION_SCOPE parameter to 1.
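
The replication setting above is applied per column family. As a minimal sketch, the following Python helper only builds the text of HBase shell `alter` commands that set `REPLICATION_SCOPE` to 1. The table and column family names are hypothetical, and the generated commands still have to be run in the HBase shell by someone with the required privileges.

```python
def replication_scope_commands(table, column_families):
    """Build HBase shell commands that set REPLICATION_SCOPE => 1.

    `table` and `column_families` are hypothetical example inputs;
    replace them with the tables you plan to migrate.
    """
    return [
        f"alter '{table}', {{NAME => '{cf}', REPLICATION_SCOPE => 1}}"
        for cf in column_families
    ]


if __name__ == "__main__":
    for command in replication_scope_commands("default:orders", ["cf1", "cf2"]):
        print(command)
```

Generating the commands as text keeps the sketch free of any HBase client dependency; review the output before you run it against a production cluster.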

Limitations

Limitations on the source database

Do not perform DDL operations for database or schema changes during schema migration or full migration. Otherwise, the data migration task may be interrupted.
Only HBase 1.2.0-cdh5.15.2 and 2.4.13 are supported.
The migration of tag data, full verification, and reverse incremental migration are not supported.
The incremental data of the source HBase database cannot be migrated in bulkload mode during incremental synchronization.

Considerations

To ensure the performance of a data migration task, we recommend that you migrate no more than 1,000 tables at a time.
OMS Community Edition cannot obtain the statistics of tables in an HBase database. Therefore, the progress and estimated time displayed on all pages are inaccurate. You can calculate the amount of time required based on the displayed real-time requests per second (RPS) value and the actual number of table records.
Incremental data of an HBase database is replicated through a peer, which is created in the source HBase database. If the incremental synchronization process of OMS Community Edition is paused for a long time, the incremental data of the source HBase database cannot be delivered to the incremental synchronization task for processing. The resulting backlog can exhaust disk space.
The start point of incremental synchronization for a peer is determined only by the time when the peer was created and cannot be specified. If the peer is deleted, you need to create a new data migration task and initialize the full migration task again.
During migration in Flink mode, if you want to stop the migration process, you must stop the Flink jobs as well as the full migration and incremental synchronization tasks in OMS Community Edition.

You can find the corresponding Flink jobs based on the ID of the data migration task in the console of OMS Community Edition.
- Full migration job: OMS_BATCH_{task ID in OMS Community Edition}
- Incremental synchronization job: OMS_STREAM_{task ID in OMS Community Edition}
  
  You can also obtain information about the corresponding Flink jobs by viewing the flink_jobid file in the /home/ds/run/{component ID}/conf/ directory in the container of OMS Community Edition.
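
As a small illustration of the naming rule above, the following sketch derives the two Flink job names from a task ID. The task ID used in the example is hypothetical; use the ID shown in the console of OMS Community Edition.

```python
def flink_job_names(task_id):
    """Return the Flink job names that correspond to an OMS task ID."""
    return {
        "full_migration": f"OMS_BATCH_{task_id}",
        "incremental_synchronization": f"OMS_STREAM_{task_id}",
    }


if __name__ == "__main__":
    # "np_example001" is a made-up task ID used only for illustration.
    for purpose, job_name in flink_job_names("np_example001").items():
        print(f"{purpose}: {job_name}")
```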
No statistics are provided for the incremental synchronization task that migrates incremental data from the HBase database to OBKV.
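
Because the displayed progress is inaccurate, the considerations above suggest estimating the remaining time from the real-time RPS value and the actual number of records. The following sketch shows that arithmetic. It assumes that the RPS value and your record count use the same unit, and all numbers in the example are made up.

```python
def estimate_remaining_seconds(total_records, migrated_records, rps):
    """Estimate the remaining migration time in seconds.

    total_records: actual number of records in the source tables.
    migrated_records: number of records already migrated.
    rps: real-time requests per second shown by OMS Community Edition.
    """
    if rps <= 0:
        raise ValueError("rps must be greater than 0")
    remaining = max(total_records - migrated_records, 0)
    return remaining / rps


if __name__ == "__main__":
    seconds = estimate_remaining_seconds(1_000_000, 250_000, 5_000)
    print(f"about {seconds / 60:.1f} minutes remaining")
```

The result is only as good as the inputs: the RPS value fluctuates during a run, so recompute the estimate periodically.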

Data type mappings

By default, a column family in the HBase database maps to a table schema in OBKV:

create table if not exists {TABLE_NAME} -- Maps HBase {namespace}.{table_name}${column_family}.
(
    `K` varbinary(1024) not null, -- Maps the HBase rowkey.
    `Q` varbinary(256) not null,  -- Maps the column in the HBase column family.
    `T` bigint not null,          -- Maps the HBase version/timestamp.
    `V` varbinary(1048576),       -- Maps the HBase value.
    primary key(`K`, `Q`, `T`))
partition by key(`K`) partitions 64
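
To make the mapping concrete, the following sketch flattens the cells of one HBase row in one column family into (K, Q, T, V) rows and enforces the column sizes of the default table schema. It is an illustration of the data model only, not how OMS Community Edition is implemented; the function and type names are hypothetical.

```python
from typing import NamedTuple, Optional

MAX_K, MAX_Q, MAX_V = 1024, 256, 1048576  # sizes from the default schema


class ObkvRow(NamedTuple):
    K: bytes            # HBase rowkey
    Q: bytes            # column qualifier within the column family
    T: int              # HBase version/timestamp
    V: Optional[bytes]  # HBase value (nullable in the default schema)


def cells_to_rows(rowkey, cells):
    """Map (qualifier, timestamp, value) cells of one rowkey to OBKV rows."""
    if len(rowkey) > MAX_K:
        raise ValueError("rowkey exceeds varbinary(1024)")
    rows = []
    for qualifier, timestamp, value in cells:
        if len(qualifier) > MAX_Q:
            raise ValueError("qualifier exceeds varbinary(256)")
        if value is not None and len(value) > MAX_V:
            raise ValueError("value exceeds varbinary(1048576)")
        rows.append(ObkvRow(rowkey, qualifier, timestamp, value))
    return rows


if __name__ == "__main__":
    sample_cells = [(b"name", 1700000000000, b"alice"), (b"age", 1700000000001, b"30")]
    for row in cells_to_rows(b"user#1", sample_cells):
        print(row)
```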

You can specify the default CREATE TABLE statement in OBKV for schema migration by modifying the value of the struct.obkv.createtable parameter.

Parameter	Description	CREATE TABLE statement
struct.obkv.createtable	On the System Parameters page, you can modify the value of this parameter to specify the default CREATE TABLE statement in OBKV for schema migration in all tasks.	`create table if not exists {TABLE_NAME} (`K`varbinary(1024) not null,`Q`varbinary(256) not null,`T`bigint not null,`V`varbinary(1048576),primary key(`K`,`Q`,`T`)) partition by key(`K`) partitions 64`
structObkvCreatetable	In the Incremental Synchronization section on the Migration Options page, click Configuration Details. Then, you can specify the default CREATE TABLE statement in OBKV for schema migration in the current task by modifying the `sink.json` file.	`create table if not exists {TABLE_NAME} (`K`varbinary(1024) not null,`Q`varbinary(256) not null,`T`bigint not null,`V`varbinary(1048576),primary key(`K`,`Q`,`T`)) partition by key(`K`) partitions 64`
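
The following sketch shows how a `{TABLE_NAME}` placeholder in such a statement could be filled in. How OMS Community Edition performs the substitution internally is not described in this topic, so the plain string replacement and the table name `orders$cf1` are assumptions for illustration.

```python
DEFAULT_TEMPLATE = (
    "create table if not exists {TABLE_NAME} ("
    "`K` varbinary(1024) not null, "
    "`Q` varbinary(256) not null, "
    "`T` bigint not null, "
    "`V` varbinary(1048576), "
    "primary key(`K`, `Q`, `T`)) "
    "partition by key(`K`) partitions 64"
)


def render_create_table(template, table_name):
    """Replace the {TABLE_NAME} placeholder with a concrete table name."""
    if "{TABLE_NAME}" not in template:
        raise ValueError("template must contain the {TABLE_NAME} placeholder")
    # str.replace is used instead of str.format so that other braces
    # in a customized statement are left untouched.
    return template.replace("{TABLE_NAME}", table_name)


if __name__ == "__main__":
    print(render_create_table(DEFAULT_TEMPLATE, "orders$cf1"))
```

Rendering a customized statement this way before you save it is a quick check that the placeholder is still present and the statement is well formed.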

Procedure

Create a data migration task.
1. Log on to the console of OMS Community Edition.
2. In the left-side navigation pane, click Data Migration.
3. On the Data Migration page, click Create Migration Task in the upper-right corner.

On the Select Source and Destination page, configure the parameters.

Parameter	Description
Migration Task Name	We recommend that you set it to a combination of digits and letters. It must not contain any spaces and cannot exceed 64 characters in length.
Tag	Click the field and select a target tag from the drop-down list. You can also click Manage Tags to create, modify, and delete tags. For more information, see Use tags to manage data migration tasks.
Source	If you have created an HBase data source, select it from the drop-down list. If not, click New Data Source in the drop-down list and create one in the dialog box that appears on the right. For more information about the parameters, see Create an HBase data source.
Destination	If you have created a data source of OceanBase Database Community Edition, select it from the drop-down list. If not, click New Data Source in the drop-down list and create one in the dialog box that appears on the right. For more information about parameters, see Create a data source of OceanBase Database Community Edition.
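
The naming rules for Migration Task Name can be checked before you open the console. The sketch below treats the space and length rules as errors and the digits-and-letters recommendation as a warning; the function name and messages are hypothetical.

```python
import re

_ALNUM = re.compile(r"[A-Za-z0-9]+")


def check_task_name(name):
    """Return (errors, warnings) for a proposed migration task name."""
    errors, warnings = [], []
    if any(ch.isspace() for ch in name):
        errors.append("must not contain spaces")
    if len(name) > 64:
        errors.append("must not exceed 64 characters")
    if not _ALNUM.fullmatch(name):
        warnings.append("a combination of digits and letters is recommended")
    return errors, warnings


if __name__ == "__main__":
    print(check_task_name("hbase2obkv01"))
    print(check_task_name("hbase to obkv"))
```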

Click Next. On the Select Migration Type page, configure the parameters.

The following options are available for Migration Type: Schema Migration, Full Migration, and Incremental Synchronization.

Migration type	Description
Schema migration	The definitions of data objects, such as tables, indexes, constraints, comments, and views, are migrated from the source database to the destination database. Temporary tables are automatically filtered out.
Full migration	The existing data is migrated from tables in the source database to the corresponding tables in the destination database.
Incremental synchronization	Changed data in the source database is synchronized to the corresponding tables in the destination database after an incremental synchronization task starts. Data changes are data addition, modification, and deletion.

Click Next. On the Select Migration Objects page, select the migration objects and migration scope.

You can select Specify Objects or Match Rules to specify the migration objects.

Select Specify Objects. Then select the objects to be migrated on the left and click > to add them to the list on the right. OBKV supports only tables with a single column family. Therefore, a table with multiple column families in the HBase database corresponds to multiple tables in OBKV.
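
As an illustration of the one-table-per-column-family rule, the following sketch lists the OBKV tables that one HBase table would correspond to. It assumes the `{table_name}${column_family}` naming pattern shown in the comment of the default CREATE TABLE statement and does not model namespace handling; the table and family names are hypothetical.

```python
def obkv_table_names(hbase_table, column_families):
    """Return one OBKV table name per HBase column family."""
    return [f"{hbase_table}${family}" for family in column_families]


if __name__ == "__main__":
    # An HBase table with two column families maps to two OBKV tables.
    print(obkv_table_names("orders", ["info", "items"]))
```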

Notice

The names of tables to be migrated, as well as the names of columns in the tables, must not contain Chinese characters.
If the database or table name contains a double dollar sign ($$), you cannot create the migration task.
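
The two naming restrictions in the notice above can be checked in advance with a sketch like the following. Detecting Chinese characters with the CJK Unified Ideographs range U+4E00 to U+9FFF is an approximation chosen for this example, not the rule that OMS Community Edition applies.

```python
import re

# Approximation: the basic CJK Unified Ideographs block only.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def name_problems(name):
    """Return the reasons why a database, table, or column name is not allowed."""
    problems = []
    if _CHINESE.search(name):
        problems.append("contains Chinese characters")
    if "$$" in name:
        problems.append("contains a double dollar sign ($$)")
    return problems


if __name__ == "__main__":
    for candidate in ["orders", "orders$$2024", "\u8ba2\u5355"]:
        print(repr(candidate), name_problems(candidate))
```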

OMS Community Edition also allows you to import objects from text, rename objects, set row filters, view column information, and remove a single migration object or all migration objects.

Operation	Description
Import objects	In the list on the right of the Specify Migration Scope section, click Import Objects in the upper-right corner. In the dialog box that appears, click OK. (Notice: This operation overwrites previous selections. Proceed with caution.) In the Import Objects dialog box, import the objects to be migrated. You can import CSV files to rename databases/tables and set row filtering conditions. For more information, see Download and import the settings of migration objects. Click Validate. After the validation succeeds, click OK.
Rename an object	OMS Community Edition allows you to rename migration objects. For more information, see Rename a database table.
Configure settings	OMS Community Edition allows you to filter rows by using `WHERE` conditions. For more information, see Use SQL conditions to filter data. You can also view column information of the migration object in the View Column section.
Remove one or all objects	OMS Community Edition allows you to remove a single object or all objects to be migrated to the destination database during data mapping. To remove a single migration object: In the list on the right of the Specify Migration Scope section, move the pointer over the target object, and click Remove. To remove all migration objects: In the list on the right of the Specify Migration Scope section, click Remove All in the upper-right corner. In the dialog box that appears, click OK.

Select Match Rules. For more information, see Configure matching rules for migration objects.

Click Next. On the Migration Options page, configure the parameters.

Full migration

The following table describes the full migration parameters, which are displayed only if you have selected Full Migration on the Select Migration Type page.

Parameter	Description
Concurrency Speed	Valid values: Stable, Normal, Fast, and Custom. The amount of resources to be consumed by a full data migration task varies based on the migration performance. If you select Custom, you can set Read Concurrency, Write Concurrency, and JVM Memory as needed.
Processing Strategy When Records Exist in Target Object	Valid values: Ignore and Stop Migration. If you select Ignore, when the data to be inserted conflicts with the existing data of a destination table, OMS Community Edition retains the existing data and records the conflict data. (Notice: If you select Ignore, data is pulled in IN mode for verification. In this case, the scenario where the destination table contains more data than the source table cannot be verified, and the verification efficiency decreases.) If you select Stop Migration and a destination table contains records, an error is returned during full migration, indicating that the migration is not allowed. In this case, you must clear the data in the destination table before you can continue with the migration. (Notice: After an error is returned, if you click Resume in the dialog box, OMS Community Edition ignores this error and continues to migrate data. Proceed with caution.)
Computing Platform	The default value is `local`, which indicates the local running mode. You can also choose to run on the Flink computing platform. To add a computing platform, click Manage Computing Platform in the drop-down list. For more information, see Manage computing platforms.
Writing Method	Valid values: SQL (specifies to write data to tables by using `INSERT` or `REPLACE`) and Direct Load (specifies to write data through bypass import). The limitations on Direct Load are as follows: This write mode is supported for full migration only when the destination is an instance of OceanBase Database Community Edition of a version later than V4.2.1. The size of data in a single row cannot exceed 2 MB. Only data duplication caused by primary key constraints can be handled, and that caused by unique key constraints cannot be handled. Generated columns, triggers, and user-defined types (UDTs) are not supported. Bypass import applies to empty tables and cannot be interrupted. In other words, resumable transmission is not supported. Therefore, if bypass import is used for full migration and is interrupted, the data in the source is read again from the beginning after the task resumes. If an imported table contains large object (LOB) fields, execute the following statements to check and disable the `enable_rebalance` parameter for OceanBase Database Community Edition: `SHOW PARAMETERS LIKE 'enable_rebalance';` `ALTER SYSTEM SET enable_rebalance = false;` Requests in a queue may time out if data is concurrently written in bypass mode. We recommend that you set `source.sliceWorkerNum` to 1. The number and order of fields in the source table must be the same as those in the destination table. In a multi-table aggregation scenario, data writing through bypass import is not supported.
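
The difference between the Ignore and Stop Migration strategies described in the table above can be modeled with a toy example. The sketch below uses plain dictionaries in place of tables and is only meant to illustrate the described behavior; it is not how OMS Community Edition is implemented, and the function name is hypothetical.

```python
def full_migrate(source_rows, destination, strategy):
    """Copy source_rows into destination and return the recorded conflict keys.

    source_rows and destination are dicts keyed by primary key.
    strategy is "Ignore" or "Stop Migration".
    """
    if strategy == "Stop Migration":
        if destination:
            # A non-empty destination table stops the migration with an error.
            raise RuntimeError("destination contains records; migration is not allowed")
        destination.update(source_rows)
        return []
    if strategy == "Ignore":
        conflicts = []
        for key, value in source_rows.items():
            if key in destination:
                conflicts.append(key)     # keep the existing row, record the conflict
            else:
                destination[key] = value
        return conflicts
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    target = {"k1": "existing"}
    print(full_migrate({"k1": "new", "k2": "new"}, target, "Ignore"))
    print(target)
```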

You can specify the query method for full migration by setting the queryType parameter in the source section. Valid values are hfile and scan. The default value is hfile, which indicates that the full data is obtained by reading HFiles. By default, a table flush operation is performed before the full migration starts. To disable the operation, set the flushTable parameter in the source section to false. To view or modify parameters related to full migration, click Configuration Details in the upper-right corner of the Full Migration section. For more information about the parameters, see Coordinator.
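
As a sketch of the two settings described above, the following helper builds a `source` fragment with `queryType` and `flushTable`. The surrounding JSON layout is an assumption made for illustration; check the actual structure shown under Configuration Details before you change anything.

```python
import json

VALID_QUERY_TYPES = ("hfile", "scan")


def full_migration_source_settings(query_type="hfile", flush_table=True):
    """Build a hypothetical `source` fragment for full migration."""
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError(f"queryType must be one of {VALID_QUERY_TYPES}")
    return {"source": {"queryType": query_type, "flushTable": bool(flush_table)}}


if __name__ == "__main__":
    # Read through scans and skip the table flush before full migration.
    print(json.dumps(full_migration_source_settings("scan", False), indent=2))
```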

Incremental synchronization

The following table describes the incremental synchronization parameters, which are displayed only if you have selected Incremental Synchronization on the Select Migration Type page.

Parameter	Description
Concurrency Speed	Valid values: Stable, Normal, Fast, and Custom. The amount of resources to be consumed by an incremental synchronization task varies based on the synchronization performance. If you select Custom, you can set Read Concurrency, Write Concurrency, and JVM Memory as needed.
Peer ID	Use the default value.
rootDir	Use the default value.
zkHost	The ZooKeeper configuration used by the Incr-Sync component to simulate the startup of HBase. You must specify the parameter.
zkPath	Use the default value.
Computing Platform	The default value is `local`, which indicates the local running mode. You can also choose to run on the Flink computing platform. To add a computing platform, click Manage Computing Platform in the drop-down list. For more information, see Manage computing platforms.

By default, OMS Community Edition starts one simulated region for incremental synchronization. You can change the number by modifying the regions parameter in the source section. If the traffic of incremental data is heavy, you can specify multiple regions to increase the speed of incremental synchronization. To view or modify parameters related to incremental synchronization, click Configuration Details in the upper-right corner of the Incremental Synchronization section. For more information about the parameters, see Coordinator.
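
The following sketch shows one way to set the `regions` parameter in a copy of an existing configuration without modifying the original object. The dictionary layout with a top-level `source` key and the `zkHost` value are assumptions for illustration; only the parameter name `regions` comes from this topic.

```python
def with_regions(config, regions):
    """Return a copy of config whose source section sets `regions`."""
    if isinstance(regions, bool) or not isinstance(regions, int) or regions < 1:
        raise ValueError("regions must be a positive integer")
    source = {**config.get("source", {}), "regions": regions}
    return {**config, "source": source}


if __name__ == "__main__":
    current = {"source": {"zkHost": "zk1.example.com:2181"}}
    print(with_regions(current, 4))
    print(current)  # the original configuration is left unchanged
```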

Click Precheck to start a precheck on the data migration task.

During the precheck, OMS Community Edition checks the read and write privileges of the database users and the network connectivity of the databases. The data migration task can be started only after it passes all check items. If an error is returned during the precheck, you can perform the following operations:
- Identify and troubleshoot the error and then perform the precheck again.
- Click Skip in the Actions column of the failed precheck item. In the dialog box that prompts the consequences of the operation, click OK.
Click Start Task. If you do not need to start the task now, click Save to go to the details page of the data migration task. You can start the task later as needed.

OMS Community Edition allows you to modify the migration objects when the migration task is running. For more information, see View and modify migration objects. After a data migration task is started, the migration subtasks will be executed based on the selected migration types. For more information, see the "View migration details" section in the View details of a data migration task topic.
