Migrate data from HBase to OBKV|V4.2.0|OceanBase Migration Service| docs|Distributed Database

Migrate data from HBase to OBKV

Last Updated: 2023-11-30 06:45:08

This topic describes how to use OceanBase Migration Service (OMS) Community Edition to migrate data from an HBase database to OBKV.

Background information

You can create a data migration project in the console of OMS Community Edition to seamlessly migrate the existing business data and incremental data from an HBase database to OBKV through schema migration, full migration, and incremental synchronization.

Prerequisites

You have created the corresponding schema in OBKV. OMS Community Edition migrates tables and columns, so the schema that contains them must already exist in the destination database before migration.
To migrate data in Flink mode, you need to create a Flink cluster of version 1.14.0.
To use the incremental synchronization feature, you need to enable synchronous replication for HBase tables by setting the REPLICATION_SCOPE parameter to 1.

Limitations

Limitations on the source database

Do not perform DDL operations for database or schema changes during schema migration or full data migration. Otherwise, the data migration project may be interrupted.
Only HBase 1.2.0-cdh5.15.2 is supported.
The migration of tag data, full verification, and reverse incremental migration are not supported.
The incremental data of the source HBase database cannot be migrated in bulkload mode during incremental synchronization.

Considerations

OMS Community Edition cannot obtain the statistics of tables in an HBase database. Therefore, the progress and estimated time displayed on all pages are inaccurate. You can calculate the amount of time required based on the displayed real-time requests per second (RPS) value and the actual number of table records.
Incremental data of an HBase database is replicated by using a peer, which is created in the source HBase database. If the incremental synchronization process of OMS Community Edition is paused for a long time, the incremental data of the source HBase database may not be sent to the incremental synchronization task for processing. As a result, disk space may be exhausted.
The start timestamp of incremental synchronization for a peer is determined solely by the time when the peer was created and cannot be specified. If the peer is deleted, you must create a new data migration project and run the full migration task again from the beginning.
During migration in Flink mode, if you want to stop the migration process, you must stop the Flink jobs as well as the full migration and incremental synchronization tasks in OMS Community Edition.

You can find the corresponding Flink jobs based on the ID of the data migration project in the console of OMS Community Edition.
- Full migration job: OMS_BATCH_{project ID in OMS Community Edition}
- Incremental synchronization job: OMS_STREAM_{project ID in OMS Community Edition}
  
  You can also obtain information about the corresponding Flink jobs from the flink_jobid file in the /home/ds/run/{component ID}/conf/ directory in the OMS Community Edition container.
No statistics are provided for the incremental synchronization task that migrates incremental data from the HBase database to OBKV.
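
Two of the considerations above can be sketched in a few lines of Python. The first helper estimates the remaining migration time from the displayed RPS value and the actual record count, assuming the RPS stays roughly constant. The second builds the Flink job names from a project ID by following the OMS_BATCH_ and OMS_STREAM_ patterns listed above. The function names and the sample project ID are hypothetical and are not part of OMS Community Edition.

```python
def estimate_remaining_seconds(total_records: int, migrated_records: int, rps: float) -> float:
    """Rough remaining time in seconds, assuming the current RPS stays constant."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    remaining = max(total_records - migrated_records, 0)
    return remaining / rps


def flink_job_names(project_id: str) -> dict:
    """Flink job names for a project, following the naming patterns listed above."""
    return {
        "full_migration": f"OMS_BATCH_{project_id}",
        "incremental_synchronization": f"OMS_STREAM_{project_id}",
    }


if __name__ == "__main__":
    # 12 million records in total, 3 million already migrated, 5,000 records per second.
    seconds = estimate_remaining_seconds(12_000_000, 3_000_000, 5000)
    print(f"about {seconds / 60:.0f} minutes remaining")

    # "np_example" is a placeholder, not a real project ID.
    print(flink_job_names("np_example"))
```

Because the RPS value changes during a migration, treat the result as a rough estimate and recalculate it periodically.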

Data type mappings

By default, a column family in the HBase database maps to a table schema in OBKV:

create table if not exists {TABLE_NAME} -- Maps HBase {namespace}.{table_name}${column_family}.
(
    `K` varbinary(1024) not null, -- Maps the HBase rowkey.
    `Q` varbinary(256) not null,  -- Maps the column in the HBase column family.
    `T` bigint not null,          -- Maps the HBase version/timestamp.
    `V` varbinary(1048576),       -- Maps the HBase value.
    primary key(`K`, `Q`, `T`))
partition by key(`K`) partitions 64
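
To illustrate the mapping, the following Python sketch turns one HBase cell into a (K, Q, T, V) row and checks it against the column widths declared in the default statement above. It is an illustration only and does not reflect how OMS Community Edition processes data internally. The ObkvRow type and the cell_to_row function are hypothetical names.

```python
from typing import NamedTuple, Optional

# Column widths taken from the default CREATE TABLE statement above.
MAX_K_BYTES = 1024
MAX_Q_BYTES = 256
MAX_V_BYTES = 1048576


class ObkvRow(NamedTuple):
    K: bytes            # HBase rowkey
    Q: bytes            # column (qualifier) in the HBase column family
    T: int              # HBase version/timestamp
    V: Optional[bytes]  # HBase value; the V column is nullable


def cell_to_row(rowkey: bytes, qualifier: bytes, timestamp: int,
                value: Optional[bytes]) -> ObkvRow:
    """Build a (K, Q, T, V) row and reject cells that exceed the default widths."""
    if len(rowkey) > MAX_K_BYTES:
        raise ValueError(f"rowkey is {len(rowkey)} bytes; K allows {MAX_K_BYTES}")
    if len(qualifier) > MAX_Q_BYTES:
        raise ValueError(f"qualifier is {len(qualifier)} bytes; Q allows {MAX_Q_BYTES}")
    if value is not None and len(value) > MAX_V_BYTES:
        raise ValueError(f"value is {len(value)} bytes; V allows {MAX_V_BYTES}")
    return ObkvRow(rowkey, qualifier, timestamp, value)


if __name__ == "__main__":
    print(cell_to_row(b"user#1", b"name", 1700000000000, b"Ada"))
```

If the source tables hold rowkeys, qualifiers, or values larger than these widths, the default statement likely needs wider columns.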

You can specify the default CREATE TABLE statement in OBKV for schema migration by modifying the value of the struct.obkv.createtable parameter.

Parameter	Description	CREATE TABLE statement
struct.obkv.createtable	On the System Parameters page, you can modify the value of this parameter to specify the default CREATE TABLE statement in OBKV for schema migration in all projects.	create table if not exists {TABLE_NAME} (`K` varbinary(1024) not null, `Q` varbinary(256) not null, `T` bigint not null, `V` varbinary(1048576), primary key(`K`, `Q`, `T`)) partition by key(`K`) partitions 64
structObkvCreatetable	In the Incremental Synchronization section on the Migration Options page, click Configuration Details. Then, you can specify the default CREATE TABLE statement in OBKV for schema migration in the current project by modifying the `sink.json` file.	create table if not exists {TABLE_NAME} (`K` varbinary(1024) not null, `Q` varbinary(256) not null, `T` bigint not null, `V` varbinary(1048576), primary key(`K`, `Q`, `T`)) partition by key(`K`) partitions 64

Procedure

Create a data migration project.
1. Log on to the console of OMS Community Edition.
2. In the left-side navigation pane, click Data Migration.
3. On the Data Migration page, click Create Migration Project in the upper-right corner.

On the Select Source and Destination page, configure the parameters.

Parameter	Description
Migration Project Name	We recommend that you set it to a combination of digits and letters. It must not contain any spaces and cannot exceed 64 characters in length.
Tag	Click the field and select a target tag from the drop-down list. You can also click Manage Tags to create, modify, and delete tags. For more information, see Use tags to manage data migration projects.
Source	If you have created an HBase data source, select it from the drop-down list. If not, click New Data Source in the drop-down list and create one in the dialog box that appears on the right. For more information about the parameters, see Create an HBase data source.
Destination	If you have created a data source of OceanBase Database Community Edition, select it from the drop-down list. If not, click New Data Source in the drop-down list and create one in the dialog box that appears on the right. For more information about parameters, see Create a data source of OceanBase Database Community Edition.
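
The naming rules for Migration Project Name can be checked before you open the console. The following Python sketch applies only the rules stated in the table above. The check_project_name function is a hypothetical helper, and the validation performed by the console itself may differ.

```python
import re

MAX_NAME_LENGTH = 64
_DIGITS_AND_LETTERS = re.compile(r"[A-Za-z0-9]+")


def check_project_name(name: str) -> list:
    """Return a list of problems with a project name; an empty list means none found."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("name contains a space")
    elif not _DIGITS_AND_LETTERS.fullmatch(name):
        # Recommendation only: the table above recommends digits and letters.
        problems.append("recommendation: use only digits and letters")
    return problems


if __name__ == "__main__":
    for candidate in ("hbase2obkv01", "my project", "a" * 65):
        print(repr(candidate[:20]), check_project_name(candidate))
```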

Click Next. On the Select Migration Type page, configure the parameters.

The following options are available for Migration Type: Schema Migration, Full Migration, and Incremental Synchronization.

Migration type	Description
Schema migration	The definitions of data objects, such as tables, indexes, constraints, comments, and views, are migrated from the source database to the destination database. Temporary tables are automatically filtered out.
Full migration	The existing data is migrated from tables in the source database to the corresponding tables in the destination database.
Incremental synchronization	Changed data in the source database is synchronized to the corresponding tables in the destination database after an incremental synchronization task starts. Data changes include additions, modifications, and deletions.

Click Next. On the Select Migration Objects page, select the migration objects and migration scope.

You can select Specify Objects or Match Rules to specify the migration objects.

Select Specify Objects. Then select the objects to be migrated on the left and click > to add them to the list on the right. OBKV supports only tables with a single column family. Therefore, a table with multiple column families in the HBase database corresponds to multiple tables in OBKV.

Notice

The names of tables to be migrated, as well as the names of columns in the tables, must not contain Chinese characters.
If the database or table name contains a double dollar sign ($$), you cannot create the migration project.
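
The two rules in this notice can be checked in advance with a short script. The following Python sketch assumes that "Chinese characters" can be approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which may be narrower than the check that OMS Community Edition performs. The function names are hypothetical.

```python
import re

# Assumption: Chinese characters are approximated by the CJK Unified Ideographs
# block (U+4E00 to U+9FFF). The actual check in OMS may cover more code points.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def has_chinese(name: str) -> bool:
    """True if a table or column name contains a character in the assumed range."""
    return _CJK.search(name) is not None


def has_double_dollar(name: str) -> bool:
    """True if a database or table name contains a double dollar sign ($$)."""
    return "$$" in name


def check_table_name(name: str) -> list:
    """Return the notice rules that a table name violates."""
    problems = []
    if has_chinese(name):
        problems.append("contains Chinese characters")
    if has_double_dollar(name):
        problems.append("contains a double dollar sign ($$)")
    return problems


if __name__ == "__main__":
    for table in ("orders", "orders$$2023"):
        print(table, check_table_name(table))
```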

OMS Community Edition also allows you to import objects from text, rename objects, set row filters, view column information, and remove a single migration object or all migration objects.

Operation	Step
Import Objects	1. In the list on the right of the Specify Migration Scope section, click Import Objects in the upper-right corner. 2. In the dialog box that appears, click OK. Notice: This operation will overwrite previous selections. Proceed with caution. 3. In the Import Objects dialog box, import the objects to be migrated. You can import CSV files to rename databases/tables and set row filtering conditions. For more information, see Download and import the settings of migration objects. 4. Click Validate. After the validation succeeds, click OK.
Rename	OMS Community Edition allows you to rename migration objects. For more information, see Rename a database table.
Settings	OMS Community Edition allows you to filter rows by using `WHERE` conditions. For more information, see Use SQL conditions to filter data. You can also view column information of the migration object in the View Column section.
Remove/Remove All	OMS Community Edition allows you to remove a single object or all objects to be migrated to the destination database during data mapping. To remove a single migration object: In the list on the right of the Specify Migration Scope section, hover the pointer over the target object, and click Remove. To remove all migration objects: In the list on the right of the Specify Migration Scope section, click Remove All in the upper-right corner. In the dialog box that appears, click OK.

Select Match Rules. For more information, see Configure matching rules for migration objects.

Click Next. On the Migration Options page, configure the parameters.

Full migration

The following table describes the full migration parameters, which are displayed only when you have selected Full Migration on the Select Migration Type page.

Parameter	Description
Concurrency Speed	Valid values: Stable, Normal, Fast, and Custom. The amount of resources to be consumed by a full data migration task varies based on the migration performance. If you select Custom, you can set Read Concurrency, Write Concurrency, and JVM Memory as needed.
Processing Strategy When Records Exist in Target Object	Valid values: Ignore and Stop Migration. If you select Ignore, the data in the source and destination databases may be inconsistent. If you select Stop Migration, the project is set to the Failed state when the system detects records in the destination table. To continue data migration, manually resume the project.
Computing Platform	The default value is `local`, which indicates the local running mode. You can also choose to run on the Flink computing platform. To add a computing platform, click Manage Computing Platform in the drop-down list. For more information, see Manage computing platforms.

You can specify the query method for full migration by setting the queryType parameter in the source section. Valid values are hfile and scan. The default value is hfile, which indicates that the full data is obtained by reading HFiles. By default, a table flush operation is performed before the full migration starts. To disable the operation, set the flushTable parameter in the source section to false. To view or modify parameters related to full migration, click Configuration Details in the upper-right corner of the Full Migration section. For more information about the parameters, see Coordinator.
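
As an illustration of the two settings described above, the following Python sketch applies queryType and flushTable to a configuration held as a nested dictionary. The layout is an assumption: the sketch supposes a top-level "source" key and a Boolean flushTable value, neither of which is confirmed by this topic. Check the actual structure in Configuration Details before you rely on it. The function name is hypothetical.

```python
import copy

# Valid values for queryType, as stated in the paragraph above.
VALID_QUERY_TYPES = {"hfile", "scan"}


def with_full_migration_options(config: dict, query_type: str = "hfile",
                                flush_table: bool = True) -> dict:
    """Return a copy of config with queryType and flushTable set in the source section.

    Assumption: the configuration is a nested dict with a top-level "source" key.
    The defaults mirror the documented defaults (hfile, flush before migration).
    """
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError(f"queryType must be one of {sorted(VALID_QUERY_TYPES)}")
    updated = copy.deepcopy(config)
    source = updated.setdefault("source", {})
    source["queryType"] = query_type
    source["flushTable"] = flush_table
    return updated


if __name__ == "__main__":
    # Read data by scanning instead of reading HFiles, and skip the table flush.
    print(with_full_migration_options({"source": {}}, "scan", False))
```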

Incremental synchronization

The following table describes the incremental synchronization parameters, which are displayed only when you have selected Incremental Synchronization on the Select Migration Type page.

Parameter	Description
Concurrency Speed	Valid values: Stable, Normal, Fast, and Custom. The amount of resources to be consumed by an incremental synchronization task varies based on the synchronization performance. If you select Custom, you can set Read Concurrency, Write Concurrency, and JVM Memory as needed.
Peer ID	Use the default value.
rootDir	Use the default value.
zkHost	The ZooKeeper configuration used by the Incr-Sync component to simulate startup of HBase. You must specify the parameter.
zkPath	Use the default value.
Computing Platform	The default value is `local`, which indicates the local running mode. You can also choose to run on the Flink computing platform. To add a computing platform, click Manage Computing Platform in the drop-down list. For more information, see Manage computing platforms.

By default, OMS Community Edition starts one simulated region for incremental synchronization. You can change the number by modifying the regions parameter in the source section. If the traffic of incremental data is heavy, you can specify multiple regions to increase the speed of incremental synchronization. To view or modify parameters related to incremental synchronization, click Configuration Details in the upper-right corner of the Incremental Synchronization section. For more information about the parameters, see Coordinator.

Click Precheck to start a precheck on the data migration project.

During the precheck, OMS Community Edition checks the read and write privileges of the database users and the network connectivity of the databases. The data migration project can be started only after it passes all check items. If an error is returned during the precheck:
- You can identify and troubleshoot the problem and then perform the precheck again.
- You can also click Skip in the Actions column of the failed precheck item. A dialog box appears that describes the impact of skipping the item. To skip it, click OK.

Click Start Project. If you do not need to start the project now, click Save to go to the details page of the data migration project. You can start the project later as needed.

OMS Community Edition allows you to modify the migration objects when the migration project is running. For more information, see View and modify migration objects. After a data migration project is started, the migration subtasks will be executed based on the selected migration types. For more information, see the "View migration details" section in the View details of a data migration project topic.
