This topic describes the system architecture, roles, cluster characteristics, and replay process of OceanBase Migration Assessment (OMA).
System architecture
In the standalone edition, all features run on a single server and can be used directly, without any special topology. The following figure shows the system architecture of the distributed edition.
Multiple workers constitute a replay cluster, which replays data against the destination OceanBase database.
The client initiates a replay, directs all workers to replay data simultaneously, summarizes the replay information, and generates a report.
Roles
The following two roles exist in the system during a distributed replay:
Worker
A worker is a node that replays data. Typically, you configure 2 to 10 workers. You can start multiple workers on one physical server; the workers are independent of each other, but they must run in different directories and have unique names.
You must start the workers before initiating a replay. After the workers are started, they can perform multiple replays without a restart.
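The co-location rules above (unique names cluster-wide, distinct directories per host) can be illustrated with a small validation sketch. The `WorkerConfig` structure and the check itself are hypothetical, for illustration only; they are not part of OMA:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerConfig:
    name: str       # must be unique across the whole cluster
    work_dir: str   # must differ for workers sharing a host
    host: str

def validate_workers(workers):
    """Reject configurations that violate the co-location rules."""
    names = [w.name for w in workers]
    if len(names) != len(set(names)):
        raise ValueError("worker names must be unique")
    per_host_dirs = {}
    for w in workers:
        dirs = per_host_dirs.setdefault(w.host, set())
        if w.work_dir in dirs:
            raise ValueError(f"workers on {w.host} must use different directories")
        dirs.add(w.work_dir)
    return True

# Two workers on one host are fine if names and directories differ.
validate_workers([
    WorkerConfig("w1", "/data/oma/w1", "10.0.0.5"),
    WorkerConfig("w2", "/data/oma/w2", "10.0.0.5"),
])
```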
Client
A client schedules and controls the workers, and works much like the program in the standalone edition. A replay is always initiated by a client. After the replay is completed, the client summarizes the information from each worker and generates a report.
You can start multiple clients, which are independent of each other. Each client corresponds to one replay task.
Cluster characteristics
Networking mode
The workers and clients that constitute an OMA cluster must be interconnected over a network. OMA clusters support the following two networking modes:
Multicast networking: This mode is convenient and requires no special settings. However, the deployment environment must support multicast.
Networking with fixed IP addresses: In this mode, you must obtain the IP address of each node in advance and pass the addresses to the workers and clients as startup parameters. This mode is suitable for complex network environments.
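In fixed-IP mode, the node list is typically supplied in a machine-readable form. The sketch below parses a hypothetical comma-separated `host:port` list; the format is an assumption for illustration, not OMA's actual parameter syntax:

```python
def parse_node_list(spec):
    """Parse a fixed-IP node list such as '10.0.0.5:8001,10.0.0.6:8001'.

    In multicast networking no such list is needed: nodes discover
    each other by announcing themselves on a multicast group instead.
    """
    nodes = []
    for entry in spec.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"invalid node entry: {entry!r}")
        nodes.append((host, int(port)))
    return nodes

parse_node_list("10.0.0.5:8001,10.0.0.6:8001")
# [('10.0.0.5', 8001), ('10.0.0.6', 8001)]
```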
Data storage
A cluster of workers provides built-in cache and storage. You can execute SQL statements to access and manipulate data in the cluster. During a replay, the replay metrics are saved to the cluster.
Workload capture replay (WCR) files can be large. Therefore, the system does not save all SQL statements parsed from the WCR files to the cluster. Instead, the client distributes the statements to the workers as files, which is more efficient.
Replay process
Parse WCR files
A client initiates a parsing task.
The client parses the data locally. During parsing, the client evenly distributes all SQL statements parsed from the WCR files to the workers for later replay.
The client generates a name for the result set of the parsing task, which a subsequent replay references.
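The even-distribution step described above can be sketched as round-robin assignment of parsed statements to workers, so each worker replays a roughly equal share. This is a minimal illustration of the idea, not OMA's actual distribution algorithm:

```python
import itertools

def distribute_statements(statements, workers):
    """Assign parsed SQL statements to workers round-robin so that
    each worker receives a roughly equal share to replay."""
    shares = {w: [] for w in workers}
    for stmt, worker in zip(statements, itertools.cycle(workers)):
        shares[worker].append(stmt)
    return shares

shares = distribute_statements(
    [f"SELECT {i}" for i in range(10)],
    ["worker-1", "worker-2", "worker-3"],
)
# worker-1 receives 4 statements; worker-2 and worker-3 receive 3 each
```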
Replay data
The client starts the SQL_AUDIT scanner to scan the SQL_AUDIT view of the OceanBase database and compares the collected SQL statements with those from the source database.
The client starts a replay task. The data source of the task is the result set of the preceding data parsing task.
Optional. If OceanBase Cloud Platform (OCP) is deployed in the environment, the client enables the snapshot mechanism of OCP to generate a snapshot of the OceanBase database.
The client instructs all workers that hold the result set to replay the data. All workers then start replaying traffic to the OceanBase database.
After the replay is completed, each worker saves its replay results to a table in the cluster.
The client pulls the data from that table, combines it with the scan results from the SQL_AUDIT scanner, and generates a replay report.
The client creates another snapshot of the OceanBase database, obtains the load, system, and slow query status of the database in the period between the two snapshots, and generates a load report for the OceanBase database during the replay.
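The summarization step in the process above can be sketched as merging per-worker result rows into one report. The metric field names (`replayed`, `failed`, `elapsed_s`) are illustrative assumptions, not OMA's actual result-table schema:

```python
def summarize_results(worker_results):
    """Combine per-worker replay metrics into a single report row.

    `worker_results` is a list of dicts such as
    {"worker": "w1", "replayed": 500, "failed": 2, "elapsed_s": 40.0}.
    """
    total = sum(r["replayed"] for r in worker_results)
    failed = sum(r["failed"] for r in worker_results)
    # Wall-clock time for a parallel replay is bounded by the slowest worker.
    elapsed = max(r["elapsed_s"] for r in worker_results)
    return {
        "total_replayed": total,
        "failed": failed,
        "success_rate": (total - failed) / total if total else 0.0,
        "elapsed_s": elapsed,
    }

report = summarize_results([
    {"worker": "w1", "replayed": 500, "failed": 2, "elapsed_s": 40.0},
    {"worker": "w2", "replayed": 500, "failed": 1, "elapsed_s": 42.0},
])
# {'total_replayed': 1000, 'failed': 3, 'success_rate': 0.997, 'elapsed_s': 42.0}
```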