This topic describes the scenarios in which OceanBase Migration Service (OMS) generates alerts.
Downtime alerts
Alert objects: servers deployed with OMS
Judgment method: OMS has an agent (oms_drc_supervisor) on each deployment server. The agents report heartbeats to the OMS control database. Then, OMS compares the heartbeat time with the current time to determine whether a server is available.
Impact on business: If high availability (HA) is not enabled, related projects fail. If HA is enabled, projects related to the server are migrated to an available server. If no server is available, OMS cannot run properly. If migration is triggered, a migration alert is reported to prompt you to check the migration result.
Troubleshooting method:
If the oms_drc_supervisor application is interrupted, log on to the host and run supervisorctl restart oms_drc_supervisor to restart the application. If the application fails to recover, check the error log under /home/admin/logs/supervisor.
For other causes, such as network problems, on-site O&M intervention is required.
If HA is enabled on multiple servers, you can enable the enableHost parameter and perform migration. In this case, only JDBCWriter/Connector is migrated.
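The heartbeat comparison described in the judgment method above can be illustrated with a minimal sketch. This is not the OMS implementation: the 60-second timeout, the function names, and the shape of the heartbeat data are assumptions made for illustration only. This topic does not state the actual threshold or how the control database stores heartbeats.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout; the real threshold used by OMS is not stated in this topic.
HEARTBEAT_TIMEOUT = timedelta(seconds=60)


def is_server_available(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True if the latest heartbeat is no older than the timeout."""
    return now - last_heartbeat <= timeout


def find_down_servers(heartbeats, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the servers whose latest heartbeat is too old.

    `heartbeats` maps a server address to the time of its latest heartbeat
    (an assumed representation of what the agents report).
    """
    return sorted(
        host
        for host, last_seen in heartbeats.items()
        if not is_server_available(last_seen, now, timeout)
    )


# Example with made-up servers: one reported 5 seconds ago, one 10 minutes ago.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
heartbeats = {
    "10.0.0.1": now - timedelta(seconds=5),
    "10.0.0.2": now - timedelta(minutes=10),
}
down = find_down_servers(heartbeats, now)
print(down)
```

In this sketch the comparison uses a single reference time for all servers, so every server is judged against the same clock.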
OMS server resource level alerts
Alert objects: servers deployed with OMS
Judgment method: The agent (oms_drc_supervisor) deployed on each OMS server regularly collects the CPU, memory, and disk usage of the server.
Impact on business: Projects cannot be started on the server, or the migration and synchronization performance may be impaired.
Troubleshooting method: First, run top to check whether the resource-consuming processes are OMS processes. If not, perform troubleshooting accordingly. If yes, perform the following operations:
Add an OMS server.
Clean up unnecessary processes or disk files.
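As a rough illustration of a resource level check, the following sketch compares collected usage percentages against thresholds. The threshold values, the function names, and the sample figures are hypothetical. This topic does not describe how oms_drc_supervisor collects the metrics or which limits trigger an alert.

```python
import shutil

# Hypothetical alert thresholds, in percent. The actual OMS limits are not given here.
THRESHOLDS = {"cpu": 80.0, "memory": 80.0, "disk": 85.0}


def disk_usage_percent(path="/"):
    """Return the used share of the file system that contains `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100.0


def exceeded_resources(usage, thresholds=THRESHOLDS):
    """Return the names of resources whose usage is above its threshold."""
    return sorted(
        name
        for name, value in usage.items()
        if name in thresholds and value > thresholds[name]
    )


# CPU and memory figures are made up; only the disk figure is read from this host.
sample = {"cpu": 35.0, "memory": 91.5, "disk": disk_usage_percent("/")}
print(exceeded_resources(sample))
```

A check like this only tells you that a limit was crossed. You still need top to find out whether OMS processes are responsible.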
Migration project failed
Alert objects: all migration projects
Judgment method: The status of a migration project is displayed as "Failed".
Impact on business: Data migration is interrupted.
Troubleshooting method: Check the cause of failure in the migration project, and then perform troubleshooting accordingly.
Migration project delayed
Alert objects: running migration projects that are in the incremental migration phase
Judgment method: Obtain the maximum delay time of a specific project based on the delay information and the protection level of the migration project that are configured in the alert center. If the project delay time exceeds the expected tolerance level, an alert is sent.
Impact on business: Data cannot be synchronized from the source to the destination in real time, or data cannot be synchronized from the destination to the source in reverse synchronization.
Troubleshooting method: Obtain the following logs and perform troubleshooting according to these logs or information on the View Component Monitoring page:
Store logs. You can check the latest pull time of Store on the View Component Monitoring page to determine whether the data pulled by Store lags behind.
Writer logs.
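The delay judgment can be sketched as follows: take the largest delay reported for a project and compare it with the tolerance that corresponds to the project's protection level. The protection level names, the tolerance values, and the per-component delay figures below are all hypothetical. The actual values are whatever is configured in the alert center.

```python
# Hypothetical mapping from protection level to tolerated delay, in seconds.
TOLERATED_DELAY_SECONDS = {"high": 30, "medium": 300, "low": 1800}


def max_delay(component_delays):
    """Return the largest delay among a project's components, in seconds."""
    return max(component_delays.values())


def should_alert(delay_seconds, protection_level):
    """Return True if the delay exceeds the tolerance for the protection level."""
    return delay_seconds > TOLERATED_DELAY_SECONDS[protection_level]


# Made-up delays for one project: Store is 12 s behind, the writer is 95 s behind.
delays = {"store": 12, "writer": 95}
project_delay = max_delay(delays)

print(project_delay, should_alert(project_delay, "high"))
print(project_delay, should_alert(project_delay, "medium"))
```

With these sample numbers the same 95-second delay triggers an alert for a project at the assumed "high" level but not for one at "medium". This is why the protection level matters as much as the delay itself.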
Synchronization project failed
Alert objects: all synchronization projects
Judgment method: The status of a synchronization project is displayed as "Failed".
Impact on business: Data synchronization is interrupted.
Troubleshooting method: Check the cause of failure in the synchronization project, and then perform troubleshooting accordingly.
Synchronization project delayed
Alert objects: running synchronization projects that are in the incremental synchronization phase
Judgment method: Obtain the maximum delay time of a specific project based on the delay information and the protection level of the synchronization project that are configured in the alert center. If the project delay time exceeds the expected tolerance level, an alert is sent.
Impact on business: The incremental data on the source cannot be synchronized to the destination in real time.
Troubleshooting method: Obtain the following logs and perform troubleshooting according to these logs or information on the View Component Monitoring page:
Store logs for store synchronization projects. You can check the latest pull time of Store on the View Component Monitoring page to determine whether the data pulled by Store lags behind.
Connector logs.