This topic describes how to locate and troubleshoot issues that occur when log synchronization is stuck for the physical standby database.
Issue 1: The RECOVER statement is not executed, the RECOVER statement is executed to restore the tenant to a specified end point, or the ALTER SYSTEM RECOVER xxx CANCEL statement has been executed
Symptom
- After you restore a standby tenant through physical restore, if you do not execute the
RECOVERstatement to specify a new restore end point for the standby tenant, the log synchronization for the standby tenant is stuck when log streams are restored to the previously specified end point. - After you restore a standby tenant through physical restore, if you execute the
RECOVERstatement to specify a restore end point for the standby tenant, the log synchronization for the standby tenant is stuck when log streams are restored to the specified end point. - After you restore a standby tenant through physical restore, if you execute the
RECOVERstatement and then theALTER SYSTEM RECOVER xxx CANCELstatement, the log synchronization for the standby tenant is stuck when log streams are restored to the specified end point.
Possible causes
Generally, after you restore a standby tenant through physical restore, if you do not execute the RECOVER statement or if you execute the RECOVER statement to restore the standby tenant to a specified end point, the log synchronization for the standby tenant will stop at the specified restore end point. If you execute the RECOVER statement and then the ALTER SYSTEM RECOVER xxx CANCEL statement on the standby tenant, the log synchronization for the standby tenant will also stop at the specified restore end point.
As shown in the following example, the role of the restore_oracle_tenant tenant is STANDBY and the synchronization progress is stuck at a point in time. This is because you did not execute the RECOVER statement, you executed the RECOVER statement to restore the standby tenant to a specified end point, or you executed the ALTER SYSTEM RECOVER xxx CANCEL statement after the RECOVER statement. When the value of SYNC_SCN equals the value of RECOVERY_UNTIL_SCN, no more logs need to be synchronized for the standby tenant.
*************************** 1. row ***************************
TENANT_ID: 1002
TENANT_NAME: restore_oracle_tenant
TENANT_ROLE: STANDBY
STATUS: NORMAL
SWITCHOVER_STATUS: NORMAL
SYNC_SCN: 1690425747241344851
REPLAYABLE_SCN: 1690425747241344851
READABLE_SCN: 1690425747241344851
RECOVERY_UNTIL_SCN: 1690425747241344851
1 row in set
Troubleshooting procedure
You can query the DBA_OB_TENANTS view for the synchronization timestamp.
Log in as the administrator to the problematic tenant or the
systenant of the cluster to which the problematic tenant belongs.Query the basic status information of the problematic tenant, such as the role and synchronization timestamp of the tenant, based on its
TENANT_IDorTENANT_NAMEvalue.Query information about the problematic tenant from the
systenantSELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Here is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM SYS.DBA_OB_TENANTS;
Pay attention to values in the following columns in the query result:
TENANT_ROLE: the role of the tenant. The valueSTANDBYindicates a standby tenant. The valuePRIMARYindicates a primary tenant.SYNC_SCN: the synchronization timestamp of the tenant.REPLAYABLE_SCN: the replayable timestamp of the tenant. Logs whose SCNs are greater than this timestamp cannot be replayed.RECOVERY_UNTIL_SCN: the end point to which the tenant can be restored. The value4611686018427387903indicates infinity.
For more information about the
DBA_OB_TENANTSview, see DBA_OB_TENANTS.
Issue 2: No log restore source is configured for the tenant, or the log restore source is empty or damaged
Symptom
- After a standby tenant is created by using the
BACKUP DATABASE PLUS ARCHIVELOGstatement, no log restore source is configured, or the configured log restore source is empty. For example, an empty log source is configured by using theALTER SYSTEM SET log_restore_source = '';statement. - The log restore source is cleared for the standby tenant.
- After the primary tenant is switched to the
STANDBYrole, no log restore source is configured for it.
Possible causes
In OceanBase Database, a standby tenant pulls logs from the log restore source. By default, after you execute the RECOVER statement on a standby tenant that is created by using the physical backup and restore feature (with complete logs), the standby tenant continues to synchronize logs from the archive directory specified during the restore. By default, a standby tenant created by using the CREATE STANDBY TENANT statement directly synchronizes logs from the primary tenant over the network. Continuous log synchronization cannot be directly enabled for a standby tenant created by using the BACKUP DATABASE PLUS ARCHIVELOG statement because no log restore source is specified for the tenant.
In addition, if the log restore source of a standby tenant is empty or damaged, log synchronization of the standby tenant will also be stuck. For example, if the primary tenant is deleted or the password of the replication account used to access the primary tenant is changed for a network-based physical standby database, or the log archive directory used to restore logs for an archive-based physical standby database is deleted, log synchronization of the standby tenant will be stuck.
Troubleshooting procedure
You can query the CDB_OB_LOG_RESTORE_SOURCE or DBA_OB_LOG_RESTORE_SOURCE view to verify whether a log restore source is configured for the problematic tenant.
Log in as the administrator to the problematic tenant or the
systenant of the cluster to which the problematic tenant belongs.Verify whether a log restore source is configured for the problematic tenant.
Query information about the problematic tenant from the
systenantSELECT * FROM oceanbase.CDB_OB_LOG_RESTORE_SOURCE WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT * FROM oceanbase.DBA_OB_LOG_RESTORE_SOURCE;Here is a sample statement:
SELECT * FROM SYS.DBA_OB_LOG_RESTORE_SOURCE;
Pay attention to values in the following columns in the query result:
TYPE: The type of the physical standby database. The valueLOCATIONindicates an archive-based physical standby database, in which the current tenant synchronizes logs from the source tenant through archive logs. The valueSERVICEindicates a network-based physical standby database, in which the current tenant directly synchronizes logs from the primary tenant over the network. You can execute theALTER SYSTEM SET log_restore_sourcestatement to change the log restore source of the standby tenant.VALUE: The value of this field corresponds to that ofTYPE, and can be the log archive directory of the source tenant or the access point of the primary tenant.
In the following sample query result, if the log restore source of the
restore_oracle_tenanttenant is empty, no log restore source is configured for this tenant. In this case, log synchronization of the standby tenant will be stuck. If the value ofVALUEis the log archive directory of the source tenant, you must verify whether the log archive directory is accessible from all servers of the primary and standby tenants. If the value ofVALUEis the access point of the primary tenant, you must verify whether the IP address list, username, and password are correctly configured.*************************** 1. row *************************** tenant_id: 1002 type: LOCATION value: file:///data/1 recovery_until_scn: 4611686018427387903 1 row in set
Issue 3: The log synchronization of the standby tenant is abnormal
Symptom
Issue 1 and Issue 2 do not exist, but the log synchronization of the standby tenant is still abnormal.
Possible causes
The logs of the standby tenant may lag behind or the log synchronization may be stuck due to reasons such as log stream recycling of the primary tenant or network factors.
Troubleshooting procedure
The V$OB_LS_LOG_RESTORE_STATUS view aggregates the synchronization status information of all log streams. You can query this view to check the synchronization status of all log streams.
Notice
The V$OB_LS_LOG_RESTORE_STATUS view displays the synchronization status information of only log stream leaders. Therefore, if a log stream of the standby tenant has no leader, you cannot query this view for the log synchronization status of this log stream. In this case, you must first troubleshoot the issue of leader absence.
Log in as the administrator to the problematic tenant or the
systenant of the cluster to which the problematic tenant belongs.Query the synchronization status of all log stream leaders of the problematic tenant.
Query information about the problematic tenant from the
systenantSELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS;Here is a sample statement:
SELECT * FROM SYS.V$OB_LS_LOG_RESTORE_STATUS;Pay attention to values in the following columns in the query result:
SYNC_SCN: the synchronization timestamp of the log stream.SYNC_STATUS: the synchronization status of the log stream. The valueNORMALindicates that the log synchronization of the log stream is normal. Other values indicate abnormal log synchronization. The following table describes the log synchronization states.
For more information about the
V$OB_LS_LOG_RESTORE_STATUSview, see V$OB_LS_LOG_RESTORE_STATUS.
Log synchronization state Description NORMAL Log synchronization is properly in progress. SOURCE HAS A GAP A gap exists between the logs of the standby tenant and those of the source tenant. This is because logs of the primary tenant are recycled before being synchronized to the standby tenant. STANDBY LOG NOT MATCH The restored logs conflict with those of the standby tenant. This is because two primary tenants exist or the log restore source is incorrectly configured. CHECK USER OR PASSWORD The username or password of the replication account is incorrect and therefore cannot be used to access the original primary tenant. CHECK NETWORK The primary tenant is unreachable. In this case, you need to check for network exceptions. RESTORE SUSPEND The standby tenant has been restored to the specified timestamp. STANDBY NEED UPGRADE The binary version of the standby tenant lags behind and must be upgraded. PRIMARY TENANT DROPPED The primary tenant is dropped. STANDBY IN THROTTLING Write throttling is enabled for the standby tenant. STANDBY LOG DISK IS FULL The disk space of the standby tenant is insufficient. WAIT LOG STREAM CREATED Log streams are being created for the standby tenant. NOT AVAILABLE Log synchronization is unavailable due to other exceptions.
Issue 4: The synchronization timestamp of the standby tenant stops advancing
Symptom
Issue 1, Issue 2, and Issue 3 do not exist, but the log synchronization of the standby tenant is still stuck. In this case, you can try to verify whether the synchronization timestamp of the standby tenant stops advancing.
Possible causes
All logs are restored in a synchronized manner across log streams of the standby tenant. Therefore, all log streams will check the SYNC_SCN value of the tenant and pull logs in advance based on the specified strategy. Log synchronization stops if the SYNC_SCN value of the tenant stops advancing, and resumes until the SYNC_SCN value of the tenant continues to advance.
Troubleshooting procedure
Log in as the administrator to the problematic tenant or the
systenant of the cluster to which the problematic tenant belongs.Query the
DBA_OB_TENANTSview for theSYNC_SCNvalue of the tenant, which specifies the synchronization timestamp.You can query the
SYNC_SCNvalue of the problematic tenant based on itsTENANT_IDorTENANT_NAMEvalue.Query the
SYNC_SCNvalue of the problematic tenant from thesystenantSELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Query the
SYNC_SCNvalue of the problematic tenant from the current tenantMySQL modeOracle modeHere is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Here is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM SYS.DBA_OB_TENANTS;
Pay attention to values in the following columns in the query result:
TENANT_ROLE: the role of the tenant. The valueSTANDBYindicates a standby tenant. The valuePRIMARYindicates a primary tenant.SYNC_SCN: the synchronization timestamp of the tenant.
Query the
V$OB_LS_LOG_RESTORE_STATUSview for theSYNC_SCNvalues of all log stream leaders.Query information about the problematic tenant from the
systenantSELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS;Here is a sample statement:
SELECT * FROM SYS.V$OB_LS_LOG_RESTORE_STATUS;Pay attention to the
SYNC_SCNcolumn in the query result, which indicates the synchronization timestamps of log streams.Compare the
SYNC_SCNvalues of all log streams with that of the tenant based on the query results. If theSYNC_SCNvalues of all log streams are greater than that of the tenant and remain unchanged, the recorded log synchronization progress of the tenant is incorrect. In this case, you need to troubleshoot this issue.For more information about the
V$OB_LS_LOG_RESTORE_STATUSview, see V$OB_LS_LOG_RESTORE_STATUS.
Issue 5: A log stream of the standby tenant has no leader
Symptom
The preceding issues do not exist, but the log synchronization of the standby tenant is still stuck. In this case, you can try to check whether a log stream of the standby tenant has no leader.
Possible causes
During log synchronization, a log stream leader in the standby tenant first synchronizes logs from the primary tenant or archive media and then synchronizes the logs to other log stream replicas. If a log stream of the standby tenant has no leader, the log stream cannot synchronize logs and the tenant-level SYNC_SCN value cannot advance. As a result, the log synchronization of all log streams is stuck.
Troubleshooting procedure
You can query the GV$OB_LOG_STAT view to check whether any log stream of the standby tenant has no leader.
Log in as the administrator to the problematic tenant or the
systenant of the cluster to which the problematic tenant belongs.Query the
GV$OB_LOG_STATview for the status information of all log streams of the tenant.Query information about the problematic tenant from the
systenantSELECT TENANT_ID, LS_ID, ROLE FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT TENANT_ID, LS_ID, ROLE FROM oceanbase.GV$OB_LOG_STAT;You can also execute the following statement to query all log streams without a leader.
SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');Here is a sample statement:
SELECT TENANT_ID, LS_ID, ROLE FROM SYS.GV$OB_LOG_STAT;You can also execute the following statement to query all log streams without a leader.
SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE ROLE='LEADER');
Fields in the query result are described as follows:
TENANT_ID: the ID of the tenant.LS_ID: the ID of the log stream.ROLE: the role of the log stream replica. The valueLEADERindicates that the log stream replica is a leader. The valueFOLLOWERindicates that the log stream replica is a follower.If the roles of all log stream replicas are
FOLLOWER, the log stream has no leader.
You can also execute the following statements to query all log streams without a leader.
Query information about the problematic tenant from the
systenantSELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');Query information about the problematic tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');Here is a sample statement:
SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE ROLE='LEADER');
When the query result indicates that all log streams of the standby tenant has a leader, you can check whether any log stream of the primary tenant has no leader in the same way. The leader absence of any log stream of the primary tenant will also cause log synchronization to be stuck for log streams in the standby tenant.
Issue 6: The roles of both the primary and standby tenants are STANDBY
Symptom
In a scenario without cascaded standby databases, the role of at least one of the two tenants in the primary/standby relationship must be PRIMARY. Otherwise, log synchronization between the two tenants will be stuck.
Possible causes
After you perform a switchover to change the role of the primary tenant to STANDBY, the roles of the original primary and standby tenants are both STANDBY.
Troubleshooting procedure
You can query the DBA_OB_TENANTS view to check whether the roles of the primary and standby tenants are both STANDBY.
Log in as the administrator to the primary and standby tenants respectively or to the
systenant of the clusters where the primary and standby tenants belong.Execute the following statements to query the roles of the primary and standby tenants.
Query the role of the primary or standby tenant from the
systenantSELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM oceanbase.DBA_OB_TENANTS;Query the role of the primary or standby tenant from the current tenant
MySQL modeOracle modeHere is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM oceanbase.DBA_OB_TENANTS;Here is a sample statement:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM SYS.DBA_OB_TENANTS;
Fields in the query result are described as follows:
TENANT_ID: the ID of the tenant.TENANT_NAME: the name of the tenant.TENANT_ROLE: the role of the tenant. The valuePRIMARYindicates a primary tenant. The valueSTANDBYindicates a standby tenant.
Check whether the roles of the primary and standby tenants are both
STANDBYin the query result.For more information about the
DBA_OB_TENANTSview, see DBA_OB_TENANTS.