This topic describes how to locate and troubleshoot issues that occur when log synchronization is stuck for the physical standby database.
Issue 1: The RECOVER command is not executed, the RECOVER command is executed to restore the tenant to a specified end point, or the ALTER SYSTEM RECOVER xxx CANCEL command has been executed
Symptom
- After you restore a standby tenant through physical restore, if you do not execute the
RECOVERcommand to specify a new restore end point for the standby tenant, the log synchronization for the standby tenant is stuck when log streams are restored to the previously specified end point. - After you restore a standby tenant through physical restore, if you run the
RECOVERcommand to specify a restore end point for the standby tenant, the log synchronization for the standby tenant is stuck when log streams are restored to the specified end point. - After you restore a standby tenant through physical restore, if you run the
RECOVERcommand and then theALTER SYSTEM RECOVER xxx CANCELcommand, the log synchronization for the standby tenant is stuck when log streams are restored to the specified end point.
Possible causes
Generally, after you restore a standby tenant through physical restore, if you do not execute the RECOVER command or if you execute the RECOVER command to restore the standby tenant to a specified end point, the log synchronization for the standby tenant will stop at the specified restore end point. If you execute the RECOVER command and then the ALTER SYSTEM RECOVER xxx CANCEL command on the standby tenant, the log synchronization for the standby tenant will also stop at the specified restore end point.
As shown in the following example, the role of the restore_oracle_tenant tenant is STANDBY and the synchronization progress is stuck at a point in time. This is because you did not execute the RECOVER command, you executed the RECOVER command restore the standby tenant to a specified end point, or you executed the ALTER SYSTEM RECOVER xxx CANCEL command after the RECOVER command. When the synchronization checkpoint SYNC_SCN equals the restorable checkpoint RECOVERY_UNTIL_SCN, no more logs need to be synchronized for the standby tenant.
*************************** 1. row ***************************
TENANT_ID: 1002
TENANT_NAME: restore_oracle_tenant
TENANT_ROLE: STANDBY
STATUS: NORMAL
SWITCHOVER_STATUS: NORMAL
SYNC_SCN: 1690425747241344851
REPLAYABLE_SCN: 1690425747241344851
READABLE_SCN: 1690425747241344851
RECOVERY_UNTIL_SCN: 1690425747241344851
1 row in set
Troubleshooting procedure
Query the DBA_OB_TENANTS view for the synchronization checkpoint.
Log on as the administrator to the problematic tenant or the sys tenant of the cluster to which the problematic tenant belongs.
Query the basic status information of the problematic tenant, such as the role and synchronization checkpoint of the tenant, based on its
TENANT_IDorTENANT_NAMEvalue.Query information about the problematic tenant from the sys tenant
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;The syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM SYS.DBA_OB_TENANTS;
Pay attention to values in the following columns in the query result:
TENANT_ROLE: the role of the tenant.STANDBYindicates a standby tenant andPRIMARYindicates a primary tenant.SYNC_SCN: the synchronization checkpoint of the tenant.REPLAYABLE_SCN: the replayable checkpoint of the tenant. Logs whose SCNs are greater than this checkpoint cannot be replayed.RECOVERY_UNTIL_SCN: the end point to which the tenant can be restored. The value4611686018427387903indicates infinity.
For more information about the
DBA_OB_TENANTSview, see DBA_OB_TENANTS.
Issue 2: No log restore source is configured for the tenant, or the log restore source is empty or damaged
Symptom
- After a standby tenant is created by using the
BACKUP DATABASE PLUS ARCHIVELOGstatement, no log restore source is configured, or the configured log restore source is empty. For example, an empty log source is configured by using theALTER SYSTEM SET log_restore_source = '';statement. - The log restore source is cleared for the standby tenant.
- After the primary tenant is switched to the STANDBY role, no log restore source is configured for it.
Possible causes
In OceanBase Database, a standby tenant pulls logs from the log restore source. By default, after you run the RECOVER command on a standby tenant that is created by using the physical backup and restore feature, the standby tenant continues to synchronize logs from the archive directory specified during the restore. By default, a standby tenant created by using the CREATE STANDBY TENANT statement directly synchronizes logs from the primary tenant over the network. Continuous log synchronization cannot be directly enabled for a standby tenant created by using the BACKUP DATABASE PLUS ARCHIVELOG statement because no log restore source is specified for the tenant.
In addition, if the log restore source of a standby tenant is empty or damaged, log synchronization of the standby tenant will also be stuck. For example, if the primary tenant is deleted or the password of the replication account used to access the primary tenant is changed for a network-based physical standby database, or the log archive directory used to restore logs for an archive-based physical standby database is deleted, log synchronization of the standby tenant will be stuck.
Troubleshooting procedure
You can query the CDB_OB_LOG_RESTORE_SOURCE or DBA_OB_LOG_RESTORE_SOURCE view to verify whether a log restore source is configured for the problematic tenant.
Log on as the administrator to the problematic tenant or the sys tenant of the cluster to which the problematic tenant belongs.
Verify whether a log restore source is configured for the problematic tenant.
Query information about the problematic tenant from the sys tenant
SELECT * FROM oceanbase.CDB_OB_LOG_RESTORE_SOURCE WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT * FROM oceanbase.DBA_OB_LOG_RESTORE_SOURCE;The syntax is as follows:
SELECT * FROM SYS.DBA_OB_LOG_RESTORE_SOURCE;
Pay attention to values in the following columns in the query result:
TYPE: The type of the physical standby database. The valueLOCATIONindicates an archive-based physical standby database, in which the current tenant synchronizes logs from the source tenant through archive logs. The valueSERVICEindicates a network-based physical standby database, in which the current tenant directly synchronizes logs from the primary tenant over the network. You can execute theALTER SYSTEM SET log_restore_sourcestatement to change the log restore source of the standby tenant.VALUE: The value of this field corresponds to that ofTYPE, and can be the log archive directory of the source tenant or the access point of the primary tenant.
In the following sample query result, if the log restore source of the
restore_oracle_tenanttenant is empty, no log restore source is configured for this tenant. In this case, log synchronization of the standby tenant will be stuck. If the value ofVALUEis the log archive directory of the source tenant, you must verify whether the log archive directory is accessible from all servers of the primary and standby tenants. If the value ofVALUEis the access point of the primary tenant, you must verify whether the IP address list, username, and password are correctly configured.*************************** 1. row *************************** tenant_id: 1002 type: LOCATION value: file:///data/1 recovery_until_scn: 4611686018427387903 1 row in setFor more information about the
CDB_OB_LOG_RESTORE_SOURCEandDBA_OB_LOG_RESTORE_SOURCEviews, see CDB_OB_LOG_RESTORE_SOURCE and DBA_OB_LOG_RESTORE_SOURCE.
Issue 3: The log synchronization of the standby tenant is abnormal
Symptom
Issue 1 and Issue 2 do not exist, but the log synchronization of the standby tenant is still abnormal.
Possible causes
The logs of the standby tenant may lag behind or the log synchronization may be stuck due to reasons such as log stream recycling of the primary tenant or network factors.
Troubleshooting procedure
The V$OB_LS_LOG_RESTORE_STATUS view aggregates the synchronization status information of all log streams. You can query this view to check the synchronization status of all log streams.
Notice
The V$OB_LS_LOG_RESTORE_STATUS view displays the synchronization status information of only log stream leaders. Therefore, if a log stream of the standby tenant has no leader, you cannot query this view for the log synchronization status of this log stream. In this case, you must first troubleshoot the issue of leader absence.
Log on as the administrator to the problematic tenant or the sys tenant of the cluster to which the problematic tenant belongs.
Query the synchronization status of all log stream leaders of the problematic tenant.
Query information about the problematic tenant from the sys tenant
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS;The syntax is as follows:
SELECT * FROM SYS.V$OB_LS_LOG_RESTORE_STATUS;Pay attention to values in the following columns in the query result:
SYNC_SCN: the synchronization checkpoint of the log stream.SYNC_STATUS: the synchronization status of the log stream. The valueNORMALindicates that the log synchronization of the log stream is normal. Other values indicate abnormal log synchronization. The following table describes the log synchronization states.
For more information about the
V$OB_LS_LOG_RESTORE_STATUSview, see V$OB_LS_LOG_RESTORE_STATUS.
Log synchronization state Description NORMAL Log synchronization is properly in progress. SOURCE HAS A GAP A gap exists between the logs of the standby tenant and those of the source tenant. This is because logs of the primary tenant are recycled before being synchronized to the standby tenant. STANDBY LOG NOT MATCH The restored logs conflict with those of the standby tenant. This is because two primary tenants exist or the log restore source is incorrectly configured. CHECK USER OR PASSWORD The username or password of the replication account is incorrect and therefore cannot be used to access the original primary tenant. CHECK NETWORK The primary tenant is unreachable. In this case, you need to check for network exceptions. RESTORE SUSPEND The standby tenant has been restored to the specified checkpoint. STANDBY NEED UPGRADE The binary version of the standby tenant lags behind and must be upgraded. PRIMARY TENANT DROPPED The primary tenant is dropped. FETCH LOG TIMEOUT The log synchronization timed out. In this case, you need to check the network health or increase the value of the tenant-level parameter standby_db_fetch_log_rpc_timeoutof the standby tenant, which specifies the log synchronization timeout period. For more information aboutstandby_db_fetch_log_rpc_timeout, see standby_db_fetch_log_rpc_timeout.STANDBY IN THROTTLING Write throttling is enabled for the standby tenant. STANDBY LOG DISK IS FULL The disk space of the standby tenant is insufficient. WAIT LOG STREAM CREATED Log streams are being created for the standby tenant. NOT AVAILABLE Log synchronization is unavailable due to other exceptions.
Issue 4: The synchronization checkpoint of the standby tenant stops advancing
Symptom
Issue 1, Issue 2, and Issue 3 do not exist, but the log synchronization of the standby tenant is still stuck. In this case, you can try to verify whether the synchronization checkpoint of the standby tenant stops advancing.
Possible causes
All logs are restored in a synchronized manner across log streams of the standby tenant. Therefore, all log streams will check the SYNC_SCN value of the tenant and pull logs in advance based on the specified strategy. Log synchronization stops if the SYNC_SCN value of the tenant stops advancing, and resumes until the SYNC_SCN value of the tenant continues to advance.
Troubleshooting procedure
Log on as the administrator to the problematic tenant or the sys tenant of the cluster to which the problematic tenant belongs.
Query the
DBA_OB_TENANTSview for theSYNC_SCNvalue of the tenant, which specifies the synchronization checkpoint.You can query the
SYNC_SCNvalue of the problematic tenant based on itsTENANT_IDorTENANT_NAMEvalue.Query the
SYNC_SCNvalue of the problematic tenant from the sys tenantSELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;Query the
SYNC_SCNvalue of the problematic tenant from the current tenantMySQL modeOracle modeThe syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM oceanbase.DBA_OB_TENANTS;The syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS, SWITCHOVER_STATUS, SYNC_SCN, REPLAYABLE_SCN,READABLE_SCN, RECOVERY_UNTIL_SCN FROM SYS.DBA_OB_TENANTS;
Pay attention to values in the following columns in the query result:
TENANT_ROLE: the role of the tenant.STANDBYindicates a standby tenant andPRIMARYindicates a primary tenant.SYNC_SCN: the synchronization checkpoint of the tenant.
Query the
V$OB_LS_LOG_RESTORE_STATUSview for theSYNC_SCNvalues of all log stream leaders.Query information about the problematic tenant from the sys tenant
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT * FROM oceanbase.V$OB_LS_LOG_RESTORE_STATUS;The syntax is as follows:
SELECT * FROM SYS.V$OB_LS_LOG_RESTORE_STATUS;Pay attention to the
SYNC_SCNcolumn in the query result, which indicates the synchronization checkpoints of log streams.Compare the
SYNC_SCNvalues of all log streams with that of the tenant based on the query results. If theSYNC_SCNvalues of all log streams are greater than that of the tenant and remain unchanged, the recorded log synchronization progress of the tenant is incorrect. In this case, you need to troubleshoot this issue.For more information about the
V$OB_LS_LOG_RESTORE_STATUSview, see V$OB_LS_LOG_RESTORE_STATUS.
Issue 5: A log stream of the standby tenant has no leader
Symptom
The preceding issues do not exist, but the log synchronization of the standby tenant is still stuck. In this case, you can try to check whether a log stream of the standby tenant has no leader.
Possible causes
During log synchronization, a log stream leader in the standby tenant first synchronizes logs from the primary tenant or archive media and then synchronizes the logs to other log stream replicas. If a log stream of the standby tenant has no leader, the log stream cannot synchronize logs and the tenant-level SYNC_SCN value cannot advance. As a result, the log synchronization of all log streams is stuck.
Troubleshooting procedure
You can query the GV$OB_LOG_STAT view to check whether any log stream of the standby tenant has no leader.
Log on as the administrator to the problematic tenant or the sys tenant of the cluster to which the problematic tenant belongs.
Query the
GV$OB_LOG_STATview for the status information of all log streams of the tenant.Query information about the problematic tenant from the sys tenant
SELECT TENANT_ID, LS_ID, ROLE FROM oceanbase.GV$OB_LOG_STAT WHERE TENANT_ID=1xxx;Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT TENANT_ID, LS_ID, ROLE FROM oceanbase.GV$OB_LOG_STAT;You can also execute the following statement to query all log streams without a leader.
SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');The syntax is as follows:
SELECT TENANT_ID, LS_ID, ROLE FROM SYS.GV$OB_LOG_STAT;You can also execute the following statement to query all log streams without a leader.
SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE ROLE='LEADER');
Fields in the query result are described as follows:
TENANT_ID: the ID of the tenant.LS_ID: the ID of the log stream.ROLE: the role of the log stream replica. The valueLEADERindicates that the log stream replica is a leader. The valueFOLLOWERindicates that the log stream replica is a follower.If the roles of all log stream replicas are
FOLLOWER, the log stream has no leader.
You can also execute the following statements to query all log streams without a leader.
Query information about the problematic tenant from the sys tenant
SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');Query information about the problematic tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM oceanbase.GV$OB_LOG_STAT WHERE ROLE='LEADER');The syntax is as follows:
SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE (TENANT_ID, LS_ID) NOT IN (SELECT DISTINCT TENANT_ID, LS_ID FROM SYS.GV$OB_LOG_STAT WHERE ROLE='LEADER');
For more information about the
GV$OB_LOG_STATview, see GV$OB_LOG_STAT.When the query result indicates that all log streams of the standby tenant has a leader, you can check whether any log stream of the primary tenant has no leader in the same way. The leader absence of any log stream of the primary tenant will also cause log synchronization to be stuck for log streams in the standby tenant.
Issue 6: The roles of both the primary and standby tenants are STANDBY
Symptom
In a scenario without cascaded standby databases, the role of at least one of the two tenants in the primary/standby relationship must be PRIMARY. Otherwise, log synchronization between the two tenants will be stuck.
Possible causes
After you perform a switchover to change the role of the primary tenant to STANDBY, the roles of the original primary and standby tenants are both STADNBY.
Troubleshooting procedure
You can query the DBA_OB_TENANTS view to check whether the roles of the primary and standby tenants are both STANDBY.
Log on as the administrator to the primary and standby tenants respectively or to the sys tenant of the clusters where the primary and standby tenants belong.
Execute the following statements to query the roles of the primary and standby tenants.
Query the role of the primary or standby tenant from the sys tenant
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM oceanbase.DBA_OB_TENANTS;Query the role of the primary or standby tenant from the current tenant
MySQL modeOracle modeThe syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM oceanbase.DBA_OB_TENANTS;The syntax is as follows:
SELECT TENANT_ID, TENANT_NAME, TENANT_ROLE, STATUS FROM SYS.DBA_OB_TENANTS;
Fields in the query result are described as follows:
TENANT_ID: the ID of the tenant.TENANT_NAME: the name of the tenant.TENANT_ROLE: the role of the tenant. The valuePRIMARYindicates a primary tenant and the valueSTANDBYindicates a standby tenant.
Check whether the roles of the primary and standby tenants are both
STANDBYin the query result.For more information about the
DBA_OB_TENANTSview, see DBA_OB_TENANTS.