This topic describes the requirements for, preparations for, and impact of a failover, including how to select a failover command and how to obtain cluster information.
When the primary cluster is unavailable, you can switch a standby cluster to the primary role to provide services. You can select an appropriate standby cluster and failover command based on the protection mode, protection level, and synchronization status of each standby cluster.
Select a failover command
Query the PROTECTION_MODE and PROTECTION_LEVEL fields in the V$OB_CLUSTER view of each standby cluster. The failover command to use depends on which of the following two cases applies:
Both PROTECTION_LEVEL and PROTECTION_MODE are MAXIMUM AVAILABILITY or MAXIMUM PROTECTION.
In this case, the primary and standby clusters were in SYNC mode before the primary cluster became unavailable, and the data in all partitions of the standby cluster is consistent and has no voids. You can run the lossless failover command to switch the standby cluster directly to the primary role without data loss.
The lossless failover command is as follows:
obclient> ALTER SYSTEM FAILOVER TO cluster_name CLUSTER_ID = cluster_id [FORCE];
You can add the FORCE keyword to skip the protection mode and protection level check of the standby cluster.
PROTECTION_LEVEL and PROTECTION_MODE are set to other values.
In this case, it is impossible to determine whether the data in all partitions of the standby cluster is consistent and whether each partition has data voids. You can run the lossy failover command to switch the standby cluster to the primary role. During the failover, the data in all partitions is restored to the consistency snapshot to ensure the integrity of the partition data before the snapshot point. After the failover is completed, the data in the system is consistent but not necessarily lossless.
The lossy failover command is as follows:
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER [FORCE];
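The selection rule above can be sketched as a small Python helper. This is only an illustration, not part of OceanBase Database: the function name `choose_failover_command` and its parameters are hypothetical, and the sketch assumes you have already read PROTECTION_MODE and PROTECTION_LEVEL from the V$OB_CLUSTER view of the standby cluster. It does not model the optional FORCE keyword.

```python
# Values that indicate the standby cluster was in SYNC mode,
# as described in this topic.
SYNC_VALUES = {"MAXIMUM AVAILABILITY", "MAXIMUM PROTECTION"}


def choose_failover_command(protection_mode, protection_level,
                            cluster_name, cluster_id):
    """Return the failover statement suggested by the rule in this topic.

    Hypothetical helper: the inputs are the PROTECTION_MODE and
    PROTECTION_LEVEL values queried from V$OB_CLUSTER on the standby cluster.
    """
    mode = protection_mode.strip().upper()
    level = protection_level.strip().upper()
    if mode in SYNC_VALUES and level in SYNC_VALUES:
        # Lossless failover: both fields indicate SYNC mode.
        return (f"ALTER SYSTEM FAILOVER TO {cluster_name} "
                f"CLUSTER_ID = {cluster_id};")
    # Any other combination: lossy failover.
    return "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER;"
```

For example, passing MAXIMUM PROTECTION for both fields yields the lossless statement, while any other combination, such as MAXIMUM PERFORMANCE for either field, yields the lossy statement.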
Requirements and impact
The requirements and impact of a failover are as follows:
Requirements on the cluster status
Before you perform a failover, make sure that the cluster status meets the following requirements:
All OBServers in the primary cluster are unavailable. This prevents simultaneous writes to the original and new primary clusters and allows the original primary cluster to connect to the new primary cluster after a lossless failover.
All OBServers in the target standby cluster for which a lossy failover is to be performed are in the ACTIVE state, to ensure successful execution of the failover command.
Impact on the protection mode
After the failover, the clusters involved in the failover are automatically set to the maximum performance mode, regardless of the original protection mode. You need to reconfigure the protection mode.
Impact on indexes
During the failover, OceanBase Database automatically deletes invalid and redundant indexes. During a lossless failover, some valid indexes in the original primary cluster may be deleted from the new primary cluster. This is because indexes in the standby cluster are created asynchronously and thus inconsistent with those in the primary cluster. Therefore, even a lossless failover cannot avoid index losses.
Impact on replicas
The failover does not apply to the replicas being created. When a partition is created for a standby cluster, metadata is synchronized from the primary cluster. If the primary cluster is unavailable, the replica being created for the standby cluster may remain in the standby restore state.
If a replica is being created, the failover may get stuck until timeout. To ensure a successful failover, we recommend that you run the following SQL command to query partition replicas in the standby restore state.
Notice
Before the query, make sure that the primary cluster is unavailable. Otherwise, the query result may be inaccurate.
obclient> SELECT * FROM __ALL_VIRTUAL_META_TABLE WHERE IS_RESTORE = 100;
Valid values of the IS_RESTORE field:
0: indicates a normal replica.
100: indicates a replica in the standby restore state.
Check the query result:
If all replicas of a partition are in the standby restore state, the failover process skips this partition without requiring your intervention, but you need to handle the partition manually after the failover is completed.
If some replicas of a partition, including the leader, have been created and the others are in the standby restore state, we recommend that you wait until the replicas in the standby restore state are created and then perform the failover.
If a minority of the replicas of a partition have been restored and the majority are in the standby restore state, the failover may get stuck. In this case, you need to forcibly delete all replicas of the partition so that the failover process skips this partition.
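The three cases above can be sketched as a small Python helper that takes the IS_RESTORE values of all replicas of one partition and returns a suggested action. This is a simplified illustration under stated assumptions: the function name `classify_partition` and the returned labels are hypothetical, only the values 0 and 100 described in this topic are interpreted, the leader condition is not modeled, and an even split between normal and restoring replicas is treated as "wait", which is an assumption rather than documented behavior.

```python
from collections import Counter

NORMAL = 0             # IS_RESTORE value of a normal replica
STANDBY_RESTORE = 100  # IS_RESTORE value of a replica in the standby restore state


def classify_partition(is_restore_values):
    """Suggest an action for one partition from its replicas' IS_RESTORE values.

    Hypothetical helper: the labels are illustrative and are not OceanBase terms.
    """
    if not is_restore_values:
        raise ValueError("at least one replica is required")
    total = len(is_restore_values)
    restoring = Counter(is_restore_values)[STANDBY_RESTORE]
    if restoring == 0:
        # No replica is in the standby restore state.
        return "ready"
    if restoring == total:
        # All replicas are restoring: the failover skips the partition,
        # and it must be handled manually afterwards.
        return "skipped_by_failover"
    if restoring * 2 > total:
        # The majority is restoring: the failover may get stuck unless
        # all replicas of the partition are forcibly deleted.
        return "force_delete_replicas"
    # The majority has been created: wait for the remaining replicas.
    return "wait_for_restore"
```

For a three-replica partition, the values 0, 100, 100 fall into the forced-deletion case, while 0, 0, 100 fall into the wait case.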
Obtain cluster information
Assume that one primary cluster and one standby cluster exist. This is only an example; the values in your actual environment may differ.
Obtain information about the primary cluster
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+--------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE |
+------------+--------------+--------------+
|          1 | obcluster    | PRIMARY      |
+------------+--------------+--------------+
1 row in set
Obtain information about the standby cluster
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+------------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE     |
+------------+--------------+------------------+
|          2 | obcluster    | PHYSICAL STANDBY |
+------------+--------------+------------------+
1 row in set