This topic describes the requirements for a failover, the preparations to make before you perform one, and its impact. The preparations include selecting a failover statement and obtaining the cluster information.
When the primary cluster is unavailable, you can switch a standby cluster to the primary role to provide services. You can select an appropriate standby cluster and a failover statement based on the protection mode, protection level, and synchronization status of each standby cluster.
Select a failover statement
A failover is either lossless or lossy, and each type has its own statement. Select the statement that matches the state of the standby cluster that you want to switch to the primary role.
Query the PROTECTION_MODE and PROTECTION_LEVEL fields in the V$OB_CLUSTER view of each standby cluster. The failover statement varies based on the following two cases:
Both PROTECTION_LEVEL and PROTECTION_MODE are MAXIMUM AVAILABILITY or MAXIMUM PROTECTION.
In this case, the standby cluster is in MAXIMUM AVAILABILITY or MAXIMUM PROTECTION mode, and the primary cluster and the standby cluster were in SYNC mode before the primary cluster became unavailable. The data in all partitions of the standby cluster is consistent and has no data voids. You can execute the lossless failover statement to switch the standby cluster directly to the primary role without data loss.
The following example shows the syntax of a lossless failover statement:
obclient> ALTER SYSTEM FAILOVER TO cluster_name CLUSTER_ID = cluster_id [FORCE];
You can also add the FORCE keyword to skip the protection mode and protection level check of the standby cluster.
PROTECTION_LEVEL and PROTECTION_MODE are set to other values.
In this case, it is impossible to determine whether the data in all partitions of the standby cluster is consistent or whether any partition has data voids. You can execute the lossy failover statement to switch the standby cluster to the primary role. During the failover, the data in all partitions is restored to the consistency snapshot to ensure the integrity of the partition data before the snapshot point. After the failover is completed, the data in the system is consistent but not necessarily lossless.
The following example shows the syntax of a lossy failover statement:
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER [FORCE];
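The selection rule above can be sketched as a small helper. This is only an illustration: the function name choose_failover_statement and its parameters are hypothetical, and it assumes that "both fields are MAXIMUM AVAILABILITY or MAXIMUM PROTECTION" means that each field holds one of those two values. The statement text is taken from the syntax shown above.

```python
# Hypothetical helper (not part of OceanBase): maps the PROTECTION_MODE and
# PROTECTION_LEVEL values read from V$OB_CLUSTER to a failover statement.
SYNC_VALUES = {"MAXIMUM AVAILABILITY", "MAXIMUM PROTECTION"}


def choose_failover_statement(protection_mode, protection_level,
                              cluster_name, cluster_id, force=False):
    """Return (failover_type, statement) for one standby cluster."""
    suffix = " FORCE" if force else ""
    mode = protection_mode.strip().upper()
    level = protection_level.strip().upper()
    # Assumption: each field must hold one of the two values in SYNC_VALUES.
    if mode in SYNC_VALUES and level in SYNC_VALUES:
        statement = (f"ALTER SYSTEM FAILOVER TO {cluster_name} "
                     f"CLUSTER_ID = {cluster_id}{suffix};")
        return ("lossless", statement)
    # Any other combination falls back to the lossy failover statement.
    return ("lossy", f"ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER{suffix};")
```

For example, calling the helper with MAXIMUM AVAILABILITY for both fields, the cluster name obcluster, and the cluster ID 2 yields the lossless statement, while any other value in either field yields the lossy statement.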
Requirements and impact
The requirements and impact of a failover are:
Requirements on the cluster status
Before you perform a failover, make sure that the cluster status meets the following requirements:
All OBServer nodes in the primary cluster are unavailable. This avoids simultaneous writes to the original and new primary clusters and allows the original primary cluster to connect to the new primary cluster after a lossless failover.
All OBServer nodes in the target standby cluster for which a lossy failover is to be performed are in the ACTIVE state to ensure the successful execution of the failover statement.
If all the inactive OBServer nodes in the standby cluster are permanently offline, you can specify the FORCE option in the failover statement to force a lossy failover. Syntax:
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;
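The cluster status requirements for a lossy failover can be expressed as a pre-check. The following sketch is illustrative only: the function lossy_failover_statement and its inputs are hypothetical, and how you collect the node states is outside its scope. It treats any status string other than ACTIVE as inactive, which is an assumption.

```python
# Hypothetical pre-check (not part of OceanBase) for a lossy failover.
def lossy_failover_statement(primary_nodes_up, standby_node_statuses,
                             inactive_permanently_offline=False):
    """Return the lossy failover statement to run, or None if the
    cluster status requirements are not met.

    primary_nodes_up: list of booleans, one per primary OBServer node.
    standby_node_statuses: list of status strings, one per standby node.
    inactive_permanently_offline: True only if every inactive standby
        node is known to be permanently offline.
    """
    # Every OBServer node in the primary cluster must be unavailable.
    if any(primary_nodes_up):
        return None
    if not standby_node_statuses:
        return None
    # Assumption: any status other than ACTIVE counts as inactive.
    inactive = [s for s in standby_node_statuses if s.upper() != "ACTIVE"]
    if not inactive:
        return "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER;"
    if inactive_permanently_offline:
        return "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;"
    return None
```

Returning None rather than a statement keeps the caller from running a failover while the primary cluster can still accept writes.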
Impact on the protection mode
After the failover, the clusters involved in the failover are automatically set to the MAXIMUM PERFORMANCE mode, regardless of their original protection mode. You need to reconfigure the protection mode afterward. For more information, see Switch the protection mode.
Impact on indexes
During the failover, OceanBase Database automatically deletes invalid and redundant indexes. During a lossless failover, some indexes that were valid in the original primary cluster may be deleted from the new primary cluster. This is because indexes in the standby cluster are created asynchronously and may therefore be inconsistent with those in the primary cluster. As a result, even a lossless failover cannot avoid index losses.
Impact on replicas
The failover does not apply to the replicas being created. When a partition is created for a standby cluster, metadata is synchronized from the primary cluster. If the primary cluster is unavailable, the replica being created for the standby cluster may remain in the standby restore state.
If a replica is being created, the failover may get stuck until timeout. To ensure a successful failover, we recommend that you execute the following SQL statement to query partition replicas in the standby restore state.
Notice
Before the query, make sure that the primary cluster is unavailable. Otherwise, the query result may be inaccurate.
obclient> SELECT * FROM oceanbase.__ALL_VIRTUAL_META_TABLE WHERE IS_RESTORE = 100;
Valid values of the IS_RESTORE field:
0: indicates a normal replica.
100: indicates a replica in the standby restore state.
Check the query result:
If all replicas of a partition are in the standby restore state, the failover process skips this partition without requiring your intervention, but you need to manually process this issue after the failover is completed.
If some replicas, including the leader, of a partition have been created and others are in the standby restore state, we recommend that you wait until the replicas in the standby restore state are created and then perform the failover. During this process, the leader replica must be normal.
If minority replicas in a partition have been restored and majority replicas are in the standby restore state, the failover may get stuck. In this case, you need to forcibly delete all replicas in the partition so that the failover process skips this partition.
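The three cases above can be sketched as a per-partition classification. This is a minimal illustration under stated assumptions: the function classify_partition, its return labels, and the (is_restore, is_leader) input shape are hypothetical, and the leader flag is assumed to be supplied by the caller. The sketch also assumes that the majority check takes precedence over the leader check, and it returns needs_review for combinations that the cases above do not cover.

```python
# IS_RESTORE value of a replica in the standby restore state (from the
# valid values listed for the IS_RESTORE field).
STANDBY_RESTORE = 100


def classify_partition(replicas):
    """Classify one partition from its replicas.

    replicas: list of (is_restore, is_leader) tuples, one per replica.
    The tuple layout and the returned labels are hypothetical.
    """
    total = len(replicas)
    if total == 0:
        raise ValueError("a partition must have at least one replica")
    restoring = sum(1 for is_restore, _ in replicas
                    if is_restore == STANDBY_RESTORE)
    leader_created = any(is_leader and is_restore != STANDBY_RESTORE
                         for is_restore, is_leader in replicas)
    if restoring == 0:
        return "ready"                  # no replica is being created
    if restoring == total:
        return "skipped_by_failover"    # handle manually after the failover
    if restoring * 2 > total:
        return "force_delete_replicas"  # majority still in standby restore
    if leader_created:
        return "wait_for_restore"       # wait, then perform the failover
    return "needs_review"               # not covered by the cases above
```

For a three-replica partition with a created leader and one replica in the standby restore state, the sketch returns wait_for_restore. With only one replica created and two in the standby restore state, it returns force_delete_replicas.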
Obtain the cluster information
Assume that you have two clusters, one primary cluster and one standby cluster. Perform the following steps to obtain the cluster information:
Obtain information about the primary cluster
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+--------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE |
+------------+--------------+--------------+
|          1 | obcluster    | PRIMARY      |
+------------+--------------+--------------+
1 row in set
Obtain information about the standby cluster
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+------------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE     |
+------------+--------------+------------------+
|          2 | obcluster    | PHYSICAL STANDBY |
+------------+--------------+------------------+
1 row in set
CLUSTER_ID: the ID of the cluster.
CLUSTER_NAME: the name of the cluster.
CLUSTER_ROLE: the role of the cluster. PRIMARY indicates the primary cluster, and PHYSICAL STANDBY indicates a standby cluster.