Preparations for a failover|V3.2.4|OceanBase Database| docs|Distributed Database

Preparations for a failover

Last Updated: 2023-10-27 09:57:43

This topic describes how to prepare for a failover: how to select a failover statement, the requirements and impact of a failover, and how to obtain the cluster information.

When the primary cluster is unavailable, you can switch a standby cluster to the primary role to provide services. You can select an appropriate standby cluster and a failover statement based on the protection mode, protection level, and synchronization status of each standby cluster.

Select a failover statement

When the primary cluster becomes unavailable, you can use the failover feature to switch a standby cluster to the primary role. Failover operations are divided into lossless failover and lossy failover. You must select an appropriate failover statement to perform the failover.

Query the PROTECTION_MODE and PROTECTION_LEVEL fields in the V$OB_CLUSTER view of each standby cluster. The failover statement varies based on the following two cases:

Both PROTECTION_LEVEL and PROTECTION_MODE are MAXIMUM AVAILABILITY or MAXIMUM PROTECTION.

In this case, the standby cluster is in MAXIMUM AVAILABILITY or MAXIMUM PROTECTION mode, and the primary cluster and the standby cluster were in SYNC mode before the primary cluster became unavailable. The data in all partitions of the standby cluster is consistent and has no data voids, so you can execute the lossless failover statement to switch the standby cluster directly to the primary role without data loss.

The following example shows the syntax of a lossless failover statement:
```
obclient> ALTER SYSTEM FAILOVER TO cluster_name CLUSTER_ID = cluster_id [FORCE];
```
You can also add the FORCE keyword to skip the protection mode and protection level check of the standby cluster.

PROTECTION_LEVEL and PROTECTION_MODE are set to other values.

In this case, it is impossible to determine whether the data in all partitions of the standby cluster is consistent or whether any partition has data voids. You can execute the lossy failover statement to switch the standby cluster to the primary role. During the failover, the data in all partitions is restored to the consistency snapshot, which ensures the integrity of the partition data before the snapshot point. After the failover is completed, the data in the system is consistent but not necessarily lossless.

The following example shows the syntax of a lossy failover statement:
```
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER [FORCE];
```
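The selection rule above can be sketched as a small helper. This is a minimal Python sketch, not part of OceanBase: the function name `choose_failover_statement` is hypothetical, and it assumes you have already read PROTECTION_MODE and PROTECTION_LEVEL from the V$OB_CLUSTER view of the standby cluster. It reads the first case as "each field is either MAXIMUM AVAILABILITY or MAXIMUM PROTECTION". The statement syntax strings are copied from this topic.

```python
# Hypothetical helper for illustration; not an OceanBase API.
# Values that, per this topic, allow a lossless failover.
LOSSLESS_VALUES = {"MAXIMUM AVAILABILITY", "MAXIMUM PROTECTION"}

# Statement syntax as given in this topic (placeholders left as-is).
LOSSLESS = "ALTER SYSTEM FAILOVER TO cluster_name CLUSTER_ID = cluster_id [FORCE];"
LOSSY = "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER [FORCE];"


def choose_failover_statement(protection_mode, protection_level):
    """Return (kind, syntax) for one standby cluster, where kind is
    "lossless" or "lossy"."""
    mode = protection_mode.strip().upper()
    level = protection_level.strip().upper()
    # Assumption: both fields must be in the lossless set.
    if mode in LOSSLESS_VALUES and level in LOSSLESS_VALUES:
        return "lossless", LOSSLESS
    return "lossy", LOSSY
```

For example, `choose_failover_statement("MAXIMUM PROTECTION", "MAXIMUM PERFORMANCE")` selects the lossy statement, because only one of the two fields is in the lossless set.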

Requirements and impact

The requirements and impact of a failover are:

Requirements on the cluster status

Before you perform a failover, make sure that the cluster status meets the following requirements:
- All OBServer nodes in the primary cluster are unavailable. This avoids simultaneous writes to the original and new primary clusters and allows the original primary cluster to connect to the new primary cluster after a lossless failover.
- All OBServer nodes in the target standby cluster for which a lossy failover is to be performed are in the ACTIVE state to ensure the successful execution of the failover statement.
  
  If all the inactive OBServer nodes in the standby cluster are permanently offline, you can specify the FORCE option in the failover statement to force a lossy failover. Syntax:
```
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;
```
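The check on the standby cluster can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `lossy_failover_statement` and the `(status, permanently_offline)` input shape are hypothetical, and the caller is expected to gather the node states and to confirm separately that the primary cluster is unavailable.

```python
# Hypothetical pre-check for illustration; not an OceanBase API.
def lossy_failover_statement(standby_nodes):
    """standby_nodes: list of (status, permanently_offline) tuples,
    one per OBServer node in the target standby cluster.

    Returns the lossy failover statement to run, or None if some
    inactive node might still come back and the failover should wait.
    """
    if not standby_nodes:
        raise ValueError("no OBServer nodes supplied")
    inactive = [node for node in standby_nodes if node[0].upper() != "ACTIVE"]
    if not inactive:
        # All nodes are ACTIVE: the plain statement is sufficient.
        return "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER;"
    if all(permanently_offline for _, permanently_offline in inactive):
        # Every inactive node is permanently offline: force the failover.
        return "ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;"
    return None
```

Returning `None` rather than a statement keeps the "wait" case explicit, so the caller cannot accidentally run a failover while a node is only temporarily inactive.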

Impact on the protection mode

After a failover, the clusters involved are automatically set to the MAXIMUM PERFORMANCE mode, regardless of their original protection mode, so you need to reconfigure the protection mode. For more information, see Switch the protection mode.

Impact on indexes

During a failover, OceanBase Database automatically deletes invalid and redundant indexes. During a lossless failover, some indexes that are valid in the original primary cluster may be deleted from the new primary cluster, because indexes in the standby cluster are created asynchronously and may therefore be inconsistent with those in the primary cluster. As a result, even a lossless failover can lose indexes.

Impact on replicas

The failover does not apply to the replicas being created. When a partition is created for a standby cluster, metadata is synchronized from the primary cluster. If the primary cluster is unavailable, the replica being created for the standby cluster may remain in the standby restore state.

If a replica is being created, the failover may get stuck until it times out. To ensure a successful failover, we recommend that you execute the following SQL statement to query partition replicas in the standby restore state.

Notice

Before the query, make sure that the primary cluster is unavailable. Otherwise, the query result may be inaccurate.
```
obclient> SELECT * FROM oceanbase.__ALL_VIRTUAL_META_TABLE WHERE IS_RESTORE = 100;
```
Valid values of the IS_RESTORE field:
- 0: indicates a normal replica.
- 100: indicates a replica in the standby restore state.

Check the query result:
- If all replicas of a partition are in the standby restore state, the failover process skips this partition without requiring your intervention, but you need to handle the partition manually after the failover is completed.
- If some replicas of a partition, including the leader, have been created and the others are in the standby restore state, we recommend that you wait until the replicas in the standby restore state are created and then perform the failover. During this process, the leader replica must be normal.
- If a minority of the replicas in a partition have been restored and the majority are in the standby restore state, the failover may get stuck. In this case, you need to forcibly delete all replicas in the partition so that the failover process skips this partition.
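
The three cases above can be sketched as a classifier over the replicas of one partition. This is a minimal Python sketch, not an OceanBase tool: the function name `classify_partition`, the `(is_restore, is_leader)` input shape, and the returned labels are all hypothetical. Where the cases could overlap, it assumes the "majority in the standby restore state" case takes precedence, and it returns `"unclear"` for combinations this topic does not cover.

```python
# Hypothetical classifier for illustration; not an OceanBase API.
STANDBY_RESTORE = 100  # IS_RESTORE value for a replica in the standby restore state


def classify_partition(replicas):
    """replicas: list of (is_restore, is_leader) tuples for one partition,
    where is_restore is 0 (normal) or 100 (standby restore)."""
    if not replicas:
        raise ValueError("no replicas supplied")
    total = len(replicas)
    restoring = sum(1 for is_restore, _ in replicas if is_restore == STANDBY_RESTORE)
    if restoring == 0:
        return "ready"                # nothing is being created
    if restoring == total:
        return "skipped_by_failover"  # handle manually after the failover
    if restoring * 2 > total:
        return "delete_replicas"      # majority restoring: failover may get stuck
    leader_normal = any(
        is_leader and is_restore != STANDBY_RESTORE
        for is_restore, is_leader in replicas
    )
    if leader_normal:
        return "wait"                 # wait for the remaining replicas
    return "unclear"                  # not covered by this topic
```

The input tuples would typically be built from the rows returned by the query on `__ALL_VIRTUAL_META_TABLE`, grouped by partition. How the leader is identified in those rows is not described in this topic and is left to the caller.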

Obtain the cluster information

Assume that you have two clusters, one primary cluster and one standby cluster. Run the following query on each cluster to obtain the cluster information:

Obtain information about the primary cluster

```
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+--------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE |
+------------+--------------+--------------+
|          1 | obcluster    | PRIMARY      |
+------------+--------------+--------------+
1 row in set
```

Obtain information about the standby cluster

```
obclient> SELECT CLUSTER_ID, CLUSTER_NAME, CLUSTER_ROLE FROM V$OB_CLUSTER;
+------------+--------------+------------------+
| CLUSTER_ID | CLUSTER_NAME | CLUSTER_ROLE     |
+------------+--------------+------------------+
|          2 | obcluster    | PHYSICAL STANDBY |
+------------+--------------+------------------+
1 row in set
```

The fields in the query results are described as follows:
- CLUSTER_ID: the ID of the cluster.
- CLUSTER_NAME: the name of the cluster.
- CLUSTER_ROLE: the role of the cluster. PRIMARY indicates the primary cluster, and PHYSICAL STANDBY indicates a standby cluster.