Perform a failover between primary and standby OceanBase clusters for disaster recovery in a scenario with two IDCs

Last Updated: 2024-07-30 09:31:29

This topic describes how to perform a failover between primary and standby OceanBase clusters for disaster recovery in a scenario with two IDCs.

Scenarios

If the IDC that hosts the primary OceanBase cluster fails, for example because of a power outage, you can convert a standby cluster in the other IDC to the primary cluster so that your business can resume service quickly.

Prerequisites

The current cluster is a standby cluster and is in Running state.
The primary cluster corresponding to the current cluster is stopped or unavailable, and all OBServer nodes of the primary cluster are inactive.
The current cluster is in Maximum Performance mode.
The version of the current cluster is OceanBase Database V2.2.76 or later and earlier than V4.0.
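
The role, protection mode, and version prerequisites can also be spot-checked from SQL. The following is a minimal sketch, not part of the original procedure. It assumes that you are connected to the sys tenant of the standby cluster and that the `v$ob_cluster` view and the `__all_server` table expose the columns shown, which is our understanding for OceanBase Database V2.x and V3.x. The Running state itself is reported by OCP, not by these queries.

```sql
-- Assumption: run in the sys tenant of the standby cluster.
-- Cluster role and protection mode of the current cluster.
SELECT cluster_role, cluster_status, protection_mode
FROM oceanbase.v$ob_cluster;

-- Build version reported by each OBServer node of the current cluster.
SELECT svr_ip, svr_port, zone, status, build_version
FROM oceanbase.__all_server;
```

For a cluster that meets the prerequisites, the first query should report a standby role and the Maximum Performance protection mode, and the second should report a build version of V2.2.76 or later and earlier than V4.0. The exact strings returned may vary by version.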

Procedure

Log on to the follower OceanBase Cloud Platform (OCP) cluster and click Clusters in the left-side navigation pane. On the Clusters page, click the name of a standby cluster.
On the page that appears, click the More icon in the upper-right corner and select Switch to Primary Cluster: Disaster Recovery.
In the dialog box that appears, click Switching Disaster Recovery.
Wait for the Failover ob cluster task to complete. Then, the standby cluster provides services as the new primary cluster.

Note

Standby clusters are not fully synchronized with the primary cluster. Therefore, the disaster recovery failover involves data loss. We recommend that you select the standby cluster with the smallest synchronization latency. You can go to the overview page of a standby cluster and view the synchronization latency value of the cluster in the Synchronization Status field in the basic information section.

After the disaster recovery failover is completed, the original primary cluster does not become a standby cluster. Instead, it is detached as a separate, discarded cluster. We recommend that you delete the discarded cluster from OCP.

After the disaster recovery failover is completed, data cannot be synchronized between the new primary cluster and other standby clusters of the original primary cluster. We recommend that you delete other standby clusters of the original primary cluster from OCP.

FAQ

What can I do if the disaster recovery failover fails and the error message indicates that an active OBServer node exists?

Before you perform a disaster recovery failover, check whether all OBServer nodes in the primary cluster are inactive. If any node is still active, stop the observer process on each active OBServer node in the primary cluster, and then retry the failover.

What can I do to accelerate the failover of a standby cluster if I use OCP earlier than V3.3.0?

You can manually modify the following two hidden parameters and restore their original values after the failover is completed:

_mini_merge_concurrency: specifies the concurrency of a minor compaction. The default value is 3. We recommend that you change it to 16.
_ob_minor_merge_schedule_interval: specifies the interval for scheduling minor compactions. The default value is 20s. We recommend that you change it to 3s.

# View parameter values
SQL> SELECT NAME, VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_mini_merge_concurrency';
+-------------------------+-------+
| NAME                    | VALUE |
+-------------------------+-------+
| _mini_merge_concurrency | 3     |
+-------------------------+-------+
1 row in set
   
SQL> SELECT NAME, VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_ob_minor_merge_schedule_interval';
+-----------------------------------+-------+
| NAME                              | VALUE |
+-----------------------------------+-------+
| _ob_minor_merge_schedule_interval | 20s   |
+-----------------------------------+-------+
1 row in set

# Change parameter values
SQL> ALTER SYSTEM SET _mini_merge_concurrency = 16;
Query OK, 0 rows affected
   
SQL> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '3s';
Query OK, 0 rows affected
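
After the failover is completed, the original values can be restored in the same way. The following sketch assumes that both parameters were at the default values listed above (3 and 20s) before the change. If the view queries returned different values for your cluster, restore those values instead.

```sql
-- Restore the original values after the failover is completed.
-- Assumption: the cluster was using the defaults before the change.
ALTER SYSTEM SET _mini_merge_concurrency = 3;
ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '20s';
```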

Note

OCP V3.3.0 and later automatically change the values of the preceding two parameters to accelerate the failover of a standby cluster.

How long does a failover last?

The duration of a failover depends on the number of zones, the number of OBServer nodes in each zone, and the number of partitions. In most cases, a failover takes 30 seconds to 25 minutes to complete.

After a failover, how long does it take for my business to connect to the new primary cluster by using OBProxy?

After OCP completes a failover task, it takes 20 seconds to 1 minute for your business to connect to the new primary cluster by using OBProxy.