Perform a lossy failover|V3.2.4|OceanBase Database| docs|Distributed Database

Perform a lossy failover

Last Updated：2023-10-27 09:57:43 Updated

This topic describes how to perform a lossy failover.

Procedure

Log on to the sys tenant as the root user.
Make sure that the primary cluster is unavailable, which means that all OBServer nodes in the primary cluster are inactive.

To ensure data security, only one primary cluster can be deployed in the primary/standby cluster configuration. If the primary cluster is available, an error will be returned when you execute the failover statement on the standby cluster. Therefore, to ensure the successful execution of the failover statement, make sure that the primary cluster is unavailable before the failover.
Query the status of each standby cluster and select an appropriate standby cluster as a new primary cluster.
1. Query the V$OB_CLUSTER view of the standby cluster to obtain the cluster information.
```
obclient> SELECT CLUSTER_ROLE, PROTECTION_MODE, PROTECTION_LEVEL, CURRENT_SCN FROM V$OB_CLUSTER;
+------------------+---------------------+---------------------+------------------+
| CLUSTER_ROLE     | PROTECTION_MODE     | PROTECTION_LEVEL    | CURRENT_SCN      |
+------------------+---------------------+---------------------+------------------+
| PHYSICAL STANDBY | MAXIMUM PERFORMANCE | MAXIMUM PERFORMANCE | 1613813589631620 |
+------------------+---------------------+---------------------+------------------+
1 row in set
```
  Parameters in the return result:
  - CLUSTER_ROLE: indicates the role of the current cluster. For a standby cluster, the value is PHYSICAL STANDBY.
  - PROTECTION_MODE: indicates the protection mode.
  - PROTECTION_LEVEL: indicates the protection level.
  - CURRENT_SCN: indicates the synchronization progress. The value of this field is displayed only for a standby cluster.
  Distinguish the protection mode and protection level:
  - When the values of the PROTECTION_MODE and PROTECTION_LEVEL fields of the cluster are both MAXIMUM AVAILABILITY or MAXIMUM PROTECTION, the execution result of the lossy failover statement is the same as that of the lossless failover statement. All partition data is considered consistent, and no rollback will be performed. For more information, see Protection modes and protection levels.
  - When the values of the PROTECTION_MODE and PROTECTION_LEVEL fields are returned in any other combinations, all partition data will be rolled back to the consistent state by tenant after a lossy failover.
    
    CURRENT_SCN indicates the minimum synchronization progress among all tenants. A smaller value of CURRENT_SCN indicates more serious data loss after the failover. When CURRENT_SCN is 0 or 1, some tenants in the standby cluster are still being created, and the tenant data is incomplete. After the failover, the tenant data may be cleared.
2. Query the V$OB_CLUSTER_STATS view of the standby cluster to check the synchronization progress of each tenant.
```
obclient> SELECT TENANT_ID, MIN_SYS_TABLE_SCN, MIN_USER_TABLE_SCN FROM V$OB_CLUSTER_STATS;
+-----------+-------------------+--------------------+
| TENANT_ID | MIN_SYS_TABLE_SCN | MIN_USER_TABLE_SCN |
+-----------+-------------------+--------------------+
|         1 |  1613813707942627 |   1613813707942627 |
|      1001 |  1613813589631620 |   1613813589631620 |
|      1002 |  1613813589631620 |   1613813589631620 |
+-----------+-------------------+--------------------+
3 rows in set
```
  Parameters in the return result:
  - TENANT_ID: indicates the tenant ID. If the returned value is `1```, the tenant is the sys tenant. Otherwise, it is a user tenant.
  - MIN_SYS_TABLE_SCN: indicates the minimum synchronization progress of the system tables.
  - MIN_USER_TABLE_SCN: indicates the minimum synchronization progress of the user tables.
  The sys tenant of the primary cluster is independent of the sys tenant of the standby cluster. The sys tenants of these clusters are not physically synchronized. A lossy failover does not roll back data of the sys tenants.
  
  User tenants of the primary and standby clusters are physically synchronized, and the synchronization progress varies in different partitions. A lossy failover rolls back data of the user tenants:
  - Rollback point for all system table partitions: MIN_SYS_TABLE_SCN
  - Rollback point for all user table partitions: MIN_USER_TABLE_SCN or MIN_SYS_TABLE_SCN, whichever is smaller
Select an appropriate standby cluster, log on to the standby cluster, and then modify the related parameters to speed up the failover.

You can modify the following parameters:
- _mini_merge_concurrency: specifies the concurrency of a minor compaction. The default value is 3. You can change it to 16.
- _ob_minor_merge_schedule_interval: specifies the interval for scheduling minor compactions. The default value is 20s. We recommend that you change it to 3s.
- Concurrency-related parameters for load balancing:
  - data_copy_concurrency: specifies the maximum number of concurrent data migration and replication tasks allowed in the system. The default value is 20. We recommend that you change it to 600.
    
    For more information about the parameter, see data_copy_concurrency.
  - server_data_copy_out_concurrency: specifies the maximum number of concurrent tasks of migrating data out of a single node in the system. The default value is 2. We recommend that you change it to 300.
    
    For more information about the parameter, see server_data_copy_out_concurrency.
  - server_data_copy_in_concurrency: specifies the maximum number of concurrent tasks of migrating data into a single node in the system. The default value is 2. We recommend that you change it to 300.
    
    For more information about the parameter, see server_data_copy_in_concurrency.
Procedure:
1. Check the values of the parameters.
  
  We recommend that you record the original values of the parameters so that you can restore them after the failover.
```
obclient> SELECT NAME,VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_mini_merge_concurrency';
+--------------------------+-------+
|  NAME      | VALUE |
+-----------------------------+----+
| _mini_merge_concurrency  | 3     |
+--------------------------+-------+
1 row in set

obclient> SELECT NAME,VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME ='_ob_minor_merge_schedule_interval';
+-----------------------------------+-------+
| NAME  | VALUE |
+-----------------------------------+-------+
| _ob_minor_merge_schedule_interval |   20s |
+-----------------------------------+-------+
1 row in set

obclient> SHOW PARAMETERS LIKE '%data_copy%';
```
2. Change the values of the parameters.
```
obclient> ALTER SYSTEM SET _mini_merge_concurrency = 16;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '3s';
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET data_copy_concurrency = 600;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET server_data_copy_out_concurrency = 300;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET server_data_copy_in_concurrency = 300;
Query OK, 0 rows affected
```
Perform the following steps to switch the standby cluster to the primary role:
1. Specify the system variable ob_query_timeout to set the timeout period for SQL statement execution in microseconds. Default value: 10000000, which means 10 seconds. For more information about the system variable ob_query_timeout, see ob_query_timeout.
  
  Note
  
  This operation is optional. You can adjust the timeout period as required. We recommend that you set it to 100000000, which means 100s.
```
obclient> SET ob_query_timeout = 100000000;
Query OK, 0 rows affected
```
2. Execute the failover statement.
```
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER;
Query OK, 0 rows affected
```
3. If the following error message is returned, some OBServer nodes are offline. To execute a failover statement, all OBServer nodes must be online. Check whether all inactive OBServer nodes in the standby cluster are permanently offline. If yes, specify the FORCE option in the failover statement to force a lossy failover.
  
  Error message:
```
Failover is not allowed when cluster has permanent offline server. Make sure all inactive servers are permanent offline, and use FORCE option to skip this check
```
  Perform the following steps to check for permanently offline OBServer nodes:
  1. Execute the following statement to check the value of the server_permanent_offline_time parameter:
```
obclient> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";
```
    The server_permanent_offline_time parameter specifies the period allowed for missing a heartbeat. If no heartbeat from a server is received after this period, the server is considered permanently offline. For more information about the server_permanent_offline_time parameter, see server_permanent_offline_time.
  2. Execute the following statement to check the value of the last_offline_time parameter for an OBServer node. You can verify whether an OBServer node is permanently offline based on the values of the last_offline_time and server_permanent_offline_time parameters.
```
obclient> SELECT svr_ip, svr_port, zone, status, last_offline_time FROM oceanbase.__all_server;
```
    If inactive OBServer nodes in the standby cluster are permanently offline, execute the following statement to force a lossy failover:
```
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;
```

Query the V$OB_CLUSTER_EVENT_HISTORY view to check the failover progress.

The failover operation restores the data of all partitions to the consistent snapshot point. This ensures that the data of all partitions before the snapshot point is complete. During this process, the replica data is pruned. The duration of the failover varies with the number of replicas in the cluster.

A failover is executed in the following stages:

failover to primary. flag:"start": The failover starts.
Flash back system tables:
1. prepare partitions for flashback: The system is preparing for a flashback.
2. prepare partitions for flashback end: The flashback preparation is completed.
3. wait flashback info dump: The minor compaction of system tables is pending.
4. wait partitions ready for flashback: The flashback of system table partitions is pending.
5. wait partitions ready for flashback end: The flashback of system table partitions can be executed.
6. do flashback partitions: The flashback of system table partitions is in progress.
7. do flashback partitions end: The flashback of system table partitions is completed.
Flash back user tables:
1. prepare partitions for flashback: The system is preparing for a flashback.
2. prepare partitions for flashback end: The flashback preparation is completed.
3. wait flashback info dump: The minor compaction of user tables is pending.
4. wait partitions ready for flashback: The flashback of user table partitions is pending.
5. wait partitions ready for flashback end: The flashback of user table partitions can be executed.
6. do flashback partitions: The flashback of user table partitions is in progress.
7. do flashback partitions end: The flashback of user table partitions is completed.
flashback end: The flashback phase is completed.

failover to primary: flag:"finish": The failover is completed.

To query the failover progress, perform the following steps:

Query the trace ID of the failover statement execution.

obclient> SELECT FACILITY, TIMESTAMP, MESSAGE FROM V$OB_CLUSTER_EVENT_HISTORY WHERE FACILITY LIKE '%failover to primary%';

View the execution process of the failover statement by using the obtained trace ID.

obclient> SELECT FACILITY, TIMESTAMP FROM V$OB_CLUSTER_EVENT_HISTORY WHERE MESSAGE LIKE '%trace_id:XXXXX%';

Example:

obclient> SELECT FACILITY, TIMESTAMP, MESSAGE FROM V$OB_CLUSTER_EVENT_HISTORY WHERE FACILITY LIKE '%failover to primary%';
+---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| FACILITY            | TIMESTAMP    | MESSAGE              |
+---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| failover to primary | 2021-09-28 17:39:01.233870 | flag:"start", ddl_stmt_str:"ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER", trace_id:Y39B7644564D0-0005CD0AF66D9CBA   |
| failover to primary | 2021-09-28 17:39:38.574946 | flag:"finish", failover#:1632821941446656, cost:37341082, ddl_stmt_str:"ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER", trace_id:Y39B7644564D0-0005CD0AF66D9CBA |
+---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set

obclient> SELECT FACILITY, TIMESTAMP FROM V$OB_CLUSTER_EVENT_HISTORY WHERE MESSAGE LIKE '%trace_id:Y39B7644564D0-0005CD0AF66D9CBA%';
+-----------------------------------------+----------------------------+
| FACILITY    | TIMESTAMP    |
+-----------------------------------------+----------------------------+
| failover to primary       | 2021-09-28 17:39:01.233870 |
| switch cluster            | 2021-09-28 17:39:01.528406 |
| prepare partitions for flashback        | 2021-09-28 17:39:02.565025 |
| prepare partitions for flashback end    | 2021-09-28 17:39:04.078347 |
| wait flashback info dump  | 2021-09-28 17:39:11.085528 |
| wait partitions ready for flashback end | 2021-09-28 17:39:11.130451 |
| switch cluster            | 2021-09-28 17:39:11.133681 |
| do flashback partitions   | 2021-09-28 17:39:11.197417 |
| do flashback partitions end             | 2021-09-28 17:39:21.897984 |
| switch cluster            | 2021-09-28 17:39:21.901060 |
| switch cluster            | 2021-09-28 17:39:22.174017 |
| prepare partitions for flashback end    | 2021-09-28 17:39:23.590653 |
| wait flashback info dump  | 2021-09-28 17:39:31.490610 |
| wait partitions ready for flashback end | 2021-09-28 17:39:31.535690 |
| switch cluster            | 2021-09-28 17:39:31.538925 |
| do flashback partitions end             | 2021-09-28 17:39:37.279887 |
| flashback end             | 2021-09-28 17:39:37.279934 |
| switch cluster            | 2021-09-28 17:39:37.283146 |
| switch cluster            | 2021-09-28 17:39:38.554347 |
| switch cluster            | 2021-09-28 17:39:38.573346 |
| failover to primary       | 2021-09-28 17:39:38.574946 |
+-----------------------------------------+----------------------------+
21 rows in set

Query the V$OB_CLUSTER and V$OB_CLUSTER_FAILOVER_INFO views to check the cluster status and determine the system change numbers (SCNs) of the failover.

The V$OB_CLUSTER_FAILOVER_INFO view records the failover data of each tenant in each failover. SYS_TABLE_SCN indicates the failover SCN of system tables. USER_TABLE_SCN indicates the failover SCN of user tables. The failover SCN of the sys tenant is meaningless.

obclient> SELECT CLUSTER_ROLE, STANDBY_BECAME_PRIMARY_SCN FROM V$OB_CLUSTER;
+--------------+----------------------------+
| CLUSTER_ROLE | STANDBY_BECAME_PRIMARY_SCN |
+--------------+----------------------------+
| PRIMARY      |           1613813589631620 |
+--------------+----------------------------+
1 row in set

obclient> SELECT 'FAILOVER#', TENANT_ID, SYS_TABLE_SCN, USER_TABLE_SCN FROM V$OB_CLUSTER_FAILOVER_INFO;
+------------------+-----------+------------------+------------------+
|    FAILOVER#     | TENANT_ID |    SYS_TABLE_SCN | USER_TABLE_SCN   |
+------------------+-----------+------------------+------------------+
| 1613813770317824 |         1 | 1613813772434321 | 1613813772434321 |
| 1613813770317824 |      1001 | 1613813589631620 | 1613813589631620 |
| 1613813770317824 |      1002 | 1613813589631620 | 1613813589631620 |
+------------------+-----------+------------------+------------------+
3 rows in set

After the failover, the role of the standby cluster changes from PHYSICAL STANDBY to PRIMARY. STANDBY_BECAME_PRIMARY_SCN indicates the minimum failover SCN among all tenants. Data with a version earlier than or equal to this SCN is consistent with that in the original primary cluster.

Restore the original values of the parameters.

Notice

If the parameters are modified before the failover, restore their original values after the failover to ensure proper operation of the cluster.

obclient> ALTER SYSTEM SET _mini_merge_concurrency = 3;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '20s';
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET data_copy_concurrency = 20;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET server_data_copy_out_concurrency = 2;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET server_data_copy_in_concurrency = 2;
Query OK, 0 rows affected

Delete permanently offline OBServer nodes.

After you perform the lossy failover by adding the FORCE keyword, you need to execute the following statement to delete the permanently offline OBServer nodes in the standby cluster:
```
obclient> ALTER SYSTEM DELETE SERVER 'IP:PORT';
```
Query the oceanbase.__all_server table to check whether the permanently offline OBServer nodes are deleted.
```
obclient> SELECT * FROM oceanbase.__all_server ;
```
If the query results do not contain the record of the OBServer nodes, the OBServer node are deleted.

What to do next

After a lossy failover, data is inconsistent between the new primary cluster and the original primary cluster and between the new primary cluster and other standby clusters. In this case, you cannot connect the original primary cluster and other standby clusters to the new primary cluster. You need to rebuild the primary/standby relationship.

For more information about the operation of rebuilding the primary/standby relationship, see Restore other clusters after a lossy failover.