This topic describes how to perform a lossy failover.
Procedure
Log on to the
systenant as therootuser.Make sure that the primary cluster is unavailable, which means that all OBServer nodes in the primary cluster are inactive.
To ensure data security, only one primary cluster can be deployed in the primary/standby cluster configuration. If the primary cluster is available, an error will be returned when you execute the failover statement on the standby cluster. Therefore, to ensure the successful execution of the failover statement, make sure that the primary cluster is unavailable before the failover.
Query the status of each standby cluster and select an appropriate standby cluster as a new primary cluster.
Query the
V$OB_CLUSTERview of the standby cluster to obtain the cluster information.obclient> SELECT CLUSTER_ROLE, PROTECTION_MODE, PROTECTION_LEVEL, CURRENT_SCN FROM V$OB_CLUSTER; +------------------+---------------------+---------------------+------------------+ | CLUSTER_ROLE | PROTECTION_MODE | PROTECTION_LEVEL | CURRENT_SCN | +------------------+---------------------+---------------------+------------------+ | PHYSICAL STANDBY | MAXIMUM PERFORMANCE | MAXIMUM PERFORMANCE | 1613813589631620 | +------------------+---------------------+---------------------+------------------+ 1 row in setParameters in the return result:
CLUSTER_ROLE: indicates the role of the current cluster. For a standby cluster, the value isPHYSICAL STANDBY.PROTECTION_MODE: indicates the protection mode.PROTECTION_LEVEL: indicates the protection level.CURRENT_SCN: indicates the synchronization progress. The value of this field is displayed only for a standby cluster.
Distinguish the protection mode and protection level:
When the values of the
PROTECTION_MODEandPROTECTION_LEVELfields of the cluster are bothMAXIMUM AVAILABILITYorMAXIMUM PROTECTION, the execution result of the lossy failover statement is the same as that of the lossless failover statement. All partition data is considered consistent, and no rollback will be performed. For more information, see Protection modes and protection levels.When the values of the
PROTECTION_MODEandPROTECTION_LEVELfields are returned in any other combinations, all partition data will be rolled back to the consistent state by tenant after a lossy failover.CURRENT_SCNindicates the minimum synchronization progress among all tenants. A smaller value ofCURRENT_SCNindicates more serious data loss after the failover. WhenCURRENT_SCNis0or1, some tenants in the standby cluster are still being created, and the tenant data is incomplete. After the failover, the tenant data may be cleared.
Query the
V$OB_CLUSTER_STATSview of the standby cluster to check the synchronization progress of each tenant.obclient> SELECT TENANT_ID, MIN_SYS_TABLE_SCN, MIN_USER_TABLE_SCN FROM V$OB_CLUSTER_STATS; +-----------+-------------------+--------------------+ | TENANT_ID | MIN_SYS_TABLE_SCN | MIN_USER_TABLE_SCN | +-----------+-------------------+--------------------+ | 1 | 1613813707942627 | 1613813707942627 | | 1001 | 1613813589631620 | 1613813589631620 | | 1002 | 1613813589631620 | 1613813589631620 | +-----------+-------------------+--------------------+ 3 rows in setParameters in the return result:
TENANT_ID: indicates the tenant ID. If the returned value is `1```, the tenant is the sys tenant. Otherwise, it is a user tenant.MIN_SYS_TABLE_SCN: indicates the minimum synchronization progress of the system tables.MIN_USER_TABLE_SCN: indicates the minimum synchronization progress of the user tables.
The sys tenant of the primary cluster is independent of the sys tenant of the standby cluster. The sys tenants of these clusters are not physically synchronized. A lossy failover does not roll back data of the sys tenants.
User tenants of the primary and standby clusters are physically synchronized, and the synchronization progress varies in different partitions. A lossy failover rolls back data of the user tenants:
Rollback point for all system table partitions:
MIN_SYS_TABLE_SCNRollback point for all user table partitions:
MIN_USER_TABLE_SCNorMIN_SYS_TABLE_SCN, whichever is smaller
Select an appropriate standby cluster, log on to the standby cluster, and then modify the related parameters to speed up the failover.
You can modify the following parameters:
_mini_merge_concurrency: specifies the concurrency of a minor compaction. The default value is3. You can change it to16._ob_minor_merge_schedule_interval: specifies the interval for scheduling minor compactions. The default value is20s. We recommend that you change it to3s.Concurrency-related parameters for load balancing:
data_copy_concurrency: specifies the maximum number of concurrent data migration and replication tasks allowed in the system. The default value is20. We recommend that you change it to600.For more information about the parameter, see data_copy_concurrency.
server_data_copy_out_concurrency: specifies the maximum number of concurrent tasks of migrating data out of a single node in the system. The default value is2. We recommend that you change it to300.For more information about the parameter, see server_data_copy_out_concurrency.
server_data_copy_in_concurrency: specifies the maximum number of concurrent tasks of migrating data into a single node in the system. The default value is2. We recommend that you change it to300.For more information about the parameter, see server_data_copy_in_concurrency.
Procedure:
Check the values of the parameters.
We recommend that you record the original values of the parameters so that you can restore them after the failover.
obclient> SELECT NAME,VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_mini_merge_concurrency'; +--------------------------+-------+ | NAME | VALUE | +-----------------------------+----+ | _mini_merge_concurrency | 3 | +--------------------------+-------+ 1 row in set obclient> SELECT NAME,VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME ='_ob_minor_merge_schedule_interval'; +-----------------------------------+-------+ | NAME | VALUE | +-----------------------------------+-------+ | _ob_minor_merge_schedule_interval | 20s | +-----------------------------------+-------+ 1 row in set obclient> SHOW PARAMETERS LIKE '%data_copy%';Change the values of the parameters.
obclient> ALTER SYSTEM SET _mini_merge_concurrency = 16; Query OK, 0 rows affected obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '3s'; Query OK, 0 rows affected obclient> ALTER SYSTEM SET data_copy_concurrency = 600; Query OK, 0 rows affected obclient> ALTER SYSTEM SET server_data_copy_out_concurrency = 300; Query OK, 0 rows affected obclient> ALTER SYSTEM SET server_data_copy_in_concurrency = 300; Query OK, 0 rows affected
Perform the following steps to switch the standby cluster to the primary role:
Specify the system variable
ob_query_timeoutto set the timeout period for SQL statement execution in microseconds. Default value:10000000, which means 10 seconds. For more information about the system variableob_query_timeout, see ob_query_timeout.Note
This operation is optional. You can adjust the timeout period as required. We recommend that you set it to 100000000, which means 100s.
obclient> SET ob_query_timeout = 100000000; Query OK, 0 rows affectedExecute the failover statement.
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER; Query OK, 0 rows affectedIf the following error message is returned, some OBServer nodes are offline. To execute a failover statement, all OBServer nodes must be online. Check whether all inactive OBServer nodes in the standby cluster are permanently offline. If yes, specify the
FORCEoption in the failover statement to force a lossy failover.Error message:
Failover is not allowed when cluster has permanent offline server. Make sure all inactive servers are permanent offline, and use FORCE option to skip this checkPerform the following steps to check for permanently offline OBServer nodes:
Execute the following statement to check the value of the
server_permanent_offline_timeparameter:obclient> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";The
server_permanent_offline_timeparameter specifies the period allowed for missing a heartbeat. If no heartbeat from a server is received after this period, the server is considered permanently offline. For more information about theserver_permanent_offline_timeparameter, see server_permanent_offline_time.Execute the following statement to check the value of the
last_offline_timeparameter for an OBServer node. You can verify whether an OBServer node is permanently offline based on the values of thelast_offline_timeandserver_permanent_offline_timeparameters.obclient> SELECT svr_ip, svr_port, zone, status, last_offline_time FROM oceanbase.__all_server;If inactive OBServer nodes in the standby cluster are permanently offline, execute the following statement to force a lossy failover:
obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER FORCE;
Query the
V$OB_CLUSTER_EVENT_HISTORYview to check the failover progress.The failover operation restores the data of all partitions to the consistent snapshot point. This ensures that the data of all partitions before the snapshot point is complete. During this process, the replica data is pruned. The duration of the failover varies with the number of replicas in the cluster.
A failover is executed in the following stages:
failover to primary. flag:"start": The failover starts.Flash back system tables:
prepare partitions for flashback: The system is preparing for a flashback.prepare partitions for flashback end: The flashback preparation is completed.wait flashback info dump: The minor compaction of system tables is pending.wait partitions ready for flashback: The flashback of system table partitions is pending.wait partitions ready for flashback end: The flashback of system table partitions can be executed.do flashback partitions: The flashback of system table partitions is in progress.do flashback partitions end: The flashback of system table partitions is completed.
Flash back user tables:
prepare partitions for flashback: The system is preparing for a flashback.prepare partitions for flashback end: The flashback preparation is completed.wait flashback info dump: The minor compaction of user tables is pending.wait partitions ready for flashback: The flashback of user table partitions is pending.wait partitions ready for flashback end: The flashback of user table partitions can be executed.do flashback partitions: The flashback of user table partitions is in progress.do flashback partitions end: The flashback of user table partitions is completed.
flashback end: The flashback phase is completed.failover to primary: flag:"finish": The failover is completed.To query the failover progress, perform the following steps:
Query the trace ID of the failover statement execution.
obclient> SELECT FACILITY, TIMESTAMP, MESSAGE FROM V$OB_CLUSTER_EVENT_HISTORY WHERE FACILITY LIKE '%failover to primary%';View the execution process of the failover statement by using the obtained trace ID.
obclient> SELECT FACILITY, TIMESTAMP FROM V$OB_CLUSTER_EVENT_HISTORY WHERE MESSAGE LIKE '%trace_id:XXXXX%';
Example:
obclient> SELECT FACILITY, TIMESTAMP, MESSAGE FROM V$OB_CLUSTER_EVENT_HISTORY WHERE FACILITY LIKE '%failover to primary%'; +---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | FACILITY | TIMESTAMP | MESSAGE | +---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | failover to primary | 2021-09-28 17:39:01.233870 | flag:"start", ddl_stmt_str:"ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER", trace_id:Y39B7644564D0-0005CD0AF66D9CBA | | failover to primary | 2021-09-28 17:39:38.574946 | flag:"finish", failover#:1632821941446656, cost:37341082, ddl_stmt_str:"ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER", trace_id:Y39B7644564D0-0005CD0AF66D9CBA | +---------------------+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set obclient> SELECT FACILITY, TIMESTAMP FROM V$OB_CLUSTER_EVENT_HISTORY WHERE MESSAGE LIKE '%trace_id:Y39B7644564D0-0005CD0AF66D9CBA%'; +-----------------------------------------+----------------------------+ | FACILITY | TIMESTAMP | +-----------------------------------------+----------------------------+ | failover to primary | 2021-09-28 17:39:01.233870 | | switch cluster | 2021-09-28 17:39:01.528406 | | prepare partitions for flashback | 2021-09-28 17:39:02.565025 | | prepare partitions for flashback end | 2021-09-28 17:39:04.078347 | | wait flashback info dump | 2021-09-28 17:39:11.085528 | | wait partitions ready for flashback end | 2021-09-28 17:39:11.130451 | | switch cluster | 2021-09-28 17:39:11.133681 | | do flashback partitions | 2021-09-28 17:39:11.197417 | | do flashback partitions end | 2021-09-28 17:39:21.897984 | | switch cluster | 2021-09-28 17:39:21.901060 | | switch cluster | 2021-09-28 17:39:22.174017 | | prepare partitions for flashback end | 2021-09-28 17:39:23.590653 | | wait flashback info dump | 2021-09-28 17:39:31.490610 | | wait partitions ready for flashback end | 2021-09-28 17:39:31.535690 | | switch cluster | 2021-09-28 17:39:31.538925 | | do flashback partitions end | 2021-09-28 17:39:37.279887 | | flashback end | 2021-09-28 17:39:37.279934 | | switch cluster | 2021-09-28 17:39:37.283146 | | switch cluster | 2021-09-28 17:39:38.554347 | | switch cluster | 2021-09-28 17:39:38.573346 | | failover to primary | 2021-09-28 17:39:38.574946 | +-----------------------------------------+----------------------------+ 21 rows in set
Query the
V$OB_CLUSTERandV$OB_CLUSTER_FAILOVER_INFOviews to check the cluster status and determine the system change numbers (SCNs) of the failover.The
V$OB_CLUSTER_FAILOVER_INFOview records the failover data of each tenant in each failover.SYS_TABLE_SCNindicates the failover SCN of system tables.USER_TABLE_SCNindicates the failover SCN of user tables. The failover SCN of the sys tenant is meaningless.obclient> SELECT CLUSTER_ROLE, STANDBY_BECAME_PRIMARY_SCN FROM V$OB_CLUSTER; +--------------+----------------------------+ | CLUSTER_ROLE | STANDBY_BECAME_PRIMARY_SCN | +--------------+----------------------------+ | PRIMARY | 1613813589631620 | +--------------+----------------------------+ 1 row in set obclient> SELECT 'FAILOVER#', TENANT_ID, SYS_TABLE_SCN, USER_TABLE_SCN FROM V$OB_CLUSTER_FAILOVER_INFO; +------------------+-----------+------------------+------------------+ | FAILOVER# | TENANT_ID | SYS_TABLE_SCN | USER_TABLE_SCN | +------------------+-----------+------------------+------------------+ | 1613813770317824 | 1 | 1613813772434321 | 1613813772434321 | | 1613813770317824 | 1001 | 1613813589631620 | 1613813589631620 | | 1613813770317824 | 1002 | 1613813589631620 | 1613813589631620 | +------------------+-----------+------------------+------------------+ 3 rows in setAfter the failover, the role of the standby cluster changes from
PHYSICAL STANDBYtoPRIMARY.STANDBY_BECAME_PRIMARY_SCNindicates the minimum failover SCN among all tenants. Data with a version earlier than or equal to this SCN is consistent with that in the original primary cluster.Restore the original values of the parameters.
Notice
If the parameters are modified before the failover, restore their original values after the failover to ensure proper operation of the cluster.
obclient> ALTER SYSTEM SET _mini_merge_concurrency = 3; Query OK, 0 rows affected obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '20s'; Query OK, 0 rows affected obclient> ALTER SYSTEM SET data_copy_concurrency = 20; Query OK, 0 rows affected obclient> ALTER SYSTEM SET server_data_copy_out_concurrency = 2; Query OK, 0 rows affected obclient> ALTER SYSTEM SET server_data_copy_in_concurrency = 2; Query OK, 0 rows affectedDelete permanently offline OBServer nodes.
After you perform the lossy failover by adding the
FORCEkeyword, you need to execute the following statement to delete the permanently offline OBServer nodes in the standby cluster:obclient> ALTER SYSTEM DELETE SERVER 'IP:PORT';Query the
oceanbase.__all_servertable to check whether the permanently offline OBServer nodes are deleted.obclient> SELECT * FROM oceanbase.__all_server ;If the query results do not contain the record of the OBServer nodes, the OBServer node are deleted.
What to do next
After a lossy failover, data is inconsistent between the new primary cluster and the original primary cluster and between the new primary cluster and other standby clusters. In this case, you cannot connect the original primary cluster and other standby clusters to the new primary cluster. You need to rebuild the primary/standby relationship.
For more information about the operation of rebuilding the primary/standby relationship, see Restore other clusters after a lossy failover.