This topic describes how to perform a lossy failover.
Procedure
Ensure that the primary cluster is unavailable, that is, all OBServers in the primary cluster are down.
For safety, a primary/standby cluster configuration can contain only one primary cluster. If the primary cluster is still available, the failover command returns an error when you run it on the target standby cluster. Therefore, confirm that the primary cluster is unavailable before you start the failover.
Query the status of each standby cluster and select an appropriate standby cluster as a new primary cluster.
Query the V$OB_CLUSTER view of the standby cluster to obtain cluster information.

obclient> SELECT CLUSTER_ROLE, PROTECTION_MODE, PROTECTION_LEVEL, CURRENT_SCN FROM V$OB_CLUSTER;
+------------------+---------------------+---------------------+------------------+
| CLUSTER_ROLE     | PROTECTION_MODE     | PROTECTION_LEVEL    | CURRENT_SCN      |
+------------------+---------------------+---------------------+------------------+
| PHYSICAL STANDBY | MAXIMUM PERFORMANCE | MAXIMUM PERFORMANCE | 1613813589631620 |
+------------------+---------------------+---------------------+------------------+
1 row in set

Check the following fields in the query result:
CLUSTER_ROLE: indicates the role of the current cluster, which is PHYSICAL STANDBY for a standby cluster.
PROTECTION_MODE: indicates the protection mode.
PROTECTION_LEVEL: indicates the protection level.
CURRENT_SCN: indicates the synchronization progress. This field is displayed only for a standby cluster.
Distinguish the protection mode and protection level:
When PROTECTION_MODE and PROTECTION_LEVEL are both MAXIMUM AVAILABILITY or MAXIMUM PROTECTION, the execution result of the lossy failover command is the same as that of the lossless failover command. All partition data is considered consistent, and no rollback will be performed.
When PROTECTION_MODE and PROTECTION_LEVEL are set to other values, all partition data will be rolled back to the consistent state by tenant after a lossy failover. CURRENT_SCN indicates the minimum synchronization progress among all tenants. A smaller value of CURRENT_SCN indicates more serious data loss after the failover. When CURRENT_SCN is 0 or 1, some tenants in the standby cluster are still being created, and the tenant data is incomplete. After the failover, the tenant data may be cleared.
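The checks above can be sketched as a small helper that compares several standby clusters. This is only an illustration: the `StandbyStatus` class, its `name` label, and the rule of preferring the largest CURRENT_SCN are assumptions made here, not part of the product. It also reads "both" as meaning that the mode and the level hold the same value.

```python
from dataclasses import dataclass
from typing import List, Optional

# Settings under which the text says a lossy failover behaves like a lossless one.
NO_ROLLBACK_SETTINGS = {"MAXIMUM AVAILABILITY", "MAXIMUM PROTECTION"}


@dataclass
class StandbyStatus:
    name: str              # hypothetical label, not a V$OB_CLUSTER column
    protection_mode: str   # PROTECTION_MODE
    protection_level: str  # PROTECTION_LEVEL
    current_scn: int       # CURRENT_SCN


def needs_rollback(status: StandbyStatus) -> bool:
    """Return True when partition data would be rolled back per tenant."""
    same = status.protection_mode == status.protection_level
    return not (same and status.protection_mode in NO_ROLLBACK_SETTINGS)


def is_incomplete(status: StandbyStatus) -> bool:
    """CURRENT_SCN of 0 or 1 means some tenants are still being created."""
    return status.current_scn in (0, 1)


def pick_standby(candidates: List[StandbyStatus]) -> Optional[StandbyStatus]:
    """Assumed heuristic: skip incomplete clusters, prefer the largest CURRENT_SCN."""
    usable = [c for c in candidates if not is_incomplete(c)]
    if not usable:
        return None
    return max(usable, key=lambda c: c.current_scn)


if __name__ == "__main__":
    clusters = [
        StandbyStatus("standby_a", "MAXIMUM PERFORMANCE", "MAXIMUM PERFORMANCE", 1613813589631620),
        StandbyStatus("standby_b", "MAXIMUM PERFORMANCE", "MAXIMUM PERFORMANCE", 1613813707942627),
        StandbyStatus("standby_c", "MAXIMUM PERFORMANCE", "MAXIMUM PERFORMANCE", 1),
    ]
    best = pick_standby(clusters)
    print(best.name, "rollback expected:", needs_rollback(best))
```

Other factors, such as location or capacity, may matter more than synchronization progress in a real deployment, so treat the result as one input to the decision.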
Query the V$OB_CLUSTER_STATS view of the standby cluster to check the synchronization progress of each tenant.

obclient> SELECT TENANT_ID, MIN_SYS_TABLE_SCN, MIN_USER_TABLE_SCN FROM V$OB_CLUSTER_STATS;
+-----------+-------------------+--------------------+
| TENANT_ID | MIN_SYS_TABLE_SCN | MIN_USER_TABLE_SCN |
+-----------+-------------------+--------------------+
|         1 |  1613813707942627 |   1613813707942627 |
|      1001 |  1613813589631620 |   1613813589631620 |
|      1002 |  1613813589631620 |   1613813589631620 |
+-----------+-------------------+--------------------+
3 rows in set

Check the following fields in the query result:
TENANT_ID: indicates the tenant ID. 1 indicates the sys tenant, and other values indicate common tenants.
MIN_SYS_TABLE_SCN: indicates the minimum synchronization progress of the system table.
MIN_USER_TABLE_SCN: indicates the minimum synchronization progress of the user table.
The system tenant of the primary cluster is independent of the system tenant of the standby cluster, and the system tenants are not physically synchronized. During a lossy failover, data of the system tenants is not rolled back.
Common tenants of the primary and standby clusters are physically synchronized, and the synchronization progress varies in different partitions. During a lossy failover, data of the common tenants is rolled back:
Rollback point for all system table partitions: MIN_SYS_TABLE_SCN
Rollback point for all user table partitions: MIN_USER_TABLE_SCN or MIN_SYS_TABLE_SCN, whichever is smaller
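The rollback rules above can be expressed as a short calculation over rows from V$OB_CLUSTER_STATS. This is a minimal sketch: the function name and the returned dictionary keys are hypothetical, and the input rows are assumed to be (TENANT_ID, MIN_SYS_TABLE_SCN, MIN_USER_TABLE_SCN) tuples that have already been fetched.

```python
SYS_TENANT_ID = 1  # the text identifies tenant ID 1 as the sys tenant


def rollback_points(tenant_id, min_sys_table_scn, min_user_table_scn):
    """Return the expected rollback points for one tenant, or None for the sys tenant."""
    if tenant_id == SYS_TENANT_ID:
        # Sys tenant data is not rolled back during a lossy failover.
        return None
    return {
        # System table partitions roll back to MIN_SYS_TABLE_SCN.
        "sys_table": min_sys_table_scn,
        # User table partitions roll back to the smaller of the two values.
        "user_table": min(min_user_table_scn, min_sys_table_scn),
    }


if __name__ == "__main__":
    rows = [
        (1, 1613813707942627, 1613813707942627),
        (1001, 1613813589631620, 1613813589631620),
        (1002, 1613813589631620, 1613813589631620),
    ]
    for tenant_id, sys_scn, user_scn in rows:
        print(tenant_id, rollback_points(tenant_id, sys_scn, user_scn))
```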
Select an appropriate standby cluster, log on to the standby cluster, and modify related configuration items to speed up the failover.
You can modify the following two hidden configuration items:
_mini_merge_concurrency: specifies the concurrency of a minor compaction. The default value is 3. You can change it to 16.
_ob_minor_merge_schedule_interval: specifies the interval for scheduling minor compactions. The default value is 20s. We recommend that you change it to 3s.
To modify the configuration items, perform the following operations:
Check the values of the configuration items.
We recommend that you record the original values of the configuration items so that you can restore them after the failover.
obclient> SELECT NAME, VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_mini_merge_concurrency';
+-------------------------+-------+
| NAME                    | VALUE |
+-------------------------+-------+
| _mini_merge_concurrency | 3     |
+-------------------------+-------+
1 row in set

obclient> SELECT NAME, VALUE FROM __ALL_VIRTUAL_SYS_PARAMETER_STAT WHERE NAME = '_ob_minor_merge_schedule_interval';
+-----------------------------------+-------+
| NAME                              | VALUE |
+-----------------------------------+-------+
| _ob_minor_merge_schedule_interval | 20s   |
+-----------------------------------+-------+
1 row in set

Change the values of the configuration items.

obclient> ALTER SYSTEM SET _mini_merge_concurrency = 16;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '3s';
Query OK, 0 rows affected
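Because the original values must be restored after the failover, it can help to keep both sets of values together and generate the statements from them. The sketch below only builds statement text in the same form as the commands above. The helper names are hypothetical, and it performs no escaping, so it is suitable only for simple values like these.

```python
def format_value(value):
    """Render integers bare and everything else in single quotes, as in the statements above."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value) + "'"


def alter_statements(settings):
    """Build one ALTER SYSTEM SET statement per configuration item."""
    return [
        "ALTER SYSTEM SET {} = {};".format(name, format_value(value))
        for name, value in settings.items()
    ]


# Values recorded before the change (the defaults named in the text).
ORIGINAL = {"_mini_merge_concurrency": 3, "_ob_minor_merge_schedule_interval": "20s"}
# Values suggested in the text to speed up the failover.
SPEEDUP = {"_mini_merge_concurrency": 16, "_ob_minor_merge_schedule_interval": "3s"}

if __name__ == "__main__":
    print("-- before the failover")
    print("\n".join(alter_statements(SPEEDUP)))
    print("-- after the failover")
    print("\n".join(alter_statements(ORIGINAL)))
```

If the values recorded from your cluster differ from the defaults, put the recorded values in `ORIGINAL` instead.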
Perform the following operations to switch the standby cluster to the primary role:
Specify the system variable ob_query_timeout to set the command timeout duration in microseconds. The default value is 10000000, indicating 10 seconds. For more information about the system variable ob_query_timeout, see ob_query_timeout.

Note
This operation is optional. You can adjust the command timeout duration as required. We recommend that you set it to 100000000 (100s).

obclient> SET OB_QUERY_TIMEOUT = 100000000;
Query OK, 0 rows affected

Run the failover command.

obclient> ALTER SYSTEM ACTIVATE PHYSICAL STANDBY CLUSTER;
Query OK, 0 rows affected
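The timeout set before the failover command is expressed in microseconds, which is easy to get wrong by a factor of ten. As a minimal sketch, the hypothetical helper below converts a duration in seconds into the SET statement text used in this step.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def timeout_statement(seconds):
    """Build the SET OB_QUERY_TIMEOUT statement for a timeout given in seconds."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    microseconds = round(seconds * MICROSECONDS_PER_SECOND)
    return "SET OB_QUERY_TIMEOUT = {};".format(microseconds)


if __name__ == "__main__":
    print(timeout_statement(10))   # the default named in the text
    print(timeout_statement(100))  # the recommended value
```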
Query the V$OB_CLUSTER and V$OB_CLUSTER_FAILOVER_INFO views to check the cluster status and determine the system change numbers (SCNs) of the failover.
The V$OB_CLUSTER_FAILOVER_INFO view records the failover data of each tenant in each failover. SYS_TABLE_SCN indicates the failover SCN of the system table. USER_TABLE_SCN indicates the failover SCN of the user table. The failover SCN of the system tenant is meaningless.

obclient> SELECT CLUSTER_ROLE, STANDBY_BECAME_PRIMARY_SCN FROM V$OB_CLUSTER;
+--------------+----------------------------+
| CLUSTER_ROLE | STANDBY_BECAME_PRIMARY_SCN |
+--------------+----------------------------+
| PRIMARY      |           1613813589631620 |
+--------------+----------------------------+
1 row in set

obclient> SELECT `FAILOVER#`, TENANT_ID, SYS_TABLE_SCN, USER_TABLE_SCN FROM V$OB_CLUSTER_FAILOVER_INFO;
+------------------+-----------+------------------+------------------+
| FAILOVER#        | TENANT_ID | SYS_TABLE_SCN    | USER_TABLE_SCN   |
+------------------+-----------+------------------+------------------+
| 1613813770317824 |         1 | 1613813772434321 | 1613813772434321 |
| 1613813770317824 |      1001 | 1613813589631620 | 1613813589631620 |
| 1613813770317824 |      1002 | 1613813589631620 | 1613813589631620 |
+------------------+-----------+------------------+------------------+
3 rows in set

After the failover, the role of the standby cluster changes from PHYSICAL STANDBY to PRIMARY. STANDBY_BECAME_PRIMARY_SCN indicates the minimum failover SCN among all tenants. Data with a version earlier than or equal to this SCN is consistent with that in the original primary cluster.
Restore the original values of the configuration items.
Notice
If the configuration items are modified before the failover, restore their original values after the failover to ensure proper operation of the cluster.
obclient> ALTER SYSTEM SET _mini_merge_concurrency = 3;
Query OK, 0 rows affected

obclient> ALTER SYSTEM SET _ob_minor_merge_schedule_interval = '20s';
Query OK, 0 rows affected
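Returning to the verification query on V$OB_CLUSTER_FAILOVER_INFO, the relationship between the per-tenant failover SCNs and STANDBY_BECAME_PRIMARY_SCN can be cross-checked with a few lines of code. This is a sketch under assumptions: the function name is hypothetical, and excluding the sys tenant is an interpretation of the statement that its failover SCN is meaningless. The input rows are (TENANT_ID, SYS_TABLE_SCN, USER_TABLE_SCN) tuples from a single failover.

```python
SYS_TENANT_ID = 1  # the text identifies tenant ID 1 as the sys tenant


def min_common_tenant_scn(rows):
    """Return the smallest failover SCN across common tenants, or None if there are none."""
    scns = [
        min(sys_table_scn, user_table_scn)
        for tenant_id, sys_table_scn, user_table_scn in rows
        if tenant_id != SYS_TENANT_ID  # assumption: ignore the sys tenant
    ]
    return min(scns) if scns else None


if __name__ == "__main__":
    failover_rows = [
        (1, 1613813772434321, 1613813772434321),
        (1001, 1613813589631620, 1613813589631620),
        (1002, 1613813589631620, 1613813589631620),
    ]
    standby_became_primary_scn = 1613813589631620  # value from the V$OB_CLUSTER query
    computed = min_common_tenant_scn(failover_rows)
    print("matches:", computed == standby_became_primary_scn)
```

With the sample values from this topic the two numbers agree. A mismatch on a real cluster would be a reason to inspect the views more closely, not proof of a problem.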