Modify HA configurations

2023-06-29 11:11:39  Updated

This topic describes the high availability (HA) feature of OceanBase Database, including the concepts, parameters, and scenarios of the feature, and how to configure HA parameters in the OceanBase Migration Service (OMS) console.

Overview

HA clusters are an effective solution to ensure business continuity. During 24/7 business operation of enterprise users, the interruption of data synchronization even within minutes causes impact on the business.

With years of application in enterprise scenarios, OMS provides a high-availability cluster architecture that allows recovery in seconds. It supports disaster recovery for hosts, IDCs, and even regions, to meet the requirements of latency-sensitive enterprise users. When a fault occurs on a server or a single OMS node, the entire OMS does not fail. Tasks of the faulty node are distributed to other nodes for execution. Typically, tasks of a failed component are assigned to stores in other nodes for pulling. If an exception occurs, an error is reported, indicating that an exception occurs during the execution of the project.

The Store, Full-Import, and Incr-Sync components of OMS support HA. OMS manages the Store HA feature at the subtopic level. HA is enabled for stores under a subtopic based on the exceptions of the subtopic. Therefore, when Store HA is enabled, a store exception may not trigger the generation of a new store for disaster recovery. Store HA is enabled based on the following basic principles:

  • When the HA module detects a store exception, if the difference between the latest heartbeat time of the store and the current time does not exceed the value of the checkModuleExceptionIntervalSec parameter, the HA module determines that the store is normal.

  • Store HA is not enabled for a subtopic if at least one store under the subtopic is normal.

  • When none of the stores under a subtopic is normal, the following rules apply:

    • If a store is scheduled for a primary/standby switchover, the HA module does not trigger the generation of a new store, but waits for the store to restore.

    • If none of the stores under a subtopic is normal and the number of stores under the subtopic does not reach the value of the subtopicStoreNumberThreshold parameter, the HA module triggers a ticket to create new stores. Otherwise, the HA module does not take any action.

    • If some stores are faulty and some stores are stopped under a subtopic, the HA module cannot determine the cause.

      If the number of stores under the subtopic does not reach the value of the subtopicStoreNumberThreshold parameter, the HA module triggers a ticket to create new stores. Otherwise, the HA module does not take any action.

    • If all stores under a subtopic are stopped, the HA module does not take any action.

    • If a store exits not because of an exception, the HA module takes it as a normally stopped store.

HA parameters and definitions

The ha.config file is as follows:

{
  "enable":false,
  "enableHost":false,
  "enableStore":true,
  "perceiveStoreClientCheckpoint":false,
  "enableConnector":true,
  "enableJdbcWriter":true,
  "subtopicStoreNumberThreshold":5,
  "checkRequestIntervalSec":600,
  "checkJdbcWriterIntervalSec":600,
  "checkHostDownIntervalSec":540,
  "checkModuleExceptionIntervalSec":240,
  "clearAbnormalResourceHours":72,
  "refetchStoreIntervalMin":30
  }
Parameter Description Default value
enable Specifies whether to enable HA. Valid values: true and false. false
enableHost Specifies whether to enable HA for host nodes in the OMS cluster. Valid values: true and false.
When hosts are down, the HA module triggers tickets for the batch migration of the Incr-Sync component instances. For more information about the alert, see oms_host_down.
false
enableStore Specifies whether to enable HA for the store component instances. Valid values: true and false. The HA module supports primary/standby store switchover and can recycle stores with exceptions:
  • Primary/Standby store switchover: If a store under a subtopic is scheduled for a primary/standby switchover, the HA module creates a ticket to restore the store.
  • Recycling stores with exceptions: When an exception occurs to a store under a subtopic, the HA module checks whether the difference between the latest heartbeat time of the store and the current time exceeds the value of the clearAbnormalResourceHours parameter. If yes, the ticket for the store is deleted.
true
perceiveStoreClientCheckpoint An advanced feature of Store HA.
  • The advanced feature of Store HA is enabled only when enable, enableStore, and perceiveStoreClientCheckpoint are set to true. Then, stores can be automatically created based the consumption capability of downstream components. The prerequisite for automatic store creation is that the number of stores under the subtopic does not exceed the value of the subtopicStoreNumberThreshold parameter.
    In this case, if the subtopic has no normal stores that meet the consumption requirements of downstream components, the HA module will pull back from the stores that were created ahead of the time of consumption by the interval specified by refetchStoreIntervalMin.
  • When both enable and enableStore are set to true, and perceiveStoreClientCheckpoint is set to false, only Store HA is enabled.
    In this case, if the subtopic has an abnormal store, the HA module will pull back from the stores that were created ahead of the current time by the interval specified by refetchStoreIntervalMin.
false
enableConnector Specifies whether to enable the Incr-Sync daemon. Valid values: true and false. enableConnector is applicable to scenarios with abnormal Incr-Sync components. true
enableJdbcWriter Specifies whether to enable the JDBCWriter daemon in OMS of a version earlier than V4.0.1. Valid values: true and false. enableJDBCWriter is applicable to scenarios with abnormal JDBCWriters.
If enableConnector is set to false and enableJDBCWriter is set to true before the upgrade, you need to set enableConnector to true after the upgrade.
true
subtopicStoreNumberThreshold The maximum number of stores under a subtopic. The value of this parameter is verified before the HA module performs an operation. 5
checkRequestIntervalSec The interval, in seconds, for triggering HA tickets for the same operation object. 600
checkJdbcWriterIntervalSec The duration, in seconds, before HA is enabled for a faulty JDBCWriter component. HA is enabled for a faulty JDBCWriter component only when the JDBCWriter component has become inactive for the specified duration. 600
checkHostDownIntervalSec Specifies the time elapsed during which the host does not report a heartbeat, when a host is considered down in downtime migration scenarios. 540
checkModuleExceptionIntervalSec The time threshold, in seconds, for which a faulty store can stay active before HA is enabled. This parameter is available if store HA is enabled. 240
clearAbnormalResourceHours The time interval, in hours, for clearing redundant faulty stores when Store HA is enabled, based on the GmtModfied parameter of the store component. 72
refetchStoreIntervalMin The time interval, in minutes, for pulling back from stores. 30

Modify HA configurations

You can modify the parameters in the ha.config file in the OMS console.

  1. Log on to the OMS console.

  2. In the left-side navigation pane, choose System Management > System Parameters.

  3. On the System Parameters page, find ha.config.

  4. Click the edit icon in the Value column of the parameter.

  5. In the Modify Value dialog box, change the value as needed.

  6. Click OK.

Contact Us