This topic describes the high availability (HA) feature of OceanBase Database, including the concepts, parameters, and scenarios of the feature, and how to configure HA parameters in the OceanBase Migration Service (OMS) console.
Overview
HA clusters are an effective way to ensure business continuity. For enterprise users that operate around the clock, even a few minutes of interrupted data synchronization affects the business.
With years of application in enterprise scenarios, OMS provides a high-availability cluster architecture that allows recovery in seconds. It supports disaster recovery for hosts, IDCs, and even regions, to meet the requirements of latency-sensitive enterprise users. When a fault occurs on a server or a single OMS node, the entire OMS does not fail. Tasks of the faulty node are distributed to other nodes for execution. Typically, tasks of a failed component are assigned to stores in other nodes for pulling. If an exception occurs, an error is reported, indicating that an exception occurs during the execution of the project.
The Store, Full-Import, and Incr-Sync components of OMS support HA. OMS manages the Store HA feature at the subtopic level. HA is enabled for stores under a subtopic based on the exceptions of the subtopic. Therefore, when Store HA is enabled, a store exception may not trigger the generation of a new store for disaster recovery. Store HA is enabled based on the following basic principles:
When the HA module detects a store exception, if the difference between the latest heartbeat time of the store and the current time does not exceed the value of the checkModuleExceptionIntervalSec parameter, the HA module determines that the store is normal.
Store HA is not enabled for a subtopic if at least one store under the subtopic is normal.
When none of the stores under a subtopic is normal, the following rules apply:
If a store is scheduled for a primary/standby switchover, the HA module does not trigger the generation of a new store, but waits for the store to recover.
If none of the stores under a subtopic is normal and the number of stores under the subtopic does not reach the value of the subtopicStoreNumberThreshold parameter, the HA module triggers a ticket to create new stores. Otherwise, the HA module does not take any action.
If some stores are faulty and some stores are stopped under a subtopic, the HA module cannot determine the cause. If the number of stores under the subtopic does not reach the value of the subtopicStoreNumberThreshold parameter, the HA module triggers a ticket to create new stores. Otherwise, the HA module does not take any action.
If all stores under a subtopic are stopped, the HA module does not take any action.
If a store exits for a reason other than an exception, the HA module treats it as a normally stopped store.
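The rules above can be sketched as a small decision function. This is an illustrative model only, not the OMS implementation: the `Store`, `StoreState`, and `Action` types and the `decide` function are hypothetical names introduced here, and the handling of a subtopic with no stores is an assumption because the rules do not cover that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class StoreState(Enum):
    RUNNING = "running"  # no exception detected
    FAULTY = "faulty"    # an exception was detected for the store
    STOPPED = "stopped"  # exited for a reason other than an exception


class Action(Enum):
    NONE = "no action"
    WAIT = "wait for the store to recover"
    CREATE_STORE = "trigger a ticket to create new stores"


@dataclass
class Store:
    state: StoreState
    last_heartbeat: float  # epoch seconds of the latest heartbeat
    switchover_scheduled: bool = False


def is_normal(store: Store, now: float, interval_sec: int) -> bool:
    """Return True if the HA module would treat the store as normal."""
    if store.state is StoreState.RUNNING:
        return True
    if store.state is StoreState.STOPPED:
        return False
    # A faulty store still counts as normal while its heartbeat is recent.
    return now - store.last_heartbeat <= interval_sec


def decide(stores: List[Store], now: float,
           interval_sec: int = 240, threshold: int = 5) -> Action:
    """Apply the Store HA rules to the stores of one subtopic."""
    if not stores:
        # Assumption: the rules do not describe an empty subtopic.
        return Action.NONE
    if any(is_normal(s, now, interval_sec) for s in stores):
        return Action.NONE  # at least one normal store
    if any(s.switchover_scheduled for s in stores):
        return Action.WAIT  # a primary/standby switchover is pending
    if all(s.state is StoreState.STOPPED for s in stores):
        return Action.NONE  # every store was stopped normally
    # All stores faulty, or a mix of faulty and stopped stores.
    if len(stores) < threshold:
        return Action.CREATE_STORE
    return Action.NONE
```

The default arguments mirror the checkModuleExceptionIntervalSec and subtopicStoreNumberThreshold defaults (240 and 5). For example, two faulty stores whose last heartbeat is 300 seconds old yield `Action.CREATE_STORE`, while five such stores yield `Action.NONE` because the threshold is reached.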
HA parameters and definitions
The ha.config file is as follows:
{
"enable":false,
"enableHost":false,
"enableStore":true,
"perceiveStoreClientCheckpoint":false,
"enableConnector":true,
"enableJdbcWriter":true,
"subtopicStoreNumberThreshold":5,
"checkRequestIntervalSec":600,
"checkJdbcWriterIntervalSec":600,
"checkHostDownIntervalSec":540,
"checkModuleExceptionIntervalSec":240,
"clearAbnormalResourceHours":72,
"refetchStoreIntervalMin":30
}
| Parameter | Description | Default value |
|---|---|---|
| enable | Specifies whether to enable HA. Valid values: true and false. | false |
| enableHost | Specifies whether to enable HA for host nodes in the OMS cluster. Valid values: true and false. When hosts are down, the HA module triggers tickets for the batch migration of the Incr-Sync component instances. For more information about the alert, see oms_host_down. | false |
| enableStore | Specifies whether to enable HA for the store component instances. Valid values: true and false. The HA module supports primary/standby store switchover and can recycle stores with exceptions. | true |
| perceiveStoreClientCheckpoint | An advanced feature of Store HA. | false |
| enableConnector | Specifies whether to enable the Incr-Sync daemon. Valid values: true and false. enableConnector is applicable to scenarios with abnormal Incr-Sync components. | true |
| enableJdbcWriter | Specifies whether to enable the JDBCWriter daemon in OMS of a version earlier than V4.0.1. Valid values: true and false. enableJdbcWriter is applicable to scenarios with abnormal JDBCWriters. If enableConnector is set to false and enableJdbcWriter is set to true before the upgrade, you need to set enableConnector to true after the upgrade. | true |
| subtopicStoreNumberThreshold | The maximum number of stores under a subtopic. The value of this parameter is verified before the HA module performs an operation. | 5 |
| checkRequestIntervalSec | The interval, in seconds, for triggering HA tickets for the same operation object. | 600 |
| checkJdbcWriterIntervalSec | The duration, in seconds, before HA is enabled for a faulty JDBCWriter component. HA is enabled for a faulty JDBCWriter component only when the JDBCWriter component has become inactive for the specified duration. | 600 |
| checkHostDownIntervalSec | The duration, in seconds, for which a host must report no heartbeat before it is considered down in downtime migration scenarios. | 540 |
| checkModuleExceptionIntervalSec | The time threshold, in seconds, for which a faulty store can stay active before HA is enabled. This parameter is available if store HA is enabled. | 240 |
| clearAbnormalResourceHours | The time interval, in hours, for clearing redundant faulty stores when Store HA is enabled, based on the GmtModified parameter of the store component. | 72 |
| refetchStoreIntervalMin | The time interval, in minutes, for pulling back from stores. | 30 |
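
Before you change the configuration, it can help to check that an edited value still has the expected keys and value types. The following is a minimal sketch of such a check, not an OMS tool: the `validate_ha_config` function is a hypothetical helper, the expected types are inferred from the sample file above, and the rule that numeric values must be positive integers is an assumption.

```python
import json
from typing import List

# Expected value type for each key, inferred from the sample ha.config.
EXPECTED_TYPES = {
    "enable": bool,
    "enableHost": bool,
    "enableStore": bool,
    "perceiveStoreClientCheckpoint": bool,
    "enableConnector": bool,
    "enableJdbcWriter": bool,
    "subtopicStoreNumberThreshold": int,
    "checkRequestIntervalSec": int,
    "checkJdbcWriterIntervalSec": int,
    "checkHostDownIntervalSec": int,
    "checkModuleExceptionIntervalSec": int,
    "clearAbnormalResourceHours": int,
    "refetchStoreIntervalMin": int,
}


def validate_ha_config(text: str) -> List[str]:
    """Return a list of problems found in an ha.config JSON string."""
    config = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        if expected is bool:
            ok = isinstance(value, bool)
        else:
            # bool is a subclass of int in Python, so exclude it explicitly.
            # Assumption: numeric parameters must be positive integers.
            ok = isinstance(value, int) and not isinstance(value, bool) and value > 0
        if not ok:
            problems.append(f"unexpected value for {key}: {value!r}")
    for key in config:
        if key not in EXPECTED_TYPES:
            problems.append(f"unknown key: {key}")
    return problems
```

An empty list means that no problem was found. A misspelled key such as `enableJDBCWriter` is reported both as an unknown key and as a missing `enableJdbcWriter` key.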
Modify HA configurations
You can modify the parameters in the ha.config file in the OMS console.
Log on to the OMS console.
In the left-side navigation pane, choose System Management > System Parameters.
On the System Parameters page, find ha.config.
Click the edit icon in the Value column of the parameter.
In the Modify Value dialog box, change the value as needed.
Click OK.
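
If you prefer to prepare the new value outside the console and paste it into the Modify Value dialog box, a short script can apply the change to a copy of the current value. The sketch below is one possible approach: `with_overrides` is a hypothetical helper, and it assumes that the value is the JSON text shown in the ha.config sample.

```python
import json


def with_overrides(current_value: str, **overrides) -> str:
    """Return the ha.config JSON text with the given keys changed."""
    config = json.loads(current_value)
    # Reject keys that are not already present, to catch spelling mistakes.
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"not present in the current value: {sorted(unknown)}")
    config.update(overrides)
    return json.dumps(config, indent=2)
```

For example, `with_overrides(current_value, enable=True, enableHost=True)` returns the same configuration with HA and host HA turned on and all other parameters unchanged.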