Application scenarios
Starting from V4.1, OceanBase Database provides an arbitration service that enables low-cost high availability with two replicas across two data centers (2F1A). However, in this deployment mode, the high availability of replica A (the arbitration service) itself cannot be guaranteed. If replica A becomes unavailable because of a process exception or a server crash, the OceanBase cluster can no longer remain highly available. To address this issue, OCP V4.3.5 introduced high availability for the arbitration service through arbitration service groups.
What is an arbitration service group?
An arbitration service group contains multiple arbitration services. When an arbitration service is detected as unavailable, OCP automatically switches the associated cluster to another arbitration service within the group.
An arbitration service group consists of two parts: the arbitration service list and the switchover policy.
Each arbitration service in an arbitration service group has a priority. When an associated OceanBase cluster triggers the switchover policy, it switches to the arbitration service with the highest priority first. If the highest-priority arbitration service does not have sufficient resources, some OceanBase clusters are switched to the arbitration service with the next priority.
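The priority and spill-over behavior described above can be sketched as follows. This is only an illustration, not OCP's implementation: the `ArbitrationService` class, the `free_slots` field (a stand-in for "sufficient resources"), and the convention that a smaller number means a higher priority are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ArbitrationService:
    name: str
    priority: int    # assumption: smaller number = higher priority
    free_slots: int  # hypothetical stand-in for "sufficient resources"


def assign_clusters(clusters: list[str],
                    services: list[ArbitrationService]) -> dict[str, str]:
    """Place each cluster on the highest-priority service that still has room.

    Clusters that cannot be placed anywhere are left out of the result.
    """
    ordered = sorted(services, key=lambda s: s.priority)
    remaining = {s.name: s.free_slots for s in ordered}
    assignment: dict[str, str] = {}
    for cluster in clusters:
        for service in ordered:
            if remaining[service.name] > 0:
                assignment[cluster] = service.name
                remaining[service.name] -= 1
                break
    return assignment
```

With a first-priority service that has room for two clusters and a second-priority service with more room, three clusters would be split so that the first two go to the first-priority service and the third spills over to the second.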
The switchover policy is divided into two parts: the Unit Disconnection Policy and the Duration.
The Unit Disconnection Policy includes the following three types:
- Default Policy: Triggered when half of the tenant's replicas in the OceanBase cluster lose their connection to the arbitration service.
- Number of Units: Triggered when a specified number of Unit Servers in the OceanBase cluster lose their connection to the arbitration service.
- Percentage of Units: Triggered when a specified percentage of Unit Servers in the OceanBase cluster lose their connection to the arbitration service.
Duration: The length of time between the OceanBase cluster triggering the Unit Disconnection Policy and the switchover being performed.
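The three Unit Disconnection Policy types can be expressed as a single predicate. The sketch below is illustrative only: the names `PolicyType`, `ClusterSnapshot`, and `policy_triggered` are hypothetical, and it assumes that "half", "a specified number", and "a specified percentage" each mean "at least that many", which the text above does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class PolicyType(Enum):
    DEFAULT = "default"
    NUMBER_OF_UNITS = "number_of_units"
    PERCENTAGE_OF_UNITS = "percentage_of_units"


@dataclass
class ClusterSnapshot:
    tenant_replicas_total: int
    tenant_replicas_disconnected: int
    unit_servers_total: int
    unit_servers_disconnected: int


def policy_triggered(policy: PolicyType, snapshot: ClusterSnapshot,
                     threshold: float = 0) -> bool:
    """Return True if the disconnection condition of the policy is met.

    `threshold` is a unit count or a percentage (0-100), depending on the
    policy type; it is ignored for the default policy.
    """
    if policy is PolicyType.DEFAULT:
        # Assumption: "half" is read as "at least half" of the tenant's replicas.
        total = snapshot.tenant_replicas_total
        return total > 0 and snapshot.tenant_replicas_disconnected * 2 >= total
    if policy is PolicyType.NUMBER_OF_UNITS:
        return snapshot.unit_servers_disconnected >= threshold
    # PERCENTAGE_OF_UNITS: compare without dividing to avoid rounding issues.
    total = snapshot.unit_servers_total
    return total > 0 and snapshot.unit_servers_disconnected * 100 >= threshold * total
```

The Duration setting is deliberately left out of this predicate, since it governs when the switchover happens rather than whether the disconnection condition is met.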
Prerequisites
- There must be at least one arbitration service in OCP.
- Each arbitration service specified when creating an arbitration service group must be in the RUNNING state.
- The versions of the arbitration services in the arbitration service group must be consistent.
Technical principle
OCP Agent Reporting Rule: The monagent process on the cluster's RS node reports the connection status between the cluster and the arbitration service to MonitorDB every 5 seconds.
OCP Server Check Rule:
- Arbitration service groups are divided into shards, and each OCP-Server monitors the arbitration service groups within its own shard.
- Each OCP-Server queries MonitorDB every 10 seconds to obtain the list of all servers that have lost their connection to the arbitration service. Then, based on the user-defined switchover policy, it determines whether the number of disconnected units in the cluster meets the switchover condition.
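One way to picture how a periodic check could combine with the Duration setting is a small tracker that is fed the result of each 10-second poll. This is a sketch under stated assumptions, not a description of OCP internals: the `DurationTracker` class is hypothetical, and it assumes that the condition must hold continuously and that the timer resets when the connection recovers, neither of which is specified in this document.

```python
from typing import Optional

POLL_INTERVAL_SECONDS = 10  # OCP-Server query interval stated above


class DurationTracker:
    """Tracks how long a disconnection condition has been observed."""

    def __init__(self, duration_seconds: float) -> None:
        self.duration_seconds = duration_seconds
        self.first_met_at: Optional[float] = None

    def observe(self, now: float, condition_met: bool) -> bool:
        """Record one poll result; return True once a switchover is due."""
        if not condition_met:
            # Assumption: a recovered connection resets the timer.
            self.first_met_at = None
            return False
        if self.first_met_at is None:
            self.first_met_at = now
        return now - self.first_met_at >= self.duration_seconds
```

With a 30-second duration and polls at 0, 10, 20, and 30 seconds that all see the condition met, only the poll at 30 seconds reports that a switchover is due.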
Considerations
OCP does not initiate an automatic arbitration service switchover when the cluster is in an abnormal or maintenance state.
There must be at least one additional arbitration service in the arbitration service group that is running normally, has sufficient resources, and has a network connection to the OceanBase cluster. Otherwise, OCP will not initiate an automatic switchover. Resources are considered sufficient when the following conditions are met:
- The memory usage of the target arbitration service machine must be ≤ 90%.
- The disk of the target arbitration service machine must meet the 2F/4F requirement: 12 MiB for a 2F tenant and 24 MiB for a 4F tenant. Whether the resources are sufficient is checked when a cluster is migrated to this arbitration machine.
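The resource thresholds above translate into a simple check. The sketch below is illustrative: the function names are hypothetical, and it assumes that the disk requirement is the sum over all tenants being migrated and that it is compared against free disk space on the target machine.

```python
MEMORY_USAGE_LIMIT_PERCENT = 90              # from the requirement above
DISK_MIB_PER_TENANT = {"2F": 12, "4F": 24}   # from the requirement above


def required_disk_mib(tenant_modes: list[str]) -> int:
    """Sum the disk requirement over tenants (assumption: it is additive)."""
    return sum(DISK_MIB_PER_TENANT[mode] for mode in tenant_modes)


def has_sufficient_resources(memory_usage_percent: float,
                             free_disk_mib: float,
                             tenant_modes: list[str]) -> bool:
    """Check the memory and disk conditions for a target arbitration machine."""
    memory_ok = memory_usage_percent <= MEMORY_USAGE_LIMIT_PERCENT
    disk_ok = free_disk_mib >= required_disk_mib(tenant_modes)
    return memory_ok and disk_ok
```

For example, a cluster with two 2F tenants and one 4F tenant would need 12 + 12 + 24 = 48 MiB of disk under this reading.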
Procedure
Log in to OCP.
In the left navigation bar, click Cluster to go to the Arbitration Service page.
Click the Service Group tab to go to the Service Group list page.
Click Create Service Group in the upper-right corner.
In the Create Service Group panel, configure the basic information for the arbitration service group.
