This topic describes how to handle the faults of OBServer nodes by using ob-operator.
Prerequisites
- You must deploy an OceanBase cluster with at least 3 OBServer nodes and a tenant with 3 replicas or above.
- To use the static IP address feature, you must install
Calicoas the network plugin for the Kubernetes cluster.
Restore policy
When a minority of OBServer nodes fail, the multi-replica mechanism of OceanBase Database ensures the availability of the cluster, and ob-operator detects a pod exception. Then, ob-operator creates an OBServer node, adds it to the cluster, and deletes the abnormal OBServer node. OceanBase Database replicates the data of replicas on the abnormal OBServer node to the new node.
If you have installed Calico for the Kubernetes cluster, this process can be easier. ob-operator can start a new observer process by using the IP address of the abnormal OBServer node. This way, the data on the abnormal OBServer node, if the data still exists, can be directly used without the replication step. Moreover, if a majority of OBServer nodes fail, this method can also restore the service after all new OBServer nodes are started.
Verification
You can verify the restore result of ob-operator by performing the following steps.
Delete the pod
kubectl delete pod obcluster-1-zone1-074bda77c272 -n oceanbase
View the restore result. The output shows that the pod of zone1 has been created and is ready.
kubectl get pods -n oceanbase
NAME READY STATUS RESTARTS AGE
obcluster-1-zone3-074bda77c272 2/2 Running 0 12d
obcluster-1-zone2-7ecbd89f84de 2/2 Running 0 12d
obcluster-1-zone1-94ecf05cb290 2/2 Running 0 1m
Deployment suggestions
To deploy a production cluster with high availability, we recommend that you deploy an OceanBase cluster of at least three OBServer nodes, create tenants that each has at least three replicas, distribute nodes of each zone on different servers, and use the network plug-in Calico. This minimizes the risk of an unrecoverable cluster disaster. ob-operator also provides high-availability solutions based on the backup and restore of tenant data and the primary and standby tenants. For more information, see Restore data from a backup and Back up a tenant.