You can replace a faulty node in an OceanBase cluster by using obshell in the following two ways:
Call an API
Run an obshell command
This topic describes how to replace a faulty node in a three-node OceanBase cluster by using obshell.
Prerequisites
The OceanBase cluster is managed by obshell. For more information about how to check whether a cluster is managed by obshell and how to take over a cluster that is not, see Take over a cluster not managed by obshell.
All OBServer nodes and obshell nodes, except the faulty node, are running.
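As a rough way to check the second prerequisite, the sketch below tests whether the obshell port on each surviving node accepts TCP connections. The helper names `agent_reachable` and `unreachable_agents` are hypothetical and not part of obshell. An open port only shows that something is listening, so treat this as a first-pass check and not as proof that the agent or the OBServer node is healthy.

```python
import socket


def agent_reachable(host: str, port: int = 2886, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only checks that the port accepts connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_agents(hosts, port: int = 2886, timeout: float = 2.0):
    """Return the subset of hosts whose obshell port did not accept a connection."""
    return [h for h in hosts if not agent_reachable(h, port, timeout)]
```

For the example deployment in this topic, `unreachable_agents(["10.10.10.1", "10.10.10.2"])` would be expected to return an empty list before you continue.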
Deployment mode
The following table describes the servers used in this example, including the three original cluster nodes and the replacement server (obshell uses port 2886 by default):
| Role | Server | Remarks |
|---|---|---|
| OBServer node | 10.10.10.1 | OceanBase Database zone1 |
| OBServer node | 10.10.10.2 | OceanBase Database zone2 |
| OBServer node | 10.10.10.3 | OceanBase Database zone3. This node is faulty (it is down or its clog disk is damaged). |
| OBServer node | 10.10.10.4 | The server that replaces the faulty node (10.10.10.3) |
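The table implies a simple plan: add the replacement server to the faulty node's zone, then remove the faulty node, sending both requests through a healthy node. The following sketch derives that plan from the table. The `Node` class and `plan_replacement` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    ip: str
    zone: str
    faulty: bool = False


def plan_replacement(nodes, replacement_ip: str, port: int = 2886) -> dict:
    """Derive which endpoint to call, what to add, and what to remove."""
    faulty = [n for n in nodes if n.faulty]
    healthy = [n for n in nodes if not n.faulty]
    if len(faulty) != 1:
        raise ValueError("this sketch handles exactly one faulty node")
    if not healthy:
        raise ValueError("at least one healthy node is required")
    return {
        "api_endpoint": f"{healthy[0].ip}:{port}",  # any healthy obshell instance
        "target_zone": faulty[0].zone,              # the new node joins this zone
        "add": f"{replacement_ip}:{port}",
        "remove": f"{faulty[0].ip}:{port}",
    }


# The deployment from the table above.
nodes = [
    Node("10.10.10.1", "zone1"),
    Node("10.10.10.2", "zone2"),
    Node("10.10.10.3", "zone3", faulty=True),
]
plan = plan_replacement(nodes, "10.10.10.4")
print(plan)
```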
Replace a faulty node by using an API
Note
obshell performs security verification on API calls. Before you call an API, encrypt the request as described in API hybrid encryption, and then set the encrypted request header (${request_headers}) and request body (${request_body}) in the curl command.
Step 1: Call the cluster scale-out interface
To initiate a scale-out operation, call the /api/v1/ob/scale_out interface of any obshell instance except the one on the faulty node. This adds a new node to the zone where the faulty node is located.
For more information about how to call this interface from the command line, see Cluster scale-out.
[admin@test001 ~]$ curl -H "Content-Type: application/json" -H 'X-OCS-Header:${request_headers}' -X POST -d '${request_body}' http://10.10.10.1:2886/api/v1/ob/scale_out
The request body before encryption is as follows:
{
  "agentInfo": {
    "ip": "10.10.10.4",
    "port": 2886
  },
  "obConfigs": {
    "redoDir": "/data/workspace/redo",
    "dataDir": "/data/workspace/data",
    "datafile_size": "24G",
    "cpu_count": "16",
    "memory_limit": "16G",
    "system_memory": "4G",
    "log_disk_size": "24G"
  },
  "zone": "zone3"
}
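As a minimal sketch of how the plaintext body and the curl call fit together, the following Python builds the same JSON body and an equivalent POST request with the standard library. The function names are hypothetical. The encryption step is not implemented here: `${request_headers}` and `${request_body}` remain placeholders that you would replace with the values produced by following API hybrid encryption. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_scale_out_body(ip: str, zone: str, ob_configs: dict, port: int = 2886) -> dict:
    """Assemble the plaintext scale-out body with the field names shown above."""
    return {
        "agentInfo": {"ip": ip, "port": port},
        "obConfigs": dict(ob_configs),
        "zone": zone,
    }


def build_post_request(url: str, encrypted_headers: str, encrypted_body: str):
    """Build (but do not send) the POST request that the curl command issues."""
    return urllib.request.Request(
        url,
        data=encrypted_body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-OCS-Header": encrypted_headers,
        },
        method="POST",
    )


body = build_scale_out_body(
    "10.10.10.4",
    "zone3",
    {
        "redoDir": "/data/workspace/redo",
        "dataDir": "/data/workspace/data",
        "datafile_size": "24G",
        "cpu_count": "16",
        "memory_limit": "16G",
        "system_memory": "4G",
        "log_disk_size": "24G",
    },
)
plaintext = json.dumps(body, indent=2)  # this is the input to the encryption step

# Placeholders stand in for the encrypted header and body.
req = build_post_request(
    "http://10.10.10.1:2886/api/v1/ob/scale_out",
    "${request_headers}",
    "${request_body}",
)
print(plaintext)
```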
For more information about how to call the corresponding method in obshell-sdk-python, see Cluster scale-out.
···
client = ClientSet("10.10.10.1", 2886, PasswordAuth("****"))
configs = {"redoDir":"/data/workspace/redo", "dataDir":"/data/workspace/data",
"datafile_size":"24G", "cpu_count":"16", "memory_limit":"16G",
"system_memory":"4G", "log_disk_size":"24G"}
client.v1.scale_out_sync("10.10.10.4", 2886, "zone3", configs) # Call /api/v1/ob/scale_out
···
For more information about how to call the corresponding method in obshell-sdk-go, see Cluster scale-out.
···
client, err := services.NewClientWithPassword("10.10.10.1", 2886, "****")
configs := map[string]string {
"redoDir":"/data/workspace/redo", "dataDir":"/data/workspace/data",
"datafile_size":"24G", "cpu_count":"16", "memory_limit":"16G",
"system_memory":"4G", "log_disk_size":"24G"}
req := client.V1().NewScaleOutRequest("10.10.10.4", 2886, "zone3", configs)
dag, err := client.V1().ScaleOutSyncWithRequest(req) // Call /api/v1/ob/scale_out
···
Step 2: Call the cluster scale-in interface
To remove the faulty node from the cluster, call the /api/v1/ob/scale_in interface of any obshell instance except the one on the faulty node.
For more information about how to call this interface from the command line, see Cluster scale-in.
[admin@test001 ~]$ curl -H "Content-Type: application/json" -H 'X-OCS-Header:${request_headers}' -X POST -d '${request_body}' http://10.10.10.1:2886/api/v1/ob/scale_in
The request body before encryption is as follows:
{
  "agent_info": {
    "ip": "10.10.10.3",
    "port": 2886
  },
  "force_kill": true
}
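Note that the scale-in body shown in this topic uses snake_case keys (`agent_info`, `force_kill`), whereas the scale-out body uses camelCase keys (`agentInfo`, `obConfigs`). The sketch below, using a hypothetical helper name, builds the scale-in body so that the key names stay in one place. As with scale-out, the result is the plaintext that must still be encrypted before it is sent.

```python
import json


def build_scale_in_body(ip: str, port: int = 2886, *, force_kill: bool) -> dict:
    """Assemble the plaintext scale-in body with the field names shown above.

    force_kill is keyword-only and has no default, so the caller must state it.
    """
    return {
        "agent_info": {"ip": ip, "port": port},
        "force_kill": force_kill,
    }


scale_in_body = build_scale_in_body("10.10.10.3", force_kill=True)
print(json.dumps(scale_in_body))
```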
For more information about how to call the corresponding method in obshell-sdk-python, see Cluster scale-in.
···
client = ClientSet("10.10.10.1", 2886, PasswordAuth("****"))
client.v1.scale_in_sync("10.10.10.3", 2886, force_kill = True) # Call the /api/v1/ob/scale_in interface
···
For more information about how to call the corresponding method in obshell-sdk-go, see Cluster scale-in.
···
client, err := services.NewClientWithPassword("10.10.10.1", 2886, "****")
req := client.V1().NewScaleInRequest("10.10.10.3", 2886).SetForceKill()
dag, err := client.V1().ScaleInSyncWithRequest(req) // Call the /api/v1/ob/scale_in interface
···
Replace a faulty node by using obshell
Step 1: Start obshell
Start obshell on the node to be added to the cluster (10.10.10.4). For more information, see Start obshell in Start and stop obshell.
[admin@test004 ~]$ /home/admin/oceanbase/bin/obshell agent start --ip 10.10.10.4 -P 2886
Step 2: Add the node to the cluster
When you add a node to the cluster by using the CLI, you must run the command on the node being added. In this example, run the obshell cluster scale-out command on 10.10.10.4. For more information, see obshell cluster scale-out in obshell cluster commands.
If obshell is not deployed on the node to be added to the cluster, you can deploy obshell by using obshell-sdk-python or obshell-sdk-go. For more information, see Install obshell by using obshell-sdk-python or Install obshell by using obshell-sdk-go.
[admin@test004 ~]$ /home/admin/oceanbase/bin/obshell cluster scale-out -s '10.10.10.1:2886' -z 'zone3' -o 'memory_limit=16G,system_memory=8G,log_disk_size=24G,datafile_size=24G' --rp *****
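The `-o` value in the command above is a comma-separated list of `key=value` pairs. If you generate the command from a script, a small helper can build that string from a dictionary. This is only a sketch: `format_ob_options` and `scale_out_argv` are hypothetical names, and the root password option (`--rp`) is deliberately left out so that the caller supplies it separately.

```python
def format_ob_options(options: dict) -> str:
    """Join options into the key=value,key=value form used by -o above."""
    for key, value in options.items():
        # A comma or equals sign inside a key or value would make the list ambiguous.
        if "," in key or "=" in key or "," in str(value) or "=" in str(value):
            raise ValueError(f"unsupported character in option {key!r}")
    return ",".join(f"{key}={value}" for key, value in options.items())


def scale_out_argv(obshell: str, seed: str, zone: str, options: dict) -> list:
    """Build the argument list for the scale-out command, without --rp."""
    return [
        obshell, "cluster", "scale-out",
        "-s", seed,
        "-z", zone,
        "-o", format_ob_options(options),
    ]


argv = scale_out_argv(
    "/home/admin/oceanbase/bin/obshell",
    "10.10.10.1:2886",
    "zone3",
    {
        "memory_limit": "16G",
        "system_memory": "8G",
        "log_disk_size": "24G",
        "datafile_size": "24G",
    },
)
print(argv)
```

Passing the result as a list to a process launcher avoids shell quoting issues with the option string.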
Step 3: Remove the faulty node from the cluster
Run the obshell cluster scale-in command on any node in the cluster except the faulty node (for example, 10.10.10.1) to forcibly remove the faulty node from the cluster. For more information, see obshell cluster scale-in in obshell cluster commands.
[admin@test001 ~]$ /home/admin/oceanbase/bin/obshell cluster scale-in -s '10.10.10.3:2886' -f
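Because the scale-in command must not be run on the faulty node itself, a script that assembles it can enforce that rule before anything is executed. The following sketch uses a hypothetical `scale_in_argv` helper and assumes the `-s` value has the `ip:port` form shown above.

```python
def scale_in_argv(obshell: str, faulty_agent: str, run_on_ip: str, force: bool = True) -> list:
    """Build the scale-in argument list, refusing to target the local node."""
    faulty_ip = faulty_agent.rsplit(":", 1)[0]
    if run_on_ip == faulty_ip:
        raise ValueError("run scale-in on a node other than the faulty node")
    argv = [obshell, "cluster", "scale-in", "-s", faulty_agent]
    if force:
        argv.append("-f")  # forcibly remove the faulty node, as in the command above
    return argv


scale_in_cmd = scale_in_argv(
    "/home/admin/oceanbase/bin/obshell",
    "10.10.10.3:2886",
    run_on_ip="10.10.10.1",
)
print(scale_in_cmd)
```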