This topic describes frequently asked questions about host management.
Basic concepts
Q1: What are the relationships between a region, a zone, and an IDC?
A: A region indicates a geological region. An IDC is deployed in a region and belongs to only one region. When you add a host, you must specify an IDC for the host. A zone is a logical concept of the OceanBase cluster and corresponds to the physical concept of IDC. When you add a zone for the cluster, you must specify the IDC corresponding to this zone.
Q2: The OCP server and OceanBase server are deployed across different IDCs. Is it true that I only need to apply for the network permissions to connect from an OCP server to SSH ports 22 and 2881 of the OceanBase server?
A: You also need to apply for ports from 62881 to port 63881 of the OCP-Agent. If logical backup recovery is used, you also need to apply for ports 2911 and 2912. For more information, see Component listening port list.
Add hosts
Q1: Why did I fail to add an OBServer?
A: We recommend that you perform the following steps to troubleshoot this issue:
Check whether the OBServer is within the management scope of OCP.
Check whether the OBServer time is much different from the OCP server time.
Check whether you have the required privileges on the disk of the OBServer and whether the disk space is sufficient.
Q2: Some OBServers are idle. Why no OBServer is available in the Host drop-down list when I try to add an OBServer?
A: When you add an OBServer, you must select a host that belongs to the IDC of the current zone.
Q3: An error occurred when I was adding a host. Error message: Failed to access the host through SSH . What can I do?
A. The SSHD of the container is not started. Reinstall tmp_ocp to solve this problem.
Q4: An error occurred when I was adding a host. Error message: Python dependency is missing. What can I do?
A: Add export PYTHONPATH=/home/admin/ocp_agent/site-packages:$PYTHONPAT to the last line of /root/.bashrc, and update OAT-CLI.
Q5: What can I do if I failed to add a host by using port 2022?
A: Change the values of the GSSAPIAuthentication and UseDNS parameters in the /etc/ssh/ssh_config file to yes.
Q6: An error occurred when I was adding a host. Error message: Failed to access the host whose IP address is xxx.xxx.xxx.xxx through SSH. Check whether the IP address, credentials, and port number are correct. What can I do?
A: The host is not configured with a DNS. We recommend that you configure DNS in the/etc/resolv.conf file of the host to be added in the format of nameserver xxx.xxx.xxx.xxx. Replace xxx.xxx.xxx.xxx with the IP address of the DNS server.
Q7: I have installed OCP on an ARM server. Then, I want to add an X86 host. Can I use OCP to install the X86 Agent on this host and add the host?
A: Yes. OCP runs in a Docker container, which is irrelevant to the hardware architecture at the underlying layer.
Note
OCP V2.5.x supports different clusters of different OceanBase architectures. However, servers in the same OceanBase cluster must have the same architecture. OCP will select an installation package based on the architecture of the host. Installation packages for different architectures can be uploaded by using the software package feature.
Q8: Why does the system report the inaccessible credentials error when I add a host by using the root user privileges?
A: OCP uses sudo -n true to verify credentials. We recommend that you check whether the sudoer file is modified.
Q9: An error occurred when I was adding a host that resides in a different network from that of the OCP server. What is the cause of this problem?
A: The ocp.site.url system parameter must be set so that an external host can access OCP. We recommend that you set this parameter and add the host again.
Q10: An error occurred when I was adding an ARM host by using the X86 OCP system. Error message: ocp-agent does not exist. What can I do?
A: The X86 OCP system does not contain the ARM OCP-Agent package. We recommend that you manually upload the ARM OCP-Agent package and try again.
Q11: An error occurred when I was adding an OBServer by using OCP. I cannot roll back this operation, either. What is the cause of this problem?
A: If you failed to create a cluster, the host is not installed by using the antman installation template. If you failed to roll back the operation, the pos_proxy process may be abnormal, causing a host connection failure. We recommend that you stop the pos_proxy process first. After the pos_proxy process automatically pulls up, you can roll back the failed operation, use the installation template of OAT-CLI to preconfigure the host, and then add the OBServer.
Q12: The primary cluster and standby cluster are deployed in different IDCs. An error occurred when I was adding a host by using OCP. Error message: message=rpc call failed, non-blocking recv response is null, may read timeout . What is the cause of this problem?
A: The problem may be caused by ports. We recommend that you check whether the ports are disabled by the firewall.
Q13: An error occurred when I was adding a host. Error message: empty pointer error. What is the cause of this problem?
A: An old API is used. The cache stored in the browser may apply to an earlier version, for example, OCP V2.4.5. When you use a version later than OCP V2.4.5, for example, OCP V2.5.0, maintenance can still be performed on the V2.4.5 page if the browser does not remove the expired API. We recommend that you clear the cache in the browser and retry adding the host.
Note
The host adding failure caused by the browser cache may be differently reflected in different OCP versions. We recommend that you clear the cache in the browser if the cause of the failure cannot be found.
Q14: When I redeploy OCP, an error occurred when adding a host. What may cause this problem except for the NTP issue?
A: Maybe you have not added credentials to the host. If yes, we recommend that you quit the task, clear the cache in the browser, and add the credentials to the host. Then, retry adding the host. You can try this method if you failed to add the host in both an upgrade and replacement scenario.
Q15: An error occurred when I was installing OCP-Agent. Error message: This operation is not supported when the host is in the {0} state. What is the cause of this problem?
A: It is likely that the OCP system is executing an upgrade task. During an OCP upgrade, it is normal that the host may go offline. You can reinstall the OCP-Agent after the upgrade is completed.
Q16: I failed many times to perform the add OBServer task on a server whose IP address has been changed. What can I do?
A: Maybe the server is not within the management scope of OCP. We recommend that you initialize the server and try to add the server again.
Q1: Why cannot I find the option to forcibly delete a host?
A: Typically, this option is displayed when the host is offline or unavailable.
Q2: If I added the same host repeatedly by using OCP, can I delete the duplicate hosts?
A: Yes, you can.
Manage hosts
Q1: Can I delete the region that I set when I use OCP to add a host ?
A: Yes, you can. In a CLI tool, execute the deletion statement in the compute_region table. The deletion operation cannot be implemented on the OCP console.
Q2: Why is the OBServer memory displayed in OCP inconsistent with the memory queried in the CLI tool?
A: OCP displays the OBServer memory based on the system_memory and system_memory_percentage parameters, including the storage used by the system. In a CLI tool, the storage used by the system is not included in the OBServer memory. OCP memory distribution varies with pages.
In the Server list, the system reserved memory is included in the total memory.
On the tenant management page, "system" refers to the system reserved memory size, and it is different from the "sys" tenant. The memory of the sys tenant is configured based on the unit specifications as the memory of the business tenant.
Q3: How can I view the version of OCP-Agent of all added hosts ?
A: Run the following command in the MetaDB database:
Select t1.inner_ip_address, t2.version from compute_host t1 inner join compute_host_agent t2 on t1.id = t2.host_id;