This topic describes how to troubleshoot issues related to host management and performance monitoring during the O&M of OceanBase clusters in OceanBase Cloud Platform (OCP).
Problem description
OCP-Agent monitors and manages the host by using several processes, each with a different function. For more information, see Processes. An exception in one of these processes can impair its function and cause the following symptoms:
Unavailability of monitoring data on the Tenants page, Clusters page, Performance Monitoring page of a tenant or cluster, and the Monitoring tab of the host.
You cannot create a tenant or a table partition due to insufficient disk or memory space on the host, but the configured monitoring alerts are not reported.
In the Hosts list, the status of the relevant host is Offline.
If you encounter any of the preceding symptoms, refer to the following sections to check whether they are caused by exceptions of ocp-agent processes.
Possible causes
The following table describes related symptoms and their possible causes for your reference. As ocp-agent processes depend on each other, we recommend that you troubleshoot all of them.
| Symptom | Possible cause | Log |
|---|---|---|
| The host status is Offline. | Exceptions of ocp_agent-related processes. | /home/admin/ocp_agent/log/ocp_agentd.log, /home/admin/ocp_agent/log/ocp_agent_ctl.log |
| The resource monitoring data is missing. | Exceptions of the node_exporter process. | /home/admin/ocp_agent/log/node_exporter.log |
| The alerts are triggered but are not reported. | Exceptions of the ocp_exporter process. | /home/admin/ocp_agent/log/ocp_exporter.log |
| SQL diagnostic data is missing. | Exceptions of the obstat2 process. | /home/admin/ocp_agent/log/obstat2.log |
| OBServer log alerts are not reported. | Exceptions of the ob_logtailer process. | /home/admin/ocp_agent/log/ob_logtailer.log |
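The table can also be expressed as a small lookup helper, which may be convenient when scripting the first triage step. This is only a sketch: the function name `logs_for_symptom` and the symptom keywords (`offline`, `monitoring`, `alerts`, `sql`, `logalerts`) are hypothetical labels introduced for illustration, while the log paths are the ones listed in the table.

```shell
# Map a symptom keyword to the log file(s) listed in the table above.
# The keywords are hypothetical labels for this sketch, not OCP terms.
logs_for_symptom() {
  log_dir="/home/admin/ocp_agent/log"
  case "$1" in
    offline)    echo "$log_dir/ocp_agentd.log $log_dir/ocp_agent_ctl.log" ;;
    monitoring) echo "$log_dir/node_exporter.log" ;;
    alerts)     echo "$log_dir/ocp_exporter.log" ;;
    sql)        echo "$log_dir/obstat2.log" ;;
    logalerts)  echo "$log_dir/ob_logtailer.log" ;;
    *)          echo "unknown symptom: $1" >&2; return 1 ;;
  esac
}

# Example: show the last lines of the log for missing SQL diagnostic data.
# tail -n 50 $(logs_for_symptom sql)
```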
Solutions
Check the abnormal host for process exceptions.
Go to the Hosts list of the OCP console, select the abnormal host and click the OCP Agent tab. View the status of processes on the tab.
Log on to the abnormal host and run the ps -ef command to check whether the related processes are started. For more information, see Processes.
If you identify process exceptions, proceed to the next step. Otherwise, the symptoms are caused by other factors that are not covered in this topic.
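The ps -ef check can be wrapped in a short helper that reports each expected process in one pass. The sketch below assumes that the process names match the log file names in the table (ocp_agentd, node_exporter, ocp_exporter, obstat2, ob_logtailer); that mapping is an assumption, so confirm the actual names in the Processes topic. The function name `check_processes` is hypothetical.

```shell
# Report which of the given process names appear in captured `ps -ef` output.
# $1 is the captured output; the remaining arguments are process names.
# The names you pass are assumptions derived from the log file names.
check_processes() {
  ps_output="$1"
  shift
  for name in "$@"; do
    # -w matches the name as a whole word, -F treats it as a fixed string.
    if printf '%s\n' "$ps_output" | grep -qwF -- "$name"; then
      echo "$name: running"
    else
      echo "$name: NOT FOUND"
    fi
  done
}

# Example invocation on the abnormal host:
# check_processes "$(ps -ef)" ocp_agentd node_exporter ocp_exporter obstat2 ob_logtailer
```

Capturing the ps -ef output before filtering keeps the grep commands themselves out of the process list, so no extra filtering of grep lines is needed.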
Restart the abnormal process. For more information, see OCP-Agent script.
If the process fails to restart, check the processes that it depends on. For more information, see Processes.
If you need to stop a process, stop its daemon first.
Go to the ${HOME}/ocp_agent/log directory and check the log file of the corresponding process for ERROR-level logs.
If any ERROR-level logs are identified, analyze them to find the causes.
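To see quickly which log files contain ERROR-level entries, you can count the matching lines per file. The sketch below assumes that ERROR-level entries contain the literal string ERROR; the exact log format is not described in this topic, so adjust the pattern if needed. The function name `count_error_lines` is hypothetical.

```shell
# Print the number of lines containing "ERROR" for each .log file in a directory.
# Assumes ERROR-level entries contain the literal string ERROR.
count_error_lines() {
  log_dir="$1"
  for f in "$log_dir"/*.log; do
    # Skip the unexpanded pattern when the directory has no .log files.
    [ -f "$f" ] || continue
    # grep -c prints 0 and exits non-zero when nothing matches; ignore the status.
    n=$(grep -c 'ERROR' "$f" || true)
    echo "$(basename "$f"): $n"
  done
}

# Example invocation on the abnormal host:
# count_error_lines "${HOME}/ocp_agent/log"
```

For a file with a non-zero count, running grep -n 'ERROR' on that file lists the matching lines with their line numbers for closer analysis.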