Perform an inspection in OCP to detect potential risks in an OceanBase cluster | V4.3.5


Last Updated: 2025-03-28 08:05:55

This topic describes how to perform an inspection in the OceanBase Cloud Platform (OCP) console to detect potential risks in an OceanBase cluster.

Scenarios

OCP provides real-time monitoring of the health of a running OceanBase cluster and its supporting components. If cluster health drops significantly, for example when an OBServer node fails, the monitoring system immediately triggers a corresponding alert. Such sudden, clear-cut emergencies can lead to serious consequences if they are not handled promptly. An OceanBase cluster can also be affected by subhealth issues. These do not cause immediate serious consequences, but they may impact your business in specific scenarios or after a certain amount of time.

Subhealth issues take more time to identify or diagnose, so real-time or high-frequency monitoring is not available for them. OCP provides the inspection feature for you to detect these potential risks in a cluster. Scheduled inspections can run as often as once a day, and you can also manually trigger an inspection at any time. An inspection detects the potential risks of an object, such as an OceanBase cluster or OBProxy, and generates an inspection report.

Because an inspection is not performed in real time, its report can contain more details. You can also specify the metrics to check in an inspection and the content to display in an inspection report as needed.

Technical mechanism

The inspection feature implements three configurable processes in sequence: data query, data processing, and result determination. Because these processes are configurable, you can specify custom inspection items as needed without waiting for an OCP version update.

The inspection script is the core part of the inspection feature. By using built-in objects, the script can query the metrics of the inspected objects, internal tables, and host status data provided through the agent API. It then analyzes and processes the obtained status data and exports the results to the inspection report.
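The three processes can be pictured as a small pipeline. The following Python sketch is illustrative only: the `InspectionItem` class, the sample disk usage data, and the 80% threshold are assumptions made for this example and are not the OCP inspection script API or its built-in objects.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InspectionItem:
    """Hypothetical model of one inspection item (not the OCP script API)."""
    name: str
    query: Callable[[], Any]         # process 1: data query
    process: Callable[[Any], Any]    # process 2: data processing
    determine: Callable[[Any], str]  # process 3: result determination

    def run(self) -> dict:
        raw = self.query()
        value = self.process(raw)
        return {"item": self.name, "value": value, "result": self.determine(value)}


def query_disk_usage() -> dict:
    # Sample data standing in for a metric, an internal table, or host status.
    return {"used_gb": 850, "total_gb": 1000}


def to_percent(raw: dict) -> float:
    return 100.0 * raw["used_gb"] / raw["total_gb"]


def judge(percent: float) -> str:
    # The 80% threshold is an arbitrary value chosen for this example.
    return "risk" if percent > 80.0 else "pass"


item = InspectionItem("disk_usage", query_disk_usage, to_percent, judge)
report_entry = item.run()
print(report_entry)
```

Keeping each process as a separate function mirrors the idea that a custom inspection item only needs to swap in its own query, processing, and determination logic.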

The built-in inspection items of OCP are provided based on general scenarios or practices, which can meet most general inspection requirements. In addition, the number of built-in inspection items is constantly increasing. OCP allows you to specify custom inspection items as needed. For example, you can add custom inspection items by configuring the preceding inspection processes for a custom partition boundary check based on the partitioning keys of your business.

Note

To add a custom inspection item, you must understand the objects in the inspection feature and write the inspection script. Therefore, you cannot configure a custom inspection item on the GUI. If you need a custom inspection item, contact OCP Technical Support.

Prerequisites

- You have the following permissions:
  - Resource Permissions: Cluster Maintenance permission
  - Menu Permissions: permission on the Inspection Service menu of O&M Management
- You use OCP V4.0.3 or later.
- The objects to be inspected are managed by OCP.

Procedure

Manual inspection

If you do not need to filter inspection items, you can find the target inspection object on the inspection page and directly initiate an inspection.
After all inspection items are checked, you can view the inspection report.

Scheduled inspection

You can schedule inspections of individual objects at specific times, or schedule a global inspection of all objects. The process is as follows:

Configure inspection rules. You can configure inspection rules for a specific object or global inspection rules for all objects.
On the inspection rule configuration page, you can specify the inspection cycle of different inspection types by month, week, or day.
A scheduled inspection will be automatically triggered at the specified time, and the inspection report will be displayed in the same way as that of a manually triggered inspection.
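To make the daily, weekly, and monthly cycles concrete, the following Python sketch computes when the next scheduled inspection would fire. It is a standalone illustration: the `next_run` function and its parameters are hypothetical and do not reflect how OCP stores or evaluates inspection rules. Times are timezone-naive for simplicity.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, cycle: str, at: time,
             weekday: int = 0, day_of_month: int = 1) -> datetime:
    """Return the first trigger time strictly after `now`.

    cycle: "day", "week" (weekday 0 = Monday), or "month" (day_of_month).
    Hypothetical helper for illustration; not an OCP API.
    """
    matches = {
        "day": lambda d: True,
        "week": lambda d: d.weekday() == weekday,
        "month": lambda d: d.day == day_of_month,
    }
    if cycle not in matches:
        raise ValueError(f"unsupported cycle: {cycle}")

    candidate = datetime.combine(now.date(), at)
    # Walk forward one day at a time until the rule matches a future time.
    while candidate <= now or not matches[cycle](candidate):
        candidate += timedelta(days=1)
    return candidate


now = datetime(2025, 3, 28, 8, 5)
daily = next_run(now, "day", time(2, 0))
weekly = next_run(now, "week", time(2, 0), weekday=0)
monthly = next_run(now, "month", time(2, 0), day_of_month=1)
print(daily, weekly, monthly)
```

In this sketch the shortest possible gap between two runs is one day, which matches the minimum inspection interval described in this topic.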

FAQ

What can I do about the "can not query" error in an inspection report?

To perform an inspection, OCP first needs to query the status of the object to be inspected. This corresponds to the first process, data query, described in Technical mechanism. The "can not query" error occurs during this process. Because of this error, OCP cannot obtain the object status and cannot perform subsequent analysis and processing. For example, to check whether a cluster parameter is properly configured, OCP first queries the actual value of the parameter. If OCP cannot connect to the cluster, it cannot query the parameter value, and the "can not query" error is displayed for the corresponding inspection item in the report.

The "can not query" error is often caused by unexpected exceptions and usually indicates a serious problem, such as the cluster query failure described in the preceding example. Therefore, OCP gives risk prompts for inspection items with the "can not query" error. To locate the direct cause of the error, find the corresponding inspection task in the inspection history and view the error information on the task page.
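The behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `run_item` function, the report entry fields, and the simulated connection failure are all hypothetical and are not taken from the OCP implementation. It shows a failed data query short-circuiting the later processes, being flagged as a risk, and keeping the underlying error for later troubleshooting.

```python
from typing import Any, Callable

CAN_NOT_QUERY = "can not query"


def run_item(name: str,
             query: Callable[[], Any],
             determine: Callable[[Any], str]) -> dict:
    """Run one hypothetical inspection item and build its report entry."""
    try:
        value = query()
    except Exception as exc:
        # The query failed, so no analysis is possible. Flag the item as a
        # risk and keep the error text, similar to what a task page would show.
        return {"item": name, "result": CAN_NOT_QUERY, "risk": True,
                "error": f"{type(exc).__name__}: {exc}"}
    result = determine(value)
    return {"item": name, "result": result, "risk": result != "pass",
            "error": None}


def query_parameter() -> int:
    # Simulates the inspection failing to connect to the cluster.
    raise ConnectionError("cluster unreachable")


entry = run_item("parameter_check", query_parameter, lambda v: "pass")
print(entry)
```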