OceanBase Cloud Platform (OCP) provides you with alert monitoring services at the OceanBase cluster, tenant, and host levels. You can use the built-in alerts to meet your basic alert requirements. This topic helps you understand the built-in alerts.
The following table lists the components of each alert.
| Component | Description |
|---|---|
| Description | Describes the meaning of each alert and its trigger conditions. |
| Principle | Introduces the principle of an alert, especially the source of a metric value. The principle of an expression-triggered alert is described by using the following terms: * Metric : the target metric of the alert. When the value of this metric meets the alert rule, the alert is triggered. * Source : the data source of the collected metrics. Unless otherwise specified, the values of the collected metrics are obtained by OCP-Agent by querying the system table of the sys tenant. The SQL query is provided in the Source row of the parameter table of each alert. * Collected metrics : They are collected from the source and then substituted into the metric expression to determine the value of the target metric. * Metric expression : In this expression, collected metrics are combined with labels to determine the value of the target metric. Note LABELS: Labels defined in the frontend are matched with labels in the source to accurately obtain data of the target object. GBLABELS: the labels that are used to aggregate the monitoring data by category. * Collection cycle : the interval at which OCP-Agent queries the system table of the sys tenant. The values of collected metrics are obtained by using the method described in the source in each collection cycle and substituted into the metric expression to determine the value of the target metric. |
| Alert rule | Introduces the trigger rule of each alert, including metric, default threshold, source, duration, detection cycle, and time before clearance. * Trigger rule: The system detects the metric once in each detection cycle. When the value of the metric exceeds the default threshold for the period specified for the duration, an alert is triggered. * Clearing rule: If an alert is not triggered in the period specified for the time before clearance, the system automatically clears this alert. |
| Alert information | Describes the trigger method, alert level, scope, and target of each alert. The following alert trigger methods are supported: * Metric expression: Specifies the case where an alert is triggered when the metric value determined by the metric expression meets the trigger condition. Generally, the values of the metrics are collected by the following means: * OCP-Agent queries the system table of the sys tenant. * OCP-Agent queries the local host by running Linux commands. * The metrics are collected by using the exporter process. You can query metrics collected by the exporter process by calling the corresponding APIs in a browser. In the following examples, xxx.xxx.xxx.xxx is the IP address of the host where the OBServer is deployed. * http://xxx.xxx.xxx.xxx:62889/metrics/node/host queries metrics of the host. * http://xxx.xxx.xxx.xxx:62889/metrics/node/ob queries metrics of the OceanBase cluster. * http://xxx.xxx.xxx.xxx:62889/metrics/ob/perSecond queries different types of metrics at the cluster level every second. * http://xxx.xxx.xxx.xxx:62889/metrics/ob/perMinute queries different types of metrics at the cluster level every minute. * Timed task of OCP: OCP-Server sets a timed task to check the local host and generates an alert when the trigger condition is met. |
| Alert templates | Describes the overview and details templates of each alert, and provides an example for each template. |
| Impact on the system | Describes the impact that may be caused on the system when the alert is triggered. |
| Possible causes | Describes the possible causes of an alert to help you locate and handle the alert. |
| Suggested solutions | Shows you how to solve the problems that caused the alert. |
When the default threshold cannot meet your requirement, you can modify the alert thresholds. For more information, see Modify alert thresholds. When you do not want to receive some alerts, you can block them. For more information, see Set an alert-blocking condition.
Concepts
Alert target
An alert target is a target that is monitored by the alert task and uniquely identifies an alert. It can be an OceanBase cluster, a server, or a service.
Based on the alert item, an alert target can be a tag value or a combination of tag values, such as obregion=obocp:svr_ip=*.*.*.* identifies a server in the OceanBase cluster or OCP cluster.
Alert scope
The alert scope defines the scope of an alert and is consistent with the metric scope. For example, when the CPU utilization exceeds the threshold, it can be a problem for the entire cluster, the tenant, or a single server.
The alert scope includes an OceanBase cluster (OBCluster), a tenant in the OceanBase cluster (OBTenant), an application cluster (AppCluster), a service, a host, and a process. The process is reserved.
Alert level
Each alert item has an alert level.
| Alert level | Meaning | Color | Description |
|---|---|---|---|
| 1 | Stopped | Purple | The system is completely unavailable and needs immediate recovery. For example, An OceanBase Database service cannot be started. |
| 2 | Critical | Red | The system availability decreases and the necessary measures must be taken to prevent the system from becoming completely unavailable. For example, The memory usage of a server exceeds the threshold of 90% and this condition has lasted for 3 minutes. |
| 3 | Warning | Orange | The system is still available but it is about to become unavailable. You must take measures to prevent the reduction of availability. For example, The proportion of connections of an OceanBase Database tenant exceeds the threshold of 80%. |
| 4 | Caution | Blue | Based on the trend, you can see that the important performance metrics of the system are declining. You can locate potential problems by using troubleshooting methods to avoid triggering alerts. This alert level is reserved but no alert item matches this level at present. |
| 5 | Reminder | Green | Technically, a reminder is not an alert. It usually indicates that an administrator has performed an important action. For example, The administrator deleted a cluster. After alerts at this level are cleared, no notification is generated. |