Overview

2025-03-28 08:05:55  Updated

OceanBase Cloud Platform (OCP) provides you with alert monitoring services at the OceanBase cluster, tenant, and host levels. You can use the built-in alerts to meet your basic alert requirements. This topic helps you understand the built-in alerts. You can also create alert items based on your business needs. For more information, see Create an alert rule.

The following table lists the components of each alert.

Component Description
Description Describes the meaning of each alert and its trigger conditions.
Principle Describes how an alert works, especially where its metric value comes from. The principle of an expression-triggered alert is described by using the following terms:
  • Metric: the target metric of the alert. When the value of this metric meets the alert rule, the alert is triggered.
  • Source: the data source of the collected metrics. Unless otherwise specified, OCP-Agent obtains the values of the collected metrics by querying the system table of the sys tenant. The SQL query is provided in the Source row of the parameter table of each alert.
  • Collected metrics: the metrics that are collected from the source and then substituted into the metric expression to determine the value of the target metric.
  • Metric expression: the expression that combines collected metrics with labels to determine the value of the target metric. It uses two kinds of labels:
    • LABELS: the labels defined in the frontend, which are matched with labels in the source to accurately obtain data of the target object.
    • GBLABELS: the labels that are used to aggregate the monitoring data by category.
  • Collection cycle: the interval at which OCP-Agent queries the system table of the sys tenant.
  In each collection cycle, the values of the collected metrics are obtained from the source and substituted into the metric expression to determine the value of the target metric.
Alert rule Describes the trigger rule of each alert, including the metric, default threshold, duration, alert cycle, and time before clearance.
  • Trigger rule: The system checks the metric once in each detection cycle, which is 10 seconds by default. When the value of the metric exceeds the default threshold for the number of cycles specified for the duration, an alert is triggered. After the alert is triggered, the detection frequency changes to match the alert cycle.
  • Clearing rule: If an alert is not triggered again within the period specified for the time before clearance, the system automatically clears the alert.
Alert information Describes the trigger method, alert level, scope, and target of each alert. The following alert trigger methods are supported:
  • Metric expression: An alert is triggered when the metric value determined by the metric expression meets the trigger condition. Generally, the metric values are collected in one of the following ways:
    • OCP-Agent queries the system table of the sys tenant.
    • OCP-Agent runs Linux commands on the local host.
    • The exporter process collects the metrics. You can view these metrics by opening the corresponding API URLs in a browser. In the following examples, xxx.xxx.xxx.xxx is the IP address of the host where the OBServer node is deployed.
      • http://xxx.xxx.xxx.xxx:62889/metrics/node/host returns metrics of the host.
      • http://xxx.xxx.xxx.xxx:62889/metrics/node/ob returns metrics of the OceanBase cluster.
      • http://xxx.xxx.xxx.xxx:62889/metrics/ob/perSecond returns the different types of cluster-level metrics that are collected every second.
      • http://xxx.xxx.xxx.xxx:62889/metrics/ob/perMinute returns the different types of cluster-level metrics that are collected every minute.
  • Timed task of OCP: OCP-Server runs a timed task that checks the local host and generates an alert when the trigger condition is met.
Alert templates Describes the overview and details templates of each alert, and provides an example for each template.
Impact on the system Describes the impact that may be caused on the system when the alert is triggered.
Possible causes Describes the possible causes of an alert to help you locate and handle the alert.
Troubleshooting method Describes how to resolve the problems that caused the alert.
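
The way a metric expression combines collected metrics with LABELS and GBLABELS can be illustrated with a minimal Python sketch. It is not OCP's expression engine: the sample data, the IP addresses, and the evaluate function are hypothetical, and summation is assumed as the aggregation. Only the label names obregion and svr_ip come from this topic.

```python
from collections import defaultdict

# Hypothetical collected metrics: each sample carries labels and a value.
samples = [
    {"labels": {"obregion": "obocp", "svr_ip": "10.0.0.1"}, "value": 40.0},
    {"labels": {"obregion": "obocp", "svr_ip": "10.0.0.2"}, "value": 60.0},
    {"labels": {"obregion": "other", "svr_ip": "10.0.0.3"}, "value": 90.0},
]


def evaluate(samples, labels, gblabels, aggregate=sum):
    """Keep samples that match `labels`, then aggregate them per `gblabels` group."""
    groups = defaultdict(list)
    for sample in samples:
        # LABELS: every requested label must match the sample's label.
        if all(sample["labels"].get(k) == v for k, v in labels.items()):
            # GBLABELS: the values of these labels form the grouping key.
            key = tuple(sample["labels"].get(k) for k in gblabels)
            groups[key].append(sample["value"])
    return {key: aggregate(values) for key, values in groups.items()}


result = evaluate(samples, labels={"obregion": "obocp"}, gblabels=["obregion"])
print(result)  # {('obocp',): 100.0}
```

Grouping by svr_ip instead of obregion would return one value per server, which corresponds to a host-level rather than a cluster-level view of the same collected metrics.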

If the default thresholds do not meet your requirements, you can modify them. If you do not want to receive certain alerts, you can block them. For more information, see Create a blocking condition.
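
The trigger rule and clearing rule described in the Alert rule row can be sketched as a small state machine. This is an illustration under stated assumptions, not OCP's implementation: the AlertRule and AlertTracker names are hypothetical, the duration is assumed to mean consecutive breaching detections, and the change of detection frequency to the alert cycle after triggering is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    threshold: float        # a detection above this value counts as a breach
    duration_cycles: int    # breaching detections in a row needed to trigger (assumption)
    clearance_seconds: int  # time before clearance


class AlertTracker:
    def __init__(self, rule):
        self.rule = rule
        self.breaches = 0         # current run of breaching detections
        self.active = False       # whether the alert is currently raised
        self.last_trigger = None  # time of the most recent triggering detection

    def observe(self, now, value):
        """Process one detection and return 'triggered', 'cleared', or None."""
        if value > self.rule.threshold:
            self.breaches += 1
            if self.breaches >= self.rule.duration_cycles:
                self.last_trigger = now
                if not self.active:
                    self.active = True
                    return "triggered"
            return None
        self.breaches = 0
        if self.active and now - self.last_trigger >= self.rule.clearance_seconds:
            self.active = False
            return "cleared"
        return None


rule = AlertRule(threshold=90.0, duration_cycles=3, clearance_seconds=30)
tracker = AlertTracker(rule)
events = []
# One detection every 10 seconds, matching the default detection cycle.
for i, value in enumerate([95, 96, 97, 50, 50, 50, 50]):
    now = i * 10
    event = tracker.observe(now, value)
    if event:
        events.append((now, event))
print(events)  # [(20, 'triggered'), (50, 'cleared')]
```

In this sketch, raising the threshold or the duration makes the alert harder to trigger, which is the effect of modifying the default values of an alert rule.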

Concepts

Alert target

An alert target is the object that an alert task monitors, and it uniquely identifies an alert. It can be an OceanBase cluster, a server, or a service.

Depending on the alert item, an alert target can be a tag value or a combination of tag values. For example, obregion=obocp:svr_ip=*.*.*.* identifies a server in the OceanBase cluster or OCP cluster.
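
As an illustration of the target format in the example above, the following Python sketch splits an alert target string into its tag names and values. The parse_alert_target function is hypothetical, and the grammar (key=value pairs separated by colons) is inferred from the single example in this topic, so it may not cover every target that OCP produces. In particular, a value that itself contains a colon, such as an IPv6 address, would not be handled.

```python
def parse_alert_target(target):
    """Split a target such as 'obregion=obocp:svr_ip=10.0.0.1' into a dict."""
    tags = {}
    for part in target.split(":"):
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {part!r}")
        tags[key] = value
    return tags


# 10.0.0.1 is a placeholder address used only for this example.
tags = parse_alert_target("obregion=obocp:svr_ip=10.0.0.1")
print(tags)  # {'obregion': 'obocp', 'svr_ip': '10.0.0.1'}
```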

Alert scope

The alert scope defines the range of objects that an alert applies to and is consistent with the metric scope. For example, CPU utilization that exceeds the threshold can indicate a problem with the entire cluster, a tenant, or a single server.

The alert scope can be an OceanBase cluster (OBCluster), a tenant in an OceanBase cluster (OBTenant), an application cluster (AppCluster), a service, a host, or a process. The process scope is reserved.

Alert level

Each alert item has an alert level.

Alert level | Meaning | Color | Description
1 | Stopped | Purple | The system is completely unavailable and needs immediate recovery. For example, an OceanBase Database service cannot be started.
2 | Critical | Red | The system availability decreases, and the necessary measures must be taken to prevent the system from becoming completely unavailable. For example, the memory usage of a server exceeds the threshold of 90% and this condition has lasted for 3 minutes.
3 | Warning | Orange | The system is still available but is about to become unavailable. You must take measures to prevent the reduction of availability. For example, the proportion of connections of an OceanBase Database tenant exceeds the threshold of 80%.
4 | Caution | Blue | The trend shows that important performance metrics of the system are declining. You can locate potential problems through troubleshooting to prevent alerts from being triggered. This alert level is reserved, and no alert matches this level at present.
5 | Reminder | Green | Technically, a reminder is not an alert. It usually indicates that an administrator has performed an important operation. For example, the administrator deleted a cluster. After alerts at this level are cleared, no notification is generated.
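
For scripts that process alerts, the level table can be mirrored in a small lookup structure. The following Python sketch is only an illustration: the AlertLevel enum, LEVEL_COLORS mapping, and helper functions are hypothetical names, not part of any OCP API. It assumes that a lower number means a more severe level, which is inferred from the descriptions in the table.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    STOPPED = 1
    CRITICAL = 2
    WARNING = 3
    CAUTION = 4   # reserved; no alert matches this level at present
    REMINDER = 5  # technically not an alert


LEVEL_COLORS = {
    AlertLevel.STOPPED: "Purple",
    AlertLevel.CRITICAL: "Red",
    AlertLevel.WARNING: "Orange",
    AlertLevel.CAUTION: "Blue",
    AlertLevel.REMINDER: "Green",
}


def most_severe(levels):
    """Return the most severe level, assuming lower numbers are more severe."""
    return min(levels)


def notifies_on_clear(level):
    """Reminder-level alerts generate no notification when they are cleared."""
    return level != AlertLevel.REMINDER


worst = most_severe([AlertLevel.WARNING, AlertLevel.CRITICAL, AlertLevel.REMINDER])
print(worst.name, LEVEL_COLORS[worst])  # CRITICAL Red
```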
