This topic describes the concepts of alert target, alert scope, alert item, alert, alert item group, alert level, template, alert aggregation, and alert clearance.
Alert target
An alert target is a target that is monitored by the alert task and uniquely identifies an alert. It can be an OceanBase cluster, a server, or a service.
Depending on the alert item, an alert target can be a single tag value or a combination of tag values. For example, obregion=obocp:svr_ip=*.*.*.* identifies a server in an OceanBase cluster or OceanBase Cloud Platform (OCP) cluster.
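As a rough illustration of the tag combination above, the following sketch splits a target string into its tag pairs. The function name `parse_alert_target` is hypothetical, and the parsing rule (pairs separated by `:`, key and value separated by `=`) is an assumption inferred from the single example in this topic, not a documented OCP format. It would not handle values that contain colons, such as IPv6 addresses.

```python
def parse_alert_target(target: str) -> dict[str, str]:
    """Split a target such as 'obregion=obocp:svr_ip=10.0.0.1' into tag pairs.

    Assumption: tags are separated by ':' and each tag is 'key=value'.
    """
    tags: dict[str, str] = {}
    for part in target.split(":"):
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed tag: {part!r}")
        tags[key] = value
    return tags


# Hypothetical sample address used only for illustration.
print(parse_alert_target("obregion=obocp:svr_ip=10.0.0.1"))
```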
Alert scope
The alert scope defines the scope of an alert and is consistent with the metric scope. For example, when the CPU utilization exceeds the threshold, it can be a problem for the entire cluster, the tenant, or a single server.
Valid values of the alert scope:
Cluster
Tenant
Server
Alert item
An alert item is the metadata of an alert. It consists of many elements such as the alert type, name, trigger rule, alert overview template, and alert details template.
Alert items can be divided into two types based on how they are generated:
Expression-triggered alert items: Alert items that are created on the console and generated by the alert rule engine based on the monitoring metrics.
Custom-triggered alert items: Alert items that are automatically triggered by other components.
For expression-triggered alert items, the alert rule expressions are configured in the alert items. For custom-triggered alert items, the alert rule expressions are empty.
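One way to picture the distinction is a small record type in which an empty rule expression marks a custom-triggered item. This is only a sketch: the class `AlertItem`, its field names, and the sample expression syntax are hypothetical and do not reflect how OCP stores alert items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertItem:
    # Field names are illustrative, not OCP's actual schema.
    name: str
    alert_type: str
    summary_template: str
    details_template: str
    rule_expression: str = ""  # left empty for custom-triggered items

    @property
    def trigger_kind(self) -> str:
        """Return 'expression' when a rule is configured, else 'custom'."""
        return "expression" if self.rule_expression.strip() else "custom"


cpu_item = AlertItem(
    name="cpu_high",
    alert_type="host",
    summary_template="CPU utilization is high.",
    details_template="CPU utilization exceeded the threshold.",
    rule_expression="cpu_percent > 90",  # hypothetical expression syntax
)
log_item = AlertItem(
    name="log_error",
    alert_type="log",
    summary_template="A log error was reported.",
    details_template="A component reported a log error.",
)
print(cpu_item.trigger_kind, log_item.trigger_kind)
```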
Alert
An alert is a notification generated by the system when an alert item occurs on an alert target.
For example, when the alarm_b alert item occurs on Server A (an alert target), the alert signal is sent every minute. However, it is counted as one alert on OCP, and only one record is displayed on the Alert Events tab.
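The counting behavior in this example can be sketched by keying active alerts on the pair of alert item and alert target, so that repeated signals update one record instead of creating new ones. The `AlertStore` class and its `report` method are hypothetical names used for illustration; they are not part of OCP.

```python
from dataclasses import dataclass, field


@dataclass
class AlertStore:
    # Maps (alert item, alert target) to the number of signals received.
    active: dict[tuple[str, str], int] = field(default_factory=dict)

    def report(self, alert_item: str, target: str) -> bool:
        """Record one signal; return True only if it opens a new alert."""
        key = (alert_item, target)
        is_new = key not in self.active
        self.active[key] = self.active.get(key, 0) + 1
        return is_new


store = AlertStore()
# Three signals, for example one per minute, for the same item and target.
results = [store.report("alarm_b", "Server A") for _ in range(3)]
print(results, len(store.active))
```

Only the first signal opens an alert; the later ones increment the counter on the same record.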
Alert item group
You can set multiple groups for each alert item for easy configuration of alert subscriptions.
Alert level
Each alert item has an alert level.
| Level | Meaning | Color | Description |
|---|---|---|---|
| 1 | Stopped | Purple | The system is completely unavailable and needs immediate recovery. For example, OBService cannot be started. |
| 2 | Critical | Red | The system availability decreases and the necessary measures must be taken to prevent the system from becoming completely unavailable. For example, the server memory usage exceeds the threshold of 90% and this condition has lasted for 3 minutes. |
| 3 | Warning | Orange | The system is still available but it is about to become unavailable. You must take measures to prevent the reduction of availability. For example, the proportion of connections of an OceanBase tenant exceeds the threshold of 80%. |
| 4 | Caution | Blue | The trend shows that important performance metrics of the system are declining. You can troubleshoot to locate potential problems before they trigger alerts. This alert level is reserved, and no alert item currently uses it. |
| 5 | Reminder | Green | Technically, a reminder is not an alert. It usually indicates that an administrator has performed an important action. For example, the administrator deleted a cluster. |
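The levels in the table map naturally onto an integer enumeration in which a lower number means a more severe alert. The sketch below is illustrative only: the names `AlertLevel` and `is_at_least` are hypothetical, and only the numeric values and level names come from the table.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    STOPPED = 1
    CRITICAL = 2
    WARNING = 3
    CAUTION = 4
    REMINDER = 5


def is_at_least(level: AlertLevel, threshold: AlertLevel) -> bool:
    """Return True if `level` is as severe as `threshold` or more severe.

    Lower numeric values are more severe, so the comparison is inverted.
    """
    return level <= threshold


print(is_at_least(AlertLevel.CRITICAL, AlertLevel.WARNING))
print(is_at_least(AlertLevel.REMINDER, AlertLevel.WARNING))
```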
Template
Templates are used to generate dynamic content by using variables during the runtime. They can be used for:
Generating alerts (configuring the alert overview and details templates for alert items)
Notification content (configuring the message and message aggregation templates for channels)
Channel parameters (for example, configuring the Header and Body content templates in the HTTP channel)
Sample template:
alarm_summary:
The CPU utilization of ${alarm_target} exceeds the threshold.
alarm_description:
The CPU utilization of ${alarm_target} has exceeded ${alarm_threshold} for ${alarm_duration}.
For more information about the supported template variables, see OCP alert template variables.
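To show how variable substitution in the sample template works, the sketch below renders it with Python's standard `string.Template`, whose `${name}` placeholder syntax happens to match the sample. This is not OCP's template engine. The helper `render` and the sample values are hypothetical.

```python
from string import Template

SUMMARY = "The CPU utilization of ${alarm_target} exceeds the threshold."
DESCRIPTION = (
    "The CPU utilization of ${alarm_target} has exceeded "
    "${alarm_threshold} for ${alarm_duration}."
)


def render(template_text: str, variables: dict[str, str]) -> str:
    # substitute() raises KeyError when a placeholder has no value,
    # which surfaces typos in variable names early.
    return Template(template_text).substitute(variables)


# Hypothetical runtime values.
values = {
    "alarm_target": "server-a",
    "alarm_threshold": "90%",
    "alarm_duration": "3 minutes",
}
print(render(SUMMARY, values))
print(render(DESCRIPTION, values))
```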
Alert aggregation
To avoid alert storms caused by too many alerts, you can enable alert aggregation for alert channels.
Aggregation rules:
Aggregate OceanBase log alerts by alert type, log error code, and OceanBase cluster.
Aggregate other OceanBase alerts by alert type and OceanBase cluster.
Aggregate application alerts by alert type and alert target.
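The three rules above amount to choosing a different grouping key for each kind of alert. The following sketch assumes each alert is a dictionary with a hypothetical `category` field (`ob_log`, `ob`, or `app`) that tells the kinds apart. That field, the other key names, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def aggregation_key(alert: dict) -> tuple:
    """Build the grouping key for one alert according to the three rules."""
    if alert["category"] == "ob_log":
        # OceanBase log alerts: alert type, log error code, and cluster.
        return ("ob_log", alert["alert_type"], alert["error_code"], alert["cluster"])
    if alert["category"] == "ob":
        # Other OceanBase alerts: alert type and cluster.
        return ("ob", alert["alert_type"], alert["cluster"])
    # Application alerts: alert type and alert target.
    return ("app", alert["alert_type"], alert["target"])


def aggregate(alerts: list[dict]) -> dict[tuple, list[dict]]:
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        groups[aggregation_key(alert)].append(alert)
    return dict(groups)


# Hypothetical sample alerts.
alerts = [
    {"category": "ob_log", "alert_type": "log_error", "error_code": "4012", "cluster": "c1", "target": "s1"},
    {"category": "ob_log", "alert_type": "log_error", "error_code": "4012", "cluster": "c1", "target": "s2"},
    {"category": "ob_log", "alert_type": "log_error", "error_code": "4013", "cluster": "c1", "target": "s1"},
    {"category": "ob", "alert_type": "cpu_high", "cluster": "c1", "target": "s1"},
    {"category": "ob", "alert_type": "cpu_high", "cluster": "c1", "target": "s2"},
    {"category": "app", "alert_type": "svc_down", "target": "app1"},
]
groups = aggregate(alerts)
print(len(groups))
```

The category is included in the key so that an application alert and an OceanBase alert with coincidentally equal fields do not merge.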
Alert clearance
An alert is cleared after the fault recovers. Clearance happens in one of two ways: the monitoring module detects that the fault is resolved and notifies the alert service, or the alert service clears the alert automatically after the clearance timeout expires.
The logic of clearing alerts after a timeout period:
Each alert item has a check cycle and an ignorance cycle.
During the new check cycle, the monitoring module calls the alert API to set the alert item as cleared if it finds that the alert item meets the clearance criteria.
When the ignorance cycle expires, the alert item is considered cleared if the alert item is no longer reported.
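The two clearance paths can be sketched as a single predicate: an alert counts as cleared if the monitoring module cleared it explicitly, or if no report has arrived for at least one ignorance cycle. The function name, parameter names, and the use of seconds are assumptions made for illustration. OCP's actual implementation is not described in this topic.

```python
def is_cleared(
    last_reported_at: float,
    now: float,
    ignorance_cycle: float,
    explicitly_cleared: bool = False,
) -> bool:
    """Decide whether an alert should be treated as cleared.

    All times are in seconds (an assumption for this sketch).
    """
    if explicitly_cleared:
        # The monitoring module found the clearance criteria met
        # during a check cycle and reported it.
        return True
    # Otherwise, clear once the alert has gone unreported for a
    # full ignorance cycle.
    return now - last_reported_at >= ignorance_cycle


print(is_cleared(last_reported_at=0, now=299, ignorance_cycle=300))
print(is_cleared(last_reported_at=0, now=300, ignorance_cycle=300))
```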
Alert API
OCP provides HTTP-based RESTful APIs to help you manage OceanBase resources and develop your applications. The Alert APIs topic describes the API operations and syntax and provides examples. The API documentation is intended for developers. Before you call an API, make sure that you understand the OCP-related concepts.
You can call an API based on the access method.
Quick start
Alerts and alert notifications are important monitoring features in OCP. Users of different roles perform different operations when they use these features.
After OCP is deployed, an administrator must complete the initial configuration of the alert module, such as adding alert items and configuring alert channels. The administrator also manages alert items and alert channels based on business requirements. An OCP user can subscribe to alerts of different modules, view alert events and notifications, and block selected alerts.
We recommend that you perform the following steps to configure the alert module in OCP:
If you log on as an administrator:
Configure alert channels.
Create alert items.
If you log on as a general user:
Subscribe to alerts.
View alert events.
Block some alerts (optional).