Alert description
This alert monitors the status of exporters. If an exporter becomes unavailable, an alert is triggered.
Currently, there are 8 exporters. They are categorized as follows:
- 3 exporters collect monitoring data from OBServer nodes. The endpoints are /metrics/node/ob, /metrics/ob/basic, and /metrics/ob/extra.
- 2 exporters collect monitoring data from OBProxy nodes. The endpoints are /metrics/node/obproxy and /metrics/obproxy.
- 1 exporter collects host monitoring data. The endpoint is /metrics/node/host.
- 2 exporters collect agent self-monitoring data. The endpoints are :62888/metrics/stat and :62889/metrics/stat.
Assume that an OBServer and an OBProxy are deployed on the same host. In this case, 8 exporters are expected on the host. If fewer than 8 exporters on the host are functioning, this alert is triggered.
Note
An exporter is an API that collects monitoring data from a host.
Alert principle
The ocp_exporter_address table in the ocp_meta tenant records the information about each exporter. If the status field of an exporter is inactive, the exporter may fail to collect monitoring data, which requires attention.
The following table describes the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | monitor_exporter_avaliable |
| Metric source | Collected from the in-memory status of exporters. Note that the status recorded in the ocp_exporter_address table may lag behind the in-memory status. |
| Metric to be collected | monitor_exporter_avaliable |
| Monitoring expression | avg(monitor_exporter_avaliable{@LABELS}) by (@GBLABELS) max(ntp_offset{app="HOST",@LABELS}) by (@GBLABELS) |
| Collection interval | 60 seconds |
Note
Unlike other expression-based alerts, this alert is triggered by OCP-Server executing the preceding expression to collect exporter status information from MetaDB.
The value of the monitor_exporter_avaliable metric indicates whether the exporter is available. A value of 1 indicates that the exporter is available, and a value of 0 indicates that it is unavailable.
When the value of the monitor_exporter_avaliable metric is 0, an alert is triggered.
Rule information
| Monitoring metric | Default threshold | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| monitor_exporter_avaliable | 0 | 300 seconds | 15 seconds | 5 minutes |
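With the default rule settings, the metric has to stay at 0 for 300 seconds, which is 300 / 15 = 20 consecutive detections. The Bash sketch below illustrates that arithmetic only. It assumes that "Duration" means the condition must hold at every detection throughout that window; the function name evaluate_rule is hypothetical and does not reflect OCP's internal implementation.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the alert fires from a series of metric samples.
# ASSUMPTION: the value must be 0 at every detection for the whole duration,
# so the defaults require 300 / 15 = 20 consecutive zero samples.
DURATION_SECONDS=300
DETECTION_INTERVAL_SECONDS=15

# Arguments: samples of monitor_exporter_avaliable, oldest first (1 or 0).
# Prints "fire" if the trailing run of 0 samples covers the duration,
# otherwise "ok".
evaluate_rule() {
  local required=$((DURATION_SECONDS / DETECTION_INTERVAL_SECONDS))
  local run=0 sample
  for sample in "$@"; do
    if [ "$sample" = "0" ]; then
      run=$((run + 1))   # exporter still unavailable: extend the run
    else
      run=0              # exporter available again: reset the run
    fi
  done
  if [ "$run" -ge "$required" ]; then
    echo fire
  else
    echo ok
  fi
}
```

A single sample of 1 resets the count, so an exporter that recovers briefly within the window postpones the alert under this assumption.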
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Expression based on monitoring metrics | Notice | Server |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: exporter_addr=http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic:exporter_type=OB_CLUSTER:scrape_interval=1 A monitoring exporter is abnormal on the server.
Alert details
- Template: Host: ${host}, Alert: Monitoring exporter ${exporter_addr} (Type: ${exporter_type}, Collection interval: ${scrape_interval} seconds) is abnormal.
- Example: Host: xxx.xxx.xxx.xxx, Alert: Monitoring exporter http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic (Type: OB_CLUSTER, Collection interval: 1 second) is abnormal.
Alert recovery
- Template: Alert: ${alarm_name}, Monitoring exporter status: ${recover_value}
- Example: Alert: A monitoring exporter is abnormal on the server, Monitoring exporter status: 1
Here, ${alarm_target} indicates the object that triggered the alert. The format is exporter_addr=xxx:exporter_type=xxx:scrape_interval=xxx. exporter_addr specifies the monitoring collection address, exporter_type specifies the monitoring collection type, and scrape_interval specifies the monitoring collection interval.
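Because exporter_addr itself contains colons (after the scheme and before the port), ${alarm_target} cannot be split on ":" naively. A minimal Bash sketch, using a hypothetical function name parse_alarm_target, strips the known key names from the right instead:

```shell
#!/usr/bin/env bash
# Sketch: split ${alarm_target} into exporter_addr, exporter_type and
# scrape_interval. exporter_addr contains ':' itself, so the known keys
# are removed from the right-hand side one at a time.
parse_alarm_target() {
  local target="$1" rest addr_part
  # Everything after the last ":scrape_interval=".
  local scrape_interval="${target##*:scrape_interval=}"
  # Drop ":scrape_interval=..." from the end.
  rest="${target%:scrape_interval=*}"
  # Everything after the last ":exporter_type=".
  local exporter_type="${rest##*:exporter_type=}"
  # Drop ":exporter_type=..." and the leading "exporter_addr=".
  addr_part="${rest%:exporter_type=*}"
  local exporter_addr="${addr_part#exporter_addr=}"
  printf 'exporter_addr=%s\nexporter_type=%s\nscrape_interval=%s\n' \
    "$exporter_addr" "$exporter_type" "$scrape_interval"
}
```

For an alert overview line, pass only the first word, for example `parse_alarm_target "${line%% *}"`, because the alert name follows the target after a space.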
Impact on the system
No monitoring data is displayed on the OCP console, so the runtime status of the system cannot be viewed in real time. In addition, alerts that depend on monitoring data cannot be reported.
Possible causes
The monitoring collection process OCP-Agent is abnormal and does not return monitoring data.
The network is disconnected, and OCP cannot access the monitoring address.
Procedure
The ocp_exporter_address table records the status of each exporter, but the recorded status may lag behind the actual status: when an exporter's status changes to inactive, OCP reduces the number of exporters it collects from. Therefore, the status of an exporter that fails only intermittently may not be updated immediately in the ocp_exporter_address table.
You can request the Unix socket interface on the faulty server to check whether the exporter is accessible:
sudo curl -s --unix-socket /home/admin/ocp_agent/run/ocp_monagent.$(cat /home/admin/ocp_agent/run/ocp_monagent.pid).sock http://unix-socket-server/metrics/ob/basic
Check whether the network from OCP to the faulty server is normal: Run the following command on the OCP server to check whether the exporter at the monitoring collection address is accessible:
curl http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic
If it is not accessible, the problem may be a network issue.
You can refer to Network troubleshooting to check whether the network is faulty and ensure that the network between OCP and the server is connected.
If it is accessible, the problem is that the OCP-Agent process is faulty.
You can refer to OCP-Agent O&M scripts to check and resolve the issue.
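The two checks in the procedure can be wrapped in small Bash functions, sketched below. The default run directory /home/admin/ocp_agent/run and the socket naming come from the command above; the function names, the 5-second timeout, and the result messages are assumptions. Run check_local on the faulty server (with sudo if the socket requires it) and check_remote on the OCP server.

```shell
#!/usr/bin/env bash
# Sketch of the troubleshooting checks. Function names are hypothetical.

# Build the ocp_monagent Unix socket path from its PID file.
# Defaults to the run directory used in the command above.
monagent_socket_path() {
  local run_dir="${1:-/home/admin/ocp_agent/run}"
  echo "${run_dir}/ocp_monagent.$(cat "${run_dir}/ocp_monagent.pid").sock"
}

# On the faulty server: request an exporter path through the Unix socket and
# print the HTTP status code ("000" if no response was received).
check_local() {
  local path="${1:-/metrics/ob/basic}" run_dir="$2" sock code
  sock=$(monagent_socket_path "$run_dir")
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
    --unix-socket "$sock" "http://unix-socket-server${path}")
  echo "${code:-000}"
}

# On the OCP server: request exporter_addr over the network and classify the
# result the way the procedure above does.
check_remote() {
  local exporter_addr="$1" code
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$exporter_addr")
  case "$code" in
    000|"") echo "unreachable: possible network issue between OCP and the server" ;;
    *)      echo "reachable (HTTP ${code}): check the OCP-Agent process" ;;
  esac
}
```

For example, run `check_remote http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic` on the OCP server, with the address taken from exporter_addr in the alert.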
Note
- Starting from V3.2.0, authentication is enabled by default for each exporter. You can temporarily disable authentication by configuring the system parameter ocp.agent.auth.metric-auth-enabled.
- http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic is the value of exporter_addr in the alert information.
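With authentication enabled, a healthy exporter may still reject a plain curl request, which is a different problem from a network outage. The Bash sketch below maps the status code printed by `curl -s -o /dev/null -w '%{http_code}' <exporter_addr>` to a rough diagnosis. It assumes that an exporter with authentication enabled rejects unauthenticated requests with 401 or 403; classify_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: interpret an HTTP status code returned when requesting exporter_addr.
# ASSUMPTION: with metric authentication enabled (the default since V3.2.0),
# unauthenticated requests are answered with 401 or 403.
classify_status() {
  case "$1" in
    2??)     echo "available" ;;
    401|403) echo "reachable, authentication required" ;;
    000|"")  echo "unreachable" ;;
    *)       echo "reachable, unexpected HTTP status $1" ;;
  esac
}
```

For example, `classify_status "$(curl -s -o /dev/null -w '%{http_code}' http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic)"` run on the OCP server separates an authentication rejection from an unreachable address.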