Description
This alert is triggered when an exporter becomes unavailable.
Currently, the following eight exporters are involved:
- Three exporters for collecting OBServer monitoring data:
/metrics/node/ob,/metrics/ob/basic, and/metrics/ob/extra - Two exporters for collecting OBProxy monitoring data:
/metrics/node/obproxyand/metrics/obproxy - One exporter for collecting host monitoring data:
/metrics/node/host - Two exporters for collecting OCP-Agent monitoring data:
:62888/metrics/statand:62889/metrics/stat
If both the observer and obproxy processes are deployed on the same host, eight exporters are expected to exist on the host. When it is detected that less than eight exporters are active, this alert is triggered.
Note
An exporter is an API for collecting monitoring data on a host.
Principle
The ocp_exporter_address table of the ocp_meta tenant records the information about each exporter. If the status of an exporter is inactive, the exporter may fail to collect monitoring data.
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | monitor_exporter_available |
| Source | The exporter status that is collected from the memory. Note that the status in the ocp_exporter_address table may lag behind the actual status. |
| Collected metric | monitor_exporter_available |
| Metric expression | avg(monitor_exporter_available{@LABELS}) by (@GBLABELS) max(ntp_offset{app="HOST",@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |
Note
Unlike other expression-triggered alerts, this alert is triggered based on the status information of an exporter. The status information is monitored and collected by the OCP-Server by executing the preceding statement in MetaDB.
The value of the monitor_exporter_available metric indicates the status of an exporter. The value 1 indicates that the exporter is available and the value 0 indicates that the exporter is unavailable.
This alert is triggered when the value of the metric is 0.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| monitor_exporter_avaliable | 0 | 300 seconds | 15 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Notice | Server |
Alert templates
Overview template: ${alarm_target} ${alarm_name}
Details: Host: ${ host}; Alert: The monitoring exporter ${exporter_addr}(Type: ${ exporter_type}, Collection interval: ${ scrape_interval} seconds) is abnormal.
Overview example: exporter_addr=http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic:exporter_type=OB_CLUSTER:scrape_interval=1 monitoring exporter exception
Details example: Host: xxx.xxx.xxx.xxx; Alert: The monitoring exporter http://xxx.xxx.xxx.xxx:62889/metrics/ob/basic(Type: OB_CLUSTER, Collection interval: 1 second) is abnormal.
${alarm_target} indicates the object that generated the alert, in the exporter_addr=xxx:exporter_type=xxx:scrape_interval=xxx format. exporter_addr indicates the URL of the exporter. exporter_type indicates the type of the exporter. scrape_interval indicates the collection cycle.
Impact on the system
No monitoring data is available in the OCP console. The system running status cannot be viewed in real time, and monitoring-related alerts cannot be triggered.
Possible causes
OCP-Agent is abnormal and does not return monitoring data.
The network connection is disconnected. As a result, OCP cannot access the monitored URL.
Solutions
The ocp_exporter_address table records the exporter status, which may lag behind the actual status. If the status changes to inactive, OCP will decrease the frequency of collecting exporter status data. Therefore, the occasional inactive status of an exporter is not immediately updated to the ocp_exporter_address table.
You can call the unix-socket operation on the server where the exporter in the
inactivestate resides to check whether the exporter can be accessed.sudo curl -s --unix-socket /home/admin/ocp_agent/run/ocp_monagent.$(cat /home/admin/ocp_agent/run/ocp_monagent.pid).sock http://unix-socket-server/metrics/ob/basicCheck the network connectivity between OCP server and the server where the exporter in the
inactivestate resides: Run the following command on the OCP server to check whether the URL of the exporter is accessible: Since OCP V3.2.0, authentication is enabled for the exporters on each server by default. You can configure the system parameterocp.agent.auth.metric-auth-enabledto temporarily disable authentication.curl http://xxx.xxx.xxx.xxx:62889/metrics/ob/basicNote
http://xxx.xxx.xxx.xxx:62889/metrics/ob/basicindicates the value ofexporter_addrin the alert information.If this URL is inaccessible, a network connection issue may have occurred.
Check whether the network connection between OCP and this server is available. For more information, see Network troubleshooting.
If the URL is accessible, OCP-Agent is faulty.
Troubleshoot and resolve the problem. For more information, see OCP-Agent O&M script.