Description
This alert is triggered when the exporter becomes unavailable.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | monitor_exporter_avaliable |
| Source | select instance and status from ocp_exporter_address; The value of the status field is assigned to the collected metric. Other fields are used as labels. |
| Collected metric | monitor_exporter_avaliable |
| Metric expression | avg(monitor_exporter_avaliable{@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |
Note
Unlike other expression-triggered alerts, this alert is based on the status information of the exporter. OCP-Server collects this status information by executing the preceding statement in MetaDB.
The value of the metric monitor_exporter_avaliable indicates the status of the exporter. The value 1 indicates that the exporter is available and the value 0 indicates that the exporter is unavailable.
This alert is triggered when the value of the metric is 0.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| monitor_exporter_avaliable | 0 | 300 seconds | 60 seconds | 5 minutes |
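One way to read the rule above: the alert fires once the metric has stayed at the threshold value 0 for the full 300-second duration, checked once per 60-second detection cycle. The Python sketch below illustrates that reading only. The function name `should_fire` and the list of samples are hypothetical, and it assumes that five consecutive samples cover the duration. The actual evaluation logic in OCP may differ.

```python
from typing import Sequence

DURATION_S = 300         # "Duration" column of the alert rule
DETECTION_CYCLE_S = 60   # "Detection cycle" column of the alert rule


def should_fire(samples: Sequence[float],
                threshold: float = 0,
                duration_s: int = DURATION_S,
                cycle_s: int = DETECTION_CYCLE_S) -> bool:
    """Return True if the newest samples equal the threshold for the whole duration.

    `samples` holds one metric value per detection cycle, oldest first.
    This is an illustrative reading of the rule, not the OCP implementation.
    """
    needed = duration_s // cycle_s  # 300 // 60 = 5 consecutive samples (assumption)
    if len(samples) < needed:
        # Not enough history yet to cover the duration.
        return False
    return all(value == threshold for value in samples[-needed:])
```

Under this reading, a single sample of 1 within the last five detection cycles is enough to keep the alert from firing.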
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Metric expression | Caution | Server |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}
Overview example: exporter_addr=http://192.168.0.1:8089/metrics/ob/perSecond:exporter_type=OB_CLUSTER:scrape_interval=1. The monitored exporter is abnormal.
Details example: exporter_addr=http://192.168.0.1:8089/metrics/ob/perSecond:exporter_type=OB_CLUSTER:scrape_interval=1. The monitored exporter is abnormal.
${alarm_target} indicates the object that generated the alert, in the exporter_addr=xxx:exporter_type=xxx:scrape_interval=xxx format. exporter_addr indicates the address of the exporter. exporter_type indicates the type of the exporter. scrape_interval indicates the collection cycle.
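Because the exporter address is itself a URL that contains colons, splitting ${alarm_target} on ":" does not recover the three fields. The following Python sketch shows one possible way to extract them with a regular expression. The function name `parse_alarm_target` is hypothetical, and the sketch assumes the fields always appear in the order shown above.

```python
import re
from typing import Dict

# The address may contain ":" (scheme, port), so it is matched greedily up to
# the literal ":exporter_type=" separator. Field order is an assumption.
_TARGET_RE = re.compile(
    r"^exporter_addr=(?P<exporter_addr>.+)"
    r":exporter_type=(?P<exporter_type>[^:]+)"
    r":scrape_interval=(?P<scrape_interval>[^:]+)$"
)


def parse_alarm_target(target: str) -> Dict[str, str]:
    """Split an alarm_target string into its three fields (all kept as strings)."""
    match = _TARGET_RE.match(target.strip())
    if match is None:
        raise ValueError(f"unexpected alarm_target format: {target!r}")
    return match.groupdict()


if __name__ == "__main__":
    example = ("exporter_addr=http://192.168.0.1:8089/metrics/ob/perSecond"
               ":exporter_type=OB_CLUSTER:scrape_interval=1")
    print(parse_alarm_target(example)["exporter_addr"])
```

The extracted exporter_addr value is the URL to use when you check whether the exporter is accessible.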
Impact on the system
No monitoring data is available in the OCP console. The system running status cannot be viewed in real time, and monitoring-related alerts cannot be triggered.
Possible causes
OCP-Agent is abnormal and does not return monitoring data.
The network is disconnected, so OCP cannot access the monitored URL.
Suggested solutions
Run the following command on the OCP host to check whether the monitored URL specified for exporter_addr is accessible:
curl http://192.168.0.1:8089/metrics/ob/perSecond
Note
http://192.168.0.1:8089/metrics/ob/perSecond indicates the value of the exporter_addr in the alert information.
If this URL is inaccessible, a network connection error may have occurred.
Check whether the network connection between OCP and this host is available. For more information, see Network troubleshooting.
If the URL is accessible, the network connection is normal and the problem is likely caused by an OCP-Agent failure.
Troubleshoot and resolve the problem. For more information, see OCP-Agent O&M script.
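If you need to repeat the accessibility check for several exporters, the check can also be scripted. The following Python sketch uses only the standard library and roughly mirrors the curl check above. The function name `check_exporter`, the 5-second timeout, and the three result labels are assumptions made for illustration. The "http_error" case (the host answers with an error status) is not covered by the steps above and is reported separately so that you can investigate it yourself.

```python
import urllib.error
import urllib.request


def check_exporter(url: str, timeout_s: float = 5.0,
                   opener=urllib.request.urlopen) -> str:
    """Return "accessible", "http_error", or "unreachable" for the given URL.

    `opener` defaults to urllib.request.urlopen and can be replaced for testing.
    """
    try:
        with opener(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError:
        # The host answered, but with an error status such as 4xx or 5xx.
        return "http_error"
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: the URL is inaccessible.
        return "unreachable"
    return "accessible" if 200 <= status < 300 else "http_error"


if __name__ == "__main__":
    # Replace the URL with the exporter_addr value from the alert information.
    print(check_exporter("http://192.168.0.1:8089/metrics/ob/perSecond"))
```

A result of "unreachable" corresponds to the network troubleshooting step, and "accessible" corresponds to the OCP-Agent troubleshooting step.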