ob_host_ssd_wear_indicator_over_threshold

2023-08-15 11:20:54  Updated

Description

This alert is triggered when the server SSD usage in percentage exceeds the specified threshold.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter Value
Metric ssd_wear_indicator
The server SSD usage in percentage. A higher value indicates a higher SSD usage.
Source The metric is obtained by executing the built-in script of StarAgent that detects the SSD usage.
Collected metric ssd_wear_indicator
Metric expression max(ssd_wear_indicator{@LABELS}) by (@GBLABELS)
Collection cycle 1 second

Alert rule

Metric Default threshold (unit: %) Duration Alert cycle Time before clearance
ssd_wear_indicator 95 0 seconds 60 seconds 5 minutes

Alert information

Trigger method Alert level Scope
Based on the metric expression Critical Server

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name}. The SSD usage reaches ${value_shown}, which exceeds ${alarm_threshold} %.

  • Overview example: svr_ip=192.168.0.1. The SSD usage of the server exceeds the specified threshold.

  • Details example: Cluster: obcluster-1. Host: 192.168.0.1. Alert: The SSD usage of the server exceeds the specified threshold. The SSD usage reaches 97 %, which exceeds 95 %.

Impact on the system

After the SSD usage reaches 100%, the SSD becomes unavailable. Therefore, it is necessary to monitor the SSD usage. Full SSD usage may have the following impacts:

  • Data write to the SSD fails. The data disk, log disk, and installation disk of OceanBase Database can be SSDs, and data is written to the SSDs during minor compactions, log recording, and major compactions.

  • Data read from the SSD fails. If data cannot be written to the SSD, the process can be stuck, which affects the data read.

Possible causes

The SSD is full.

Solutions

  1. Check whether the SSD is full or damaged.

  2. Contact OceanBase Technical Support to check the remaining write/erase cycles of the SSD by using specialized tools. If the SSD is almost worn out, replace the SSD in time.

Contact Us