Alert description
This alert is triggered when a disk on a server in an OceanBase cluster becomes read-only.
Alert principle
The following table describes the key parameters involved in the monitoring logic of this alert.
| Parameter | Description |
|---|---|
| Monitoring metric | ob_host_disk_readonly_flag |
| Metric source | Collected from node_exporter. |
| Collected metric | node_filesystem_readonly |
| Monitoring expression | max(node_filesystem_readonly{@LABELS}) by (@GBLABELS) |
| Collection interval | 1 second |
At each collection interval, OCP-Agent checks whether each disk is read-only and reports the result through the collected metric node_filesystem_readonly. The value is 1 if the disk is read-only and 0 otherwise. For disks that store the installation directory, data directory, or log directory of OceanBase, the is_ob_disk label is set to "1".
The monitoring metric ob_host_disk_readonly_flag is computed by the monitoring expression, which takes the maximum node_filesystem_readonly value for each label group. Its value indicates whether a disk that stores the installation directory, data directory, or log directory of the OBServer is read-only.
If the value of the monitoring metric is 1, the alert is triggered.
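The collection and aggregation logic can be approximated with a minimal shell sketch. It assumes, as node_exporter does on Linux, that a file system counts as read-only when the option "ro" appears in its mount options. The function names mount_readonly and max_readonly are hypothetical, and the sketch only imitates max(...) over a chosen list of mount points rather than the real label-based grouping used by OCP.

```shell
# mount_readonly FILE MOUNT_POINT
# Prints 1 if MOUNT_POINT is mounted with the "ro" option in FILE
# (a /proc/mounts-style file), otherwise 0. If the mount point appears
# more than once, the last (topmost) entry wins.
mount_readonly() {
    awk -v mp="$2" '
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { print ro + 0 }
    ' "$1"
}

# max_readonly FILE MOUNT_POINT...
# Prints the maximum read-only flag across the given mount points,
# mirroring max(node_filesystem_readonly{...}) over that set.
max_readonly() {
    file=$1
    shift
    max=0
    for mp in "$@"; do
        v=$(mount_readonly "$file" "$mp")
        if [ "$v" -gt "$max" ]; then max=$v; fi
    done
    echo "$max"
}
```

For example, `max_readonly /proc/mounts /home/admin /data/1` prints 1 if either mount point is read-only. The arguments must be mount points (as listed in /proc/mounts), not arbitrary directories, and the paths shown are placeholders.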
Alert rule
| Monitoring metric | Default threshold | Duration | Detection interval | Elimination interval |
|---|---|---|---|---|
| ob_host_disk_readonly_flag | 1 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the monitoring metric | Critical | Server |
Alert template
Alert summary
- Template: ${alarm_target} ${alarm_name}
- Example: ob_cluster=obcluster-1:svr_ip=xxx.xxx.xxx.xxx OceanBase server disk is read-only
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The disk mount point ${mount_point} is read-only.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: OceanBase server disk is read-only. The disk mount point /dev/loop0 is read-only.
Alert recovery
- Template: Alert: ${alarm_name}, OBServer disk is read-only: ${recover_value}
- Example: Alert: OceanBase server disk is read-only, OBServer disk is read-only: 0
Here, ${alarm_target} is in the format ob_cluster=xxxxxxx:svr_ip=xxxxxx, where:
- ob_cluster specifies the name of the cluster that generated the alert.
- svr_ip specifies the IP address of the OBServer that generated the alert.
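To illustrate how the ${alarm_target} format maps to the template variables, the following sketch splits it into its fields and fills in the alert details template. The helpers get_target_field and render_details are hypothetical, and the sketch assumes that ${host} equals svr_ip and that the address is IPv4 (an IPv6 address would itself contain colons and break the split).

```shell
# get_target_field TARGET KEY
# Splits TARGET (e.g. "ob_cluster=obcluster-1:svr_ip=192.0.2.10") on ":"
# and prints the value that follows "KEY=".
get_target_field() {
    printf '%s\n' "$1" | tr ':' '\n' |
        awk -F= -v k="$2" '$1 == k { print substr($0, length(k) + 2) }'
}

# render_details TARGET ALARM_NAME MOUNT_POINT
# Fills in the alert details template. Assumes ${host} is the svr_ip field.
render_details() {
    target=$1 name=$2 mount_point=$3
    printf 'Cluster: %s, Host: %s, Alert: %s. The disk mount point %s is read-only.\n' \
        "$(get_target_field "$target" ob_cluster)" \
        "$(get_target_field "$target" svr_ip)" \
        "$name" "$mount_point"
}
```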
Impact on the system
If the OceanBase log disk or data disk is read-only, the OBServer cannot run normally.
If the disk that stores log files is read-only, the system cannot write logs normally.
Possible causes
The disk is full.
Permission errors caused by the operating system.
Physical media errors, such as disk failures.
Procedure
Check whether the disk is full or the storage space is insufficient.
If the storage space is insufficient, the related storage space alerts are also reported. Handle those alerts first by following their documentation, and then check whether this alert is resolved as well.
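A minimal sketch for this check, assuming the OceanBase directories are passed in as arguments: it prints each directory whose file system usage is at or above a threshold, based on the Capacity column of POSIX `df -P` output. The function names, the threshold, and the example paths are hypothetical.

```shell
# usage_percent PATH
# Prints the usage percentage (without "%") of the file system holding PATH.
usage_percent() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# check_dirs THRESHOLD PATH...
# Prints "PATH N%" for every PATH whose file system usage is >= THRESHOLD.
check_dirs() {
    threshold=$1
    shift
    for dir in "$@"; do
        pct=$(usage_percent "$dir")
        # Skip paths that do not exist or report no numeric capacity.
        case $pct in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$pct" -ge "$threshold" ]; then
            echo "$dir ${pct}%"
        fi
    done
}

# Example (hypothetical paths; replace with your own directories):
# check_dirs 90 /home/admin/oceanbase /data/1 /data/log1
```

Free space is not the only resource that can run out; `df -i` reports inode usage for the same file systems.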
Other causes.
A disk can become read-only on Linux for many complex reasons, and diagnosing them falls within the scope of Linux operations and maintenance (O&M). We recommend that you contact a Linux O&M engineer to resolve the issue.
If you do not have a Linux O&M engineer, check /var/log/messages for log entries about the disk becoming read-only, and search for solutions on www.baidu.com by pasting the error message. The following commands list the read-only disks and search the system log:

```shell
# List all read-only mounts.
grep "[[:space:]]ro[[:space:],]" /proc/mounts

# Sample output, where /dev/loop0 (mounted at /home/admin) is a read-only disk:
# tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
# /dev/loop0 /home/admin iso9660 ro,relatime,nojoliet,check=s,map=n,blocksize=2048 0 0

# Search /var/log/messages for the reason the disk became read-only.
# Replace /dev/loop0 with your device, or search for the keyword 'readonly' instead.
grep -n '/dev/loop0' /var/log/messages
```
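These checks can be complemented by a small sketch that probes whether a directory is actually writable and pulls the log lines that mention a device. The helpers probe_writable and search_ro_logs are hypothetical. A failed probe can also be caused by missing permissions or a missing directory, so it does not by itself prove that the disk is read-only, and reading /var/log/messages usually requires root.

```shell
# probe_writable DIR
# Returns 0 if a temporary file can be created (and removed) in DIR.
# mktemp fails if DIR is missing, not writable, or on a read-only mount.
probe_writable() {
    probe=$(mktemp "$1/.ro_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

# search_ro_logs DEVICE [LOGFILE]
# Prints numbered lines from LOGFILE (default /var/log/messages) that
# mention DEVICE. Kernel messages may name the device without the /dev/
# prefix (for example "loop0"), so match on the base name as a whole word.
search_ro_logs() {
    grep -nFw -- "${1##*/}" "${2:-/var/log/messages}"
}
```

For example, `probe_writable /home/admin/oceanbase || echo "not writable"` followed by `search_ro_logs /dev/loop0`; both the directory and the device name are placeholders.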