Alert description
This alert is triggered when no clock synchronization service (Chrony or NTP) exists on a host managed by OCP.
Alert principle
The following table lists the key parameters involved in the monitoring logic of this alert.
| Parameter | Value |
|---|---|
| Monitoring metric | host_ntp_service_exist |
| Metric source | pgrep chronyd (checks whether the Chrony service exists); pgrep ntpd (checks whether the NTP service exists) |
| Metric collection | ntp_service_exist |
| Monitoring expression | max(ntp_service_exist{app="HOST",@LABELS}) by (@GBLABELS) |
| Metric collection interval | 1 second |
Note
The metric source of this alert is special: OCP-Agent checks for a local clock synchronization service by running the commands above. Because hosts use the Chrony service for clock synchronization by default, OCP-Agent checks for Chrony first and then for NTP.
The value of the monitoring metric host_ntp_service_exist indicates whether the clock service exists. A value of 1 indicates that the service exists, and a value of 0 indicates that it does not. If the value is 0, the alert is triggered.
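The check order described above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not OCP-Agent's actual implementation: the function name clock_service_exists and the PGREP_CMD override are hypothetical, introduced only so the logic can be exercised without a real chronyd or ntpd process.

```shell
#!/bin/sh
# Hypothetical helper (not part of OCP-Agent): prints 1 if a clock
# synchronization process is found, 0 otherwise.
# PGREP_CMD is an assumption of this sketch; it defaults to pgrep and
# exists only so the process lookup can be replaced in a test.
clock_service_exists() {
    pgrep_cmd="${PGREP_CMD:-pgrep}"
    # Check Chrony first, then NTP, matching the order described above.
    for proc in chronyd ntpd; do
        if "$pgrep_cmd" "$proc" >/dev/null 2>&1; then
            echo 1
            return 0
        fi
    done
    echo 0
}
```

Calling clock_service_exists prints 1 when either process is found and 0 otherwise, which mirrors the metric values described above.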
Rule information
| Monitoring metric | Default threshold | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| host_ntp_service_exist | 0 | 0 | 60 seconds | 5 minutes |
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the monitoring metric expression | Critical | Server |
Alert templates
Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx Server Clock Synchronization Service Does Not Exist
Details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The server clock synchronization service (NTP or Chrony) does not exist.
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: Server Clock Synchronization Service Does Not Exist. The server clock synchronization service (NTP or Chrony) does not exist.
Alert recovery
- Template: Alert: ${alarm_name}, Server Clock Synchronization Service Does Not Exist: ${recover_value}
- Example: Alert: Server Clock Synchronization Service Does Not Exist, Server Clock Synchronization Service Does Not Exist: 1
Here, ${alarm_target} identifies the object that generated the alert, in the format svr_ip=xxx.xxx.xxx.xxx.
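To illustrate how the placeholders in the overview template expand into the example text, here is a minimal sketch using sed. The function name render_overview is hypothetical, and OCP's own template engine may behave differently. The sketch assumes the substituted values do not contain the characters | or &.

```shell
#!/bin/sh
# Hypothetical helper: fills the overview template
# "${alarm_target} ${alarm_name}" with the given values.
# $1 = alarm target (for example svr_ip=192.0.2.10), $2 = alarm name.
render_overview() {
    # Single quotes keep the placeholders literal instead of letting
    # the shell expand them as variables.
    template='${alarm_target} ${alarm_name}'
    printf '%s\n' "$template" |
        sed -e 's|\${alarm_target}|'"$1"'|' \
            -e 's|\${alarm_name}|'"$2"'|'
}
```

For example, render_overview 'svr_ip=192.0.2.10' 'Server Clock Synchronization Service Does Not Exist' produces a line in the same shape as the overview example above.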
Impact on the system
Without a clock synchronization service, server clocks can drift out of synchronization over time, which can cause OceanBase clusters to become unavailable.
Possible causes
- The server does not have a clock synchronization service (Chrony or NTP) installed.
- The clock synchronization process (chronyd or ntpd) on the server has exited unexpectedly.
Procedure
1. Run the following commands to check whether a clock synchronization service (Chrony or NTP) is installed on the server.

       rpm -qa | grep chrony    # Check whether Chrony is installed.
       rpm -qa | grep ntp       # Check whether NTP is installed.

   - If a command returns version information, the service is installed. Proceed to step 2.
   - If a command returns nothing, that service is not installed. If neither Chrony nor NTP is installed, install a clock synchronization service as follows. Detailed installation and configuration instructions for Chrony and NTP are available online; this section provides only a brief description.

     Run one of the following commands to install a clock synchronization service. You can install either Chrony or NTP.

         yum install -y chrony    # Install Chrony.
         yum install -y ntp       # Install NTP.

     Run the matching command to start the clock synchronization service.

         systemctl start chronyd    # Start Chrony.
         systemctl start ntpd       # Start NTP.

     Manually clear the alert and observe whether it is reported again. If it is still reported, proceed to step 3.

2. Run the following commands to check whether the clock synchronization process (chronyd or ntpd) has exited unexpectedly.

       systemctl status chronyd    # Check the status of the Chrony service.
       systemctl status ntpd       # Check the status of the NTP service.

   - If the Active field in the output is active (running), proceed to step 3.
   - If the Active field in the output is inactive (dead), the clock synchronization service is not running. Run the matching command to restart it.

         systemctl restart chronyd    # Restart the Chrony service.
         systemctl restart ntpd       # Restart the NTP service.

     After restarting the service, manually clear the alert and observe whether it is reported again. If it is still reported, proceed to step 3.

3. Contact technical support to locate the cause.
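The decision flow of this procedure can be condensed into a short shell sketch. It assumes an RPM-based host with systemd. The function name suggest_action and its output words (install, restart, escalate) are hypothetical and are not part of OCP. It uses systemctl is-active --quiet in place of reading the Active field of systemctl status by eye.

```shell
#!/bin/sh
# Hypothetical helper: maps the checks in the procedure to a next action.
# $1 = package name (chrony or ntp), $2 = service unit (chronyd or ntpd).
suggest_action() {
    pkg="$1"
    unit="$2"
    if ! rpm -qa | grep -q "$pkg"; then
        # Step 1: the package is missing, so install and start it.
        echo "install"
    elif ! systemctl is-active --quiet "$unit"; then
        # Step 2: installed but not running, so restart the service.
        echo "restart"
    else
        # Step 3: installed and running; if the alert persists,
        # contact technical support.
        echo "escalate"
    fi
}
```

For example, suggest_action chrony chronyd prints one of the three words depending on the state of the host. Like the commands in step 1, the grep is a substring match, so it can also match unrelated package names that contain the same characters.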