Alert description
This alert is triggered when the clock offset between the server managed by OCP and the clock source exceeds 100 milliseconds.
Alerting principle
The following table describes the key parameters of the alert monitoring logic.
| Parameter | Value |
|---|---|
| Monitoring metric | host_ntp_offset_milliseconds. This metric indicates the time difference between the server and the clock source. An alert is triggered when the time difference exceeds the threshold (100 ms by default) or when the clock synchronization service is unavailable. |
| Metric source | OCP-Agent obtains clock offset data from the clock service on the host. By default, the clock offset is collected with ntpq: the collection command is ntpq -p, and the offset field is read from the data line that starts with an asterisk (*). If ntpq is not installed on the host but chronyc is available, the clock offset is collected with chronyc instead: the collection command is chronyc tracking -n, and the Last offset value is read from the output. |
| Metric collected | node_ntp_offset_seconds |
| Monitoring expression | max(abs(node_ntp_offset_seconds{@LABELS})) by (@GBLABELS) * 1000 |
| Collection interval | 1s |
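
The extraction described under "Metric source" can be sketched in shell as follows. This is only an illustration, not OCP-Agent's actual implementation: the function names `ntpq_offset_ms` and `chrony_offset_ms` are hypothetical, the ninth column is assumed to be the `offset` field of the standard `ntpq -p` peer listing (reported in milliseconds), and the `Last offset` line of `chronyc tracking` is assumed to be reported in seconds and is converted to milliseconds here so that both values are comparable.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `ntpq -p` output on stdin and print the offset
# (milliseconds) of the currently selected peer, i.e. the line starting with "*".
ntpq_offset_ms() {
  awk '!found && /^\*/ { print $9; found = 1 }'
}

# Hypothetical helper: read `chronyc tracking` output on stdin, take the
# "Last offset" value (seconds) and print it in milliseconds.
chrony_offset_ms() {
  awk -F':' '!found && /^Last offset/ {
    split($2, parts, " ")              # parts[1] = value, parts[2] = "seconds"
    printf "%.3f\n", parts[1] * 1000   # seconds -> milliseconds
    found = 1
  }'
}
```

On a host with the corresponding client installed, the helpers would be fed the live output, for example `ntpq -p | ntpq_offset_ms` or `chronyc tracking | chrony_offset_ms`. If no peer is selected (no line starts with an asterisk), `ntpq_offset_ms` prints nothing.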
Rule Information
| Monitoring Metric | Default Threshold (ms) | Duration | Detection Cycle | Elimination Cycle |
|---|---|---|---|---|
| host_ntp_offset_milliseconds | 100 | 0 | 60 seconds | 5 minutes |
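
To make the rule concrete, the sketch below mirrors the monitoring expression `max(abs(node_ntp_offset_seconds)) * 1000` and compares the result with the default 100 ms threshold. It is a local approximation for illustration only and does not reproduce how OCP evaluates the rule; the function names `max_abs_offset_ms` and `exceeds_threshold` are hypothetical, and the input is assumed to be one offset sample in seconds per line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read offsets in seconds (one per line) from stdin and
# print max(|offset|) * 1000, i.e. the largest absolute offset in milliseconds.
max_abs_offset_ms() {
  awk '{ v = ($1 < 0) ? -$1 : $1; if (v > max) max = v }
       END { printf "%.3f\n", max * 1000 }'
}

# Hypothetical helper: succeed (exit status 0) when the value in milliseconds
# is strictly greater than the threshold in milliseconds (default 100).
exceeds_threshold() {
  awk -v v="$1" -v t="${2:-100}" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

For example, with samples of 0.050, -0.104, and 0.020 seconds, `max_abs_offset_ms` prints `104.000`, and `exceeds_threshold 104.000 100` succeeds, which corresponds to the alert condition.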
Alert Information
| Alert Trigger Method | Alert Level | Scope |
|---|---|---|
| Expression based on monitoring metrics | Down | Server |
Alert Template
Alert Overview
- Template: ${alarm_target} ${alarm_name}
- Example: svr_ip=xxx.xxx.xxx.xxx The server has a large offset from the clock source.
Alert Details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The clock offset (${value_shown}) exceeds the threshold (${alarm_threshold} ms).
- Example: Cluster: obcluster-1, Host: xxx.xxx.xxx.xxx, Alert: The server has a large offset from the clock source. The clock offset (104 ms) exceeds the threshold (100 ms).
Alert Recovery
- Template: Alert: ${alarm_name}, Server offset from the clock source: ${value_shown}
- Example: Alert: The server has a large offset from the clock source, Server offset from the clock source: 50 ms
Here, ${alarm_target} indicates the object that triggered the alert, in the format of svr_ip=xx.xx.xx.xx.
Impact on the system
The distributed architecture of OceanBase Database relies on the host clock offset being within a specified range. If the clock offset exceeds the threshold, the OBServer nodes cannot function properly.
Possible causes
A large time difference between the server managed by OCP and the NTP server may be caused by the following reasons:
- The NTP server (clock source) is abnormal.
- The network between the server managed by OCP and the clock source is abnormal, preventing automatic time synchronization from the clock source.
- The clock synchronization service is abnormal, preventing automatic time synchronization from the clock source.
Troubleshooting procedure
1. Check whether the clock of the OCP server is normal.
   Check whether multiple instances of this alert are reported in OCP.
   - If not, proceed to step 2.
   - If so, the OCP server itself may be abnormal. We recommend that you first confirm that the OCP server is working properly and then adjust the OCP server time as described in step 4. Wait 5 minutes and observe whether the alert is still reported.
2. Verify the network connection between the hosts managed by OCP and the OCP server.
   - If the network connection is normal, proceed to step 3.
   - If the network connection is abnormal, restore it, and then adjust the OCP server time as described in step 4. Wait 5 minutes and observe whether the alert is still reported.
3. Check whether the clock synchronization service is working properly.
   If clock synchronization fails on a server managed by OCP, the alert host_ntp_service_not_exist (the NTP service does not exist on the server) is reported. In that case, follow these steps:
   - Resolve the alert by referring to the topic for host_ntp_service_not_exist (server clock synchronization service does not exist).
   - Wait 5 minutes and observe whether the alert is still reported.
     - If it is still reported, proceed to step 4.
     - If it is no longer reported, the problem is resolved.
4. If both the clock source and the clock synchronization service on the host are working properly, a brief desynchronization between the host and the clock source may have been caused by a power failure or a similar event. In this case, manually trigger clock synchronization on the managed server to bring the clock offset back within the expected range.
   Run one of the following commands, depending on the clock synchronization service installed on the server.

       # If the NTP service is in use, synchronize with this command.
       # xxx.xxx.xxx.xxx is the IP address of the clock source; replace it with your own.
       ntpdate xxx.xxx.xxx.xxx

       # If the Chrony service is in use, synchronize with this command.
       # Make sure that the clock source is configured in the /etc/chrony.conf file.
       chronyc -a makestep
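
The choice between the two synchronization commands can be wrapped in a small helper that prints the command for review instead of running it, which may be useful when the step is carried out on several hosts. This is a minimal sketch under stated assumptions: `sync_command` and its `ntp` and `chrony` arguments are hypothetical names introduced here, and only the two printed commands come from this page.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print (do not run) the manual synchronization command.
# Usage: sync_command ntp <clock_source_ip>
#        sync_command chrony
sync_command() {
  local service=$1 clock_source=${2:-}
  case "$service" in
    ntp)
      if [ -z "$clock_source" ]; then
        echo "a clock source IP address is required for ntpdate" >&2
        return 2
      fi
      printf 'ntpdate %s\n' "$clock_source"
      ;;
    chrony)
      # Assumes the clock source is already configured in /etc/chrony.conf.
      printf 'chronyc -a makestep\n'
      ;;
    *)
      echo "unknown service: $service (expected ntp or chrony)" >&2
      return 2
      ;;
  esac
}
```

For example, `sync_command ntp 10.0.0.1` prints `ntpdate 10.0.0.1`, where 10.0.0.1 is a placeholder address. The printed command still has to be run with sufficient privileges on the managed server.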