OceanBase Cloud Platform V3.1.2

ob_cluster_sync_delay_time_too_long

Last updated: 2023-08-15 11:21:17

Description

The primary and standby clusters synchronize redo logs to keep their data consistent. In asynchronous log transmission mode, logs reach the standby cluster with some delay, which typically does not exceed 10 minutes. If the delay exceeds this limit, the alert is triggered.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	sync_delay_time
Source	OCP-Server derives the metric from the CURRENT_SCN field of the internal table v$OB_CLUSTER. The value of CURRENT_SCN indicates the point in time up to which the primary and standby clusters have reached data consistency, and the synchronization delay is the difference between the current time and this value. Note: OCP-Server starts collecting the data one hour after the OceanBase cluster is created, and only if the primary cluster contains a user tenant in addition to the system tenant.
Collected metric (unit: s)	sync_delay_time
Metric expression	max(sync_delay_time{@LABELS}) by (@GBLABELS)
Collection cycle	60 seconds
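The metric computation described in the table can be sketched in Python as follows. This is a minimal illustration, not OCP-Server's implementation: it assumes that CURRENT_SCN is a Unix timestamp in microseconds, and the helper names `sync_delay_seconds` and `max_by`, as well as the `tenant` label, are hypothetical. `max_by` mimics the `max(sync_delay_time{@LABELS}) by (@GBLABELS)` expression by keeping the largest delay for each set of group-by label values.

```python
import time
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

# Assumption: CURRENT_SCN is a Unix timestamp in microseconds.
MICROS_PER_SECOND = 1_000_000

GroupKey = Tuple[Tuple[str, Optional[str]], ...]


def sync_delay_seconds(current_scn_us: int, now_s: Optional[float] = None) -> float:
    """Return the synchronization delay: current time minus CURRENT_SCN, in seconds."""
    if now_s is None:
        now_s = time.time()
    return now_s - current_scn_us / MICROS_PER_SECOND


def max_by(
    samples: Iterable[Tuple[Mapping[str, str], float]],
    group_labels: List[str],
) -> Dict[GroupKey, float]:
    """Emulate max(metric{...}) by (group_labels): keep the largest value per label group."""
    result: Dict[GroupKey, float] = {}
    for labels, value in samples:
        # Samples with the same values for the group-by labels fall into one group.
        key = tuple((name, labels.get(name)) for name in group_labels)
        if key not in result or value > result[key]:
            result[key] = value
    return result


if __name__ == "__main__":
    now = 1_700_000_000.0  # fixed "current time" so the output is reproducible
    samples = [
        # The "tenant" label is hypothetical; ob_cluster matches the alarm target format.
        ({"ob_cluster": "cluster-76", "tenant": "t1"}, sync_delay_seconds(1_699_999_400_000_000, now)),
        ({"ob_cluster": "cluster-76", "tenant": "t2"}, sync_delay_seconds(1_699_999_970_000_000, now)),
    ]
    print(max_by(samples, ["ob_cluster"]))  # {(('ob_cluster', 'cluster-76'),): 600.0}
```

Taking the maximum per group means a single lagging series is enough to push the cluster-level value over the threshold.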

The value of the metric sync_delay_time indicates the synchronization delay between the primary and standby OceanBase clusters. When this value is greater than the threshold, this alert is triggered. The default threshold is 600s.

Alert rule

Metric	Default threshold (unit: s)	Duration	Detection cycle	Time before clearance
sync_delay_time	600	0 seconds	60 seconds	5 minutes

Alert information

Trigger method	Alert level	Scope
Metric expression	Warning	Cluster

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The log transmission latency is ${value}s, exceeding the threshold of ${alarm_threshold}s.
Overview example: ob_cluster=cluster-76. The latency of Oceanbase clusters synchronization is too long.
Details example: ob_cluster=cluster-76. The latency of Oceanbase clusters synchronization is too long. The log transmission latency is 3994.293s, exceeding the threshold of 600.0s.

${alarm_target} follows the ob_cluster=xxxxxxx format. ob_cluster indicates the name of the cluster that generated the alert.
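The templates use `${name}` placeholders, the same syntax as Python's `string.Template`, so the substitution can be sketched with the standard library. This only illustrates how the placeholders map to the values in the details example; OCP's actual rendering engine is not described here. `cluster_name` is a hypothetical helper that reads the cluster name out of an alarm target in the `ob_cluster=xxxxxxx` format.

```python
from string import Template

# The overview and details templates as listed in this section.
OVERVIEW = Template("${alarm_target} ${alarm_name}")
DETAILS = Template(
    "${alarm_target} ${alarm_name}. The log transmission latency is ${value}s, "
    "exceeding the threshold of ${alarm_threshold}s."
)


def cluster_name(alarm_target: str) -> str:
    """Extract the cluster name from an alarm target in the ob_cluster=xxxxxxx format."""
    key, sep, name = alarm_target.partition("=")
    if key != "ob_cluster" or not sep:
        raise ValueError(f"unexpected alarm target: {alarm_target!r}")
    return name


# Values taken from the details example above.
fields = {
    "alarm_target": "ob_cluster=cluster-76",
    "alarm_name": "The latency of Oceanbase clusters synchronization is too long",
    "value": 3994.293,
    "alarm_threshold": 600.0,
}

if __name__ == "__main__":
    print(OVERVIEW.substitute(fields))
    print(DETAILS.substitute(fields))
    print(cluster_name(fields["alarm_target"]))  # cluster-76
```

`Template.substitute` raises `KeyError` if a placeholder has no value, which surfaces a missing field instead of silently emitting a literal `${...}`.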

Impact on the system

A long synchronization delay means that the standby cluster lags behind the primary cluster. This may lead to data inconsistency between the two clusters and may interrupt a primary/standby switchover.

Possible causes

The network connection between the primary and standby clusters is interrupted.
The standby cluster is overloaded or has insufficient resources.
Abnormal servers exist in the primary or standby cluster.
The primary cluster is unavailable.
The standby cluster has a suspended tenant synchronization task.