ob_cluster_frozen_version_delta_over_threshold

Last Updated: 2023-08-15 11:20:59

Description

This alert is triggered when the difference between the major freeze version number and the baseline version number exceeds the threshold.

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter	Value
Metric	ob_cluster_frozen_version_delta
Source	SQL: `select zone, name, value, time_to_usec(now()) from __all_zone;` Note: The time_to_usec(now()) column supplies the value of the current_timestamp metric, and the value column supplies the value of the zone_value metric.
Collected metrics	current_timestamp and zone_value
Metric expression	max(zone_value{metric_group="all_zone",name="frozen_version",@LABELS}) by (@GBLABELS) - min(zone_value{metric_group="all_zone",name="last_merged_version",@LABELS}) by (@GBLABELS)
Collection cycle	1 second

The metric ob_cluster_frozen_version_delta is the difference between the major freeze version number (frozen_version) and the baseline version number (last_merged_version) of the OceanBase cluster, calculated as the highest frozen_version minus the lowest last_merged_version. The alert is triggered when this difference exceeds the threshold, which defaults to 1.
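The metric expression can be illustrated with a minimal Python sketch. It assumes the rows have already been fetched with the SQL statement in the table as (zone, name, value) tuples. The function names, the sample rows, and the strict "greater than" comparison are assumptions made for illustration, not the monitoring implementation itself.

```python
from typing import List, Tuple

# One row per (zone, name, value), as returned by the query on __all_zone.
Row = Tuple[str, str, int]

DEFAULT_THRESHOLD = 1  # default threshold stated in this document


def frozen_version_delta(rows: List[Row]) -> int:
    """Return max(frozen_version) - min(last_merged_version)."""
    frozen = [value for _, name, value in rows if name == "frozen_version"]
    merged = [value for _, name, value in rows if name == "last_merged_version"]
    if not frozen or not merged:
        raise ValueError("rows must contain frozen_version and last_merged_version")
    return max(frozen) - min(merged)


def exceeds_threshold(delta: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    # Assumption: "exceeds" means strictly greater than the threshold.
    return delta > threshold


# Hypothetical sample data: the slowest zone is two versions behind.
sample_rows: List[Row] = [
    ("", "frozen_version", 12),
    ("zone1", "last_merged_version", 12),
    ("zone2", "last_merged_version", 11),
    ("zone3", "last_merged_version", 10),
]

delta = frozen_version_delta(sample_rows)
print(delta, exceeds_threshold(delta))
```

Taking the minimum of last_merged_version means that a single slow zone is enough to raise the delta for the whole cluster.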

Alert rule

Metric	Default threshold	Duration	Detection cycle	Time before clearance
ob_cluster_frozen_version_delta	1	0 seconds	60 seconds	5 minutes

Alert information

Trigger method	Alert level	Scope
Metric expression	Critical	Cluster

Alert templates

Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The version difference is ${value}, exceeding the threshold of ${alarm_threshold}.
Overview example: ob_cluster=C1-1000. The difference between the major freeze version number and the baseline version number of the OceanBase cluster exceeds the threshold.
Details example: ob_cluster=C1-1000. The difference between the major freeze version number and the baseline version number of the OceanBase cluster is 2.0, exceeding the threshold of 1.0.
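The placeholders in the templates use the ${name} form, which Python's string.Template also understands, so the substitution can be sketched as follows. This is only an illustration of how the placeholders are filled. The render_details helper is hypothetical, the alarm name used here is the title of this alert, and the rendering engine that the platform actually uses is not described in this document.

```python
from string import Template

# The Details template from this document, unchanged.
DETAILS_TEMPLATE = Template(
    "${alarm_target} ${alarm_name}. The version difference is ${value}, "
    "exceeding the threshold of ${alarm_threshold}."
)


def render_details(alarm_target: str, alarm_name: str,
                   value: float, alarm_threshold: float) -> str:
    """Fill the Details template; raises KeyError if a placeholder is missing."""
    return DETAILS_TEMPLATE.substitute(
        alarm_target=alarm_target,
        alarm_name=alarm_name,
        value=value,
        alarm_threshold=alarm_threshold,
    )


# Values taken from the Details example above.
message = render_details(
    alarm_target="ob_cluster=C1-1000",
    alarm_name="ob_cluster_frozen_version_delta_over_threshold",
    value=2.0,
    alarm_threshold=1.0,
)
print(message)
```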

Impact on the system

This problem increases statement response time. In extreme cases, OBServer memory is exhausted, the application can no longer write data, and the clog disk fills up.

Possible causes

This problem is commonly found in the following scenarios:

You start a major compaction task when an OceanBase cluster is undergoing an automatic compaction.
You start several major compaction tasks in quick succession.
You start a new major compaction task before the last major compaction is finished.
