Description
This alert is triggered when the major compaction of a zone of an OceanBase cluster fails to be completed in three hours.
Note
Major compaction timed out does not mean that the major compaction has stopped. Instead, the major compaction is still in progress. The timeout alert is sent to remind you that the operation remains unfinished after the expected finish time. You need to find out the causes and take necessary actions.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_cluster_merge_timeout_flag |
| Source | SQL: select zone, name, value, time_to_usec(now()) from __all_zone;
NoteThe value of the value field is used as the value of the zone_value metric, and the values of other fields are used as labels. |
| Collected metric | zone_value |
| Metric expression | max(zone_value{metric_group="all_zone",name="is_merge_timeout",@LABELS}) by (@GBLABELS) |
| Collection cycle | 1 second |
For a major compaction task that is initiated automatically or manually, after the task run time exceeds 10,800 seconds (three hours), the flag bit is set and the major compaction status is updated in the __all_zone table.
Note
The default value of the zone_merge_timeout parameter is 10800 seconds, which can be customized.
The value of the metric indicates the cluster compaction timeout status. Valid values: 0 and 1.
Note
When the value is 0, the cluster compaction is normal. When the value is 1, the cluster compaction timed out.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_cluster_merge_timeout_flag | 1 | 0 seconds | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Cluster |
Alert templates
Alert overview: ${alarm_target} ${alarm_name}
Alert details: ${alarm_target} ${alarm_name}
Overview example: ob_cluster=C1-1000. OceanBase cluster compaction timed out.
Details example: ob_cluster=C1-1000. OceanBase cluster compaction timed out.
Impact on the system
When a major compaction times out, the storage pressure of the disk increases. If users go on writing data to it, the disk will be full, blocking services of the server. When you use OceanBase Database of an earlier version, the memory will be used up and the cluster will stop writing.
Possible causes
This problem is commonly found in the following scenarios:
The OceanBase cluster has a huge amount of data to compact.
An error occurred with the disk.
The major compaction efficiency of the disk medium is low.
When the CPU and memory specifications are the same, the major compaction efficiency depends on the disk medium. For example, a Serial Advanced Technology Attachment (SATA) disk has a lower major compaction efficiency than that of a Solid State Disk (SSD).
Solutions
If the problem is caused by a huge amount of data, modify the timeout value or the number of threads for major compaction. If the problem is caused by a disk failure, replace the faulty node. For more information, see Exception handling for OceanBase cluster compaction.
Check whether the major compaction is stuck on the Compaction Management page of the OceanBase Cloud Platform (OCP) console.
Go to the Compaction Management page.

View the major compaction details on the Details of Major Compaction tab. The tab displays the estimated progress, total number of partition replicas, and the number of replicas for which major compaction has been completed. Check the number of replicas for which major compaction has been completed every minute. If the number increases, the major compaction is not stuck.

View the major compaction statistics. If the elapsed time of major compaction significantly increases compared with the last three major compactions with no business traffic surge in the recent days, an error may have occurred.

Based on the preceding analysis, if the major compaction is stuck, check for the faulty OBServer. For more information, see Exception handling for OceanBase cluster compaction. If the faulty OBServer must be replaced, replace it by referring to Replace an OBServer.
Based on the preceding analysis, if the major compaction is not stuck, go to the next step.
Check whether a large amount of business data needs to be compacted or the major compaction efficiency of the disk medium is low. Adjust the timeout value or the number of threads for major compaction. Perform the following steps:
Choose Cluster Overview > Compaction Management > Configuration for Major Compaction > Major Compaction Strategy .
Click Edit .
Modify the Worker Threads for Major Compaction and Major Compaction Timeout Period parameters.

Click Save .