Description
This alert is triggered when the OBServer has long transactions that last more than the specified threshold. The threshold is 1200s by default.
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Server |
Alert rule
| Metric | Default threshold | Source | Duration | Detection cycle | Elimination cycle |
|---|---|---|---|---|---|
| ob_server_max_trans_duration_seconds | 1200 | Internal table __all_virtual_trans_stat of OceanBase Database | 0s | 60s | 5 min |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The largest transaction lasts ${value}s, exceeding the threshold of ${alarm_threshold}s.
Overview example: ob_cluster=first:svr_ip=xxx.xxx.xxx.xxx. The OBServer has long transactions.
Details example: ob_cluster=first:svr_ip=xxx.xxx.xxx.xxx. The OBServer has long transactions. The longest transaction lasts 1500.0s, exceeding the threshold of 1200.0s.
${alarm_target} indicates the object that generates the alert. It follows the ob_cluster=xx:svr_ip=xx format. ob_cluster indicates the name of the OceanBase cluster that generates the alert. svr_ip indicates the IP address of the OBServer of the cluster that generates the alert.
Impact on the system
If the OBServer has long transactions, congestion and lock timeout may occur, which affects the execution of other transactions.
Possible causes
A minority problem occurred (the candidates fail to receive the majority of votes from other nodes), the disk is full, the memory is used up, or large transactions exist in the application.
Solution
Check the cluster status, and check whether problems such as a minority problem and network failure occur. The minority problem is usually caused by a server or network failure. View other alert items to check whether a server or network failure occurred.
When this alert is triggered, collect the following information. If the alert lasts more than 30 minutes, contact Technical Support.
View all server information and check for an abnormal server.
select * from __all_server;
View the current long transactions. The alert threshold of long transactions is 1200s by default.
SELECT * FROM __all_virtual_trans_stat WHERE part_trans_action <= 2 AND ctx_create_time < DATE_SUB(NOW(), INTERVAL 1200 SECOND) LIMIT 100;