Description
In OceanBase Cloud Platform (OCP), a transaction that satisfies either of the following two conditions is considered a large transaction:
The log volume of a single participant exceeds 0.5 MB. (This option is supported in OceanBase Database V3.2 and later.)
The transaction execution time exceeds 500 ms.
Note
This alert applies only when the log volume exceeds 0.5 MB. The ob_tenant_slow_sql_exists alert applies when the transaction execution time exceeds 500 ms.
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | trans_max_log_size_mb |
| Source | select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) QUERY_TIMEOUT(%d)*/ a.tenant_id, b.tenant_id as tenant_id_xa, a.trans_id, `partition`, floor(ctx_create_time) as ctx_create_time, session_id, participants, trans_type, part_trans_action, sql_no, log_size_byte from (select tenant_id, svr_ip, trans_id, `partition`, unix_timestamp(ctx_create_time) *1000000 as ctx_create_time, session_id, participants, (pending_log_size + flushed_log_size) as log_size_byte, trans_type, part_trans_action, sql_no from __all_virtual_trans_stat where svr_ip = ? and svr_port = ? and is_exiting != 1) a left join __all_virtual_global_transaction b on a.tenant_id = b.tenant_id and a.trans_id = b.trans_idselect max(log_size_byte)/1024/1024 as trans_max_log_size_mb from ob_hist_trans_stat |
| Collected metric | log_size_byte |
| Metric expression | max(log_size_byte)/1024/1024 |
| Collection cycle | 60 seconds |
Note
The system parameter ocp.alarm.datasource.trans-min-log-size-mb specifies the log volume threshold, in MB, that triggers an alert. The value of trans_max_log_size_mb must be equal to or greater than the value of this system parameter. The default value is 0.5 MB.
Alert rule
| Metric | Default threshold (unit: MB) | Source | Detection cycle | Time before clearance |
|---|---|---|---|---|
| trans_max_log_size_mb | 0.5 | Tenant | 60 seconds | 5 minutes |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Tenant |
Alert templates
Alert overview
Template: ${alarm_target} ${alarm_name}
Example: ob_cluster=obcluster-2:tenant_name=orac2:trans_hash={hash:10801753558860391353, inc:59202486, addr:"xxx.xxx.xxx.xxx:2882", t:1646993121179509} A large transaction exists in the OceanBase Database tenant.
Alert details
Template: Cluster: ${ob_cluster_name}; Tenant: A large transaction exists in the ${tenant_name} tenant; Session ID: ${session_id}; Transaction ID: ${trans_hash}; Maximum Log Volume: ${trans_max_log_size_mb} MB; Transaction Type: ${trans_type}; Transaction Created At: ${trans_create_time}
Example: Cluster: obcluster; Tenant: A long-running transaction exists in the orac2 tenant; Session ID: 3221635048; Transaction ID: {hash:10801753558860391353, inc:59202486, addr:"xxx.xxx.xxx.xxx:2882", t:1646993121179509}; Maximum Log Volume: 0.7 MB; Transaction Type: distribute; Transaction Created At: 2022-03-11T18:05:21.184+08:00
Impact on the system
Large transactions may cause system memory problems. For example, the playback on the follower is out of order. If a minor compaction needs to be performed due to insufficient memory, transactions that are played back before they are committed need to be migrated. Both transaction migration and playback to the minor compaction point consume memory. This drains the memory so that both tasks cannot be completed. Similar issues may occur during transaction execution on the leader.
Possible causes
Too much data is committed in a single transaction.
Solutions
Use the transaction diagnostics feature in the tenant management module in OCP to identify the cause and perform troubleshooting accordingly.
If the transaction is expected to be a large one in business, check whether the business can be split to avoid a large transaction, or increase the alert threshold.