tenant_active_memstore_percent_over_threshold

2023-08-22 02:46:05  Updated

Description

This alert is triggered when the active memory usage of a tenant exceeds the threshold. Active memory usage = Active memory used (active memstore used) / Threshold that triggers a freeze (major freeze trigger).

Principle

The following table describes the key parameters that are involved in the monitoring and alerting logic.

Parameter | Value
Metric | ob_tenant_host_active_memstore_percent
  Note: The value of this metric is the percentage of active memory in the tenant. The alert is triggered when the value exceeds the threshold. The default threshold is 110%.
Source | SQL: select /*+read_consistency(weak)*/ tenant_name, tenant_id, stat_id, value from v$sysstat, __all_tenant where stat_id IN (130000, 130002) and (con_id > 1000 or con_id = 1) and __all_tenant.tenant_id = v$sysstat.con_id;
  Note: The value of the value field is used as the value of the sysstat_value metric.
Collected metric | sysstat_value
Metric expression | 100 * sum(sysstat_value{metric_group="sysstat",stat_id="130000",@LABELS}) by (@GBLABELS) / sum(sysstat_value{metric_group="sysstat",stat_id="130002",@LABELS}) by (@GBLABELS)
  Note: When the statistical event ID (stat_id) is 130000, the collected value is the active memory used by the tenant. When the stat_id is 130002, the collected value is the memory threshold that triggers a freeze. The active memory usage of a tenant is the sum of the values for stat_id 130000 divided by the sum of the values for stat_id 130002, expressed as a percentage.
Collection cycle | 1 second
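
The metric expression can be illustrated with a minimal Python sketch. It assumes rows shaped like the output of the source SQL (tenant_name, tenant_id, stat_id, value) and assumes that the grouping labels reduce to the tenant name. The function name and the sample rows are hypothetical and are not part of OCP.

```python
from collections import defaultdict

ACTIVE_MEMSTORE_USED = 130000   # stat_id: active memory used by the tenant
MAJOR_FREEZE_TRIGGER = 130002   # stat_id: memory threshold that triggers a freeze


def active_memstore_percent(rows):
    """Return {tenant_name: percent} from (tenant_name, tenant_id, stat_id, value) rows.

    Mirrors 100 * sum(stat 130000) / sum(stat 130002), grouped by tenant name
    (the grouping key is an assumption made for this sketch).
    """
    used = defaultdict(float)
    trigger = defaultdict(float)
    for tenant_name, _tenant_id, stat_id, value in rows:
        if stat_id == ACTIVE_MEMSTORE_USED:
            used[tenant_name] += value
        elif stat_id == MAJOR_FREEZE_TRIGGER:
            trigger[tenant_name] += value
    # Skip tenants whose trigger sum is zero to avoid dividing by zero.
    return {t: 100.0 * used[t] / trigger[t] for t in trigger if trigger[t] > 0}


# Hypothetical sample rows: two servers reporting values for one tenant.
rows = [
    ("tenant-1", 1001, 130000, 600.0),
    ("tenant-1", 1001, 130002, 500.0),
    ("tenant-1", 1001, 130000, 500.0),
    ("tenant-1", 1001, 130002, 500.0),
]
percent = active_memstore_percent(rows)
print(percent)
```

Because the values are summed before the division, one server that is far above its freeze trigger can be offset by others that are well below it.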

Alert rule

Metric | Default threshold (unit: %) | Duration | Detection cycle | Time before clearance
ob_tenant_host_active_memstore_percent | 110 | 0 seconds | 60 seconds | 5 minutes
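
One way to read the rule parameters is as a small state machine, sketched below in Python. The semantics are assumptions made for illustration: the rule is evaluated once per detection cycle, the alert fires once the value has stayed above the threshold for the duration, and it clears once the value has stayed at or below the threshold for the time before clearance. The actual OCP evaluation logic may differ, and the function name is hypothetical.

```python
THRESHOLD_PERCENT = 110.0   # default threshold from the rule
DURATION_S = 0              # breach must last this long before the alert fires
DETECTION_CYCLE_S = 60      # interval between rule evaluations
CLEARANCE_S = 5 * 60        # assumed: time back under the threshold before clearing


def evaluate(samples):
    """Return a list of (time_s, firing) for one sample per detection cycle."""
    firing = False
    breach_start = None   # time at which the current breach began
    ok_start = None       # time at which the value dropped back under the threshold
    states = []
    for i, value in enumerate(samples):
        now = i * DETECTION_CYCLE_S
        if value > THRESHOLD_PERCENT:
            ok_start = None
            if breach_start is None:
                breach_start = now
            if now - breach_start >= DURATION_S:
                firing = True
        else:
            breach_start = None
            if ok_start is None:
                ok_start = now
            if firing and now - ok_start >= CLEARANCE_S:
                firing = False
        states.append((now, firing))
    return states


# Hypothetical series: one breach at t=60s, then back under the threshold.
states = evaluate([100, 120, 105, 100, 100, 100, 100, 100])
print(states)
```

With a duration of 0 seconds, a single sample above 110% is enough to fire the alert in this sketch, while clearing requires five consecutive minutes back under the threshold.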

Alert information

Trigger method | Alert level | Scope
Metric expression | Critical | Tenant

Alert templates

  • Overview: ${alarm_target} ${alarm_name}

  • Details: Cluster: ${ob_cluster_name}, Alert: ${alarm_name}, The active memory usage is ${value}%, exceeding the threshold of ${alarm_threshold}%.

  • Overview example: ob_cluster=C1-1000:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx. The active memory usage of an OceanBase tenant exceeds the threshold.

  • Details example: Cluster: obcluster-1, Alert: The active memory usage of an OceanBase tenant exceeds the threshold. The active memory usage is 201.0%, exceeding the threshold of 200.0%.
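
The placeholder substitution in the templates above can be sketched with Python's string.Template, which happens to use the same ${name} syntax. This only illustrates how the fields fill the templates; it is not how OCP renders alerts. The field values are taken from the examples above, and the exact punctuation of the rendered alarm name is an assumption.

```python
from string import Template

OVERVIEW = Template("${alarm_target} ${alarm_name}")
DETAILS = Template(
    "Cluster: ${ob_cluster_name}, Alert: ${alarm_name}, "
    "The active memory usage is ${value}%, "
    "exceeding the threshold of ${alarm_threshold}%."
)

# Field values copied from the examples; the trailing punctuation of
# alarm_name is an assumption.
fields = {
    "alarm_target": "ob_cluster=C1-1000:tenant_name=tenant-1:svr_ip=xxx.xxx.xxx.xxx",
    "alarm_name": "The active memory usage of an OceanBase tenant exceeds the threshold",
    "ob_cluster_name": "obcluster-1",
    "value": "201.0",
    "alarm_threshold": "200.0",
}

overview = OVERVIEW.substitute(fields)
details = DETAILS.substitute(fields)
print(overview)
print(details)
```

Template.substitute raises KeyError when a placeholder has no value, which makes a missing field visible instead of silently producing an incomplete message.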

Impact on the system

When this alert is triggered, the memory of the OBServer may fill up. In extreme scenarios, the application can no longer write data.

Possible causes

When the used active memory reaches the freeze threshold, freeze and compaction operations are triggered. These operations freeze the active memtable, generate a new active memtable, compact the data in the frozen memtable with the dumped data of the previous version (if any), and persist the compacted data to disk to release the memory. This alert is triggered when data is written faster than it can be dumped (frozen, compacted, and persisted to disk), so the active memory usage exceeds the threshold before the memory can be released.
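
The relationship between write speed and dump speed can be made concrete with a deliberately simplified toy model in Python. It assumes constant write and dump rates and treats active memory as growing at their difference. Real freeze and compaction behavior is more complex, and all names and numbers here are hypothetical.

```python
def seconds_until_alert(active_mb, trigger_mb, write_mb_s, dump_mb_s,
                        threshold_percent=110.0):
    """Toy model: seconds until active/trigger exceeds the threshold, or None.

    Assumes constant rates; active memory grows at (write rate - dump rate).
    """
    limit_mb = trigger_mb * threshold_percent / 100.0
    if active_mb > limit_mb:
        return 0.0  # already over the alert threshold
    net_growth = write_mb_s - dump_mb_s
    if net_growth <= 0:
        return None  # dumping keeps up with writes, so the limit is never reached
    return (limit_mb - active_mb) / net_growth


# Hypothetical numbers: active memory is at the freeze trigger (1000 MB),
# writes arrive at 50 MB/s, and dumping releases 30 MB/s.
eta = seconds_until_alert(1000, 1000, 50, 30)
print(eta)
```

In this model the alert can only be reached while the write rate stays above the dump rate, which is why the suggested solutions focus on reducing traffic or restoring dump throughput.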

Suggested solutions

  • Check whether the amount of business traffic is too large. For example, the amount of traffic can be large during data import.

    On the tenant page of the cluster, choose SQL Diagnosis > TopSQL. On the TopSQL page, check whether the number of SQL statements or the number of executions is unusually large.

    • If yes, the amount of business traffic is large.

      Apply throttling to the OceanBase cluster. For more information, see Apply throttling to an OceanBase cluster.

      Check whether the alert is cleared in the OCP console 10 minutes later. If the alert is not cleared, the problem may have been caused by other issues.

    • Otherwise, the alert is caused by other issues.

  • Check for minor compaction or major compaction exceptions.

    Typically, minor compaction and major compaction exceptions can be fixed by restarting the OBServer. For more information, see Restart an OBServer.

  • Check for disk write errors.

    If the business traffic is normal but the problem persists after you restart the OBServer, check for disk write errors. For example, the disk may be damaged or disk writes may be slow.

    In the case of a disk error, see Exception handling for OceanBase cluster compaction.
