Description
OceanBase Database is a read/write splitting system. Based on the storage method, its internal data is divided into baseline data in the SSTable format and incremental data in the MemTable format.
Incremental data is all the new data generated after the point in time of the current major compaction. It is usually stored in the MemTable and is also persisted to a commit log file on disk. During a minor compaction, the MemTable is first frozen and then dumped to disk to release the memory that it occupies. This alert is triggered if a MemTable of an OceanBase Database tenant has been frozen for longer than 10 minutes (the default threshold of 600 seconds).
Principle
The following table describes the key parameters that are involved in the monitoring and alerting logic.
| Parameter | Value |
|---|---|
| Metric | ob_memtable_snapshot_max_duration_seconds |
| Source | `SELECT /*+ PARALLEL(2), ENABLE_PARALLEL_DML, MONITOR_AGENT READ_CONSISTENCY(WEAK) */ __all_tenant.tenant_id, __all_tenant.tenant_name, snapshot.max_snapshot_duration_seconds FROM __all_tenant INNER JOIN (SELECT tenant_id, max(UNIX_TIMESTAMP(NOW()) - snapshot_version/1000000) max_snapshot_duration_seconds FROM __all_virtual_table_mgr WHERE table_type=0 and is_active=0 and svr_ip=? and svr_port=? GROUP BY tenant_id ) snapshot ON __all_tenant.tenant_id = snapshot.tenant_id` |
| Collected metric | ob_memtable_max_snapshot_duration_seconds |
| Metric expression | max(ob_memtable_max_snapshot_duration_seconds{@LABELS}) by (@GBLABELS) |
| Collection cycle | 60 seconds |
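The arithmetic in the source query can be sketched in Python. This is only an illustration of the calculation, not the collector's implementation: the row dictionaries and the function name `max_snapshot_duration_seconds` are hypothetical, and the sketch assumes that `snapshot_version` is a timestamp in microseconds, as the division by 1000000 in the query suggests.

```python
import time


def max_snapshot_duration_seconds(rows, now=None):
    """Return {tenant_id: age in seconds of the oldest matching MemTable}.

    Each row is a hypothetical dict carrying the columns that the source
    query reads: tenant_id, table_type, is_active, snapshot_version.
    """
    now = time.time() if now is None else now
    result = {}
    for row in rows:
        # Same filter as the query: table_type = 0 and is_active = 0.
        if row["table_type"] != 0 or row["is_active"] != 0:
            continue
        # Assumption: snapshot_version is in microseconds since the epoch.
        age = now - row["snapshot_version"] / 1_000_000
        tenant_id = row["tenant_id"]
        # Equivalent of max(...) ... GROUP BY tenant_id.
        result[tenant_id] = max(result.get(tenant_id, age), age)
    return result
```

A tenant with no matching rows is absent from the result, which mirrors the INNER JOIN in the query.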
Alert rule
| Metric | Default threshold (unit: s) | Source | Detection cycle | Time before clearance |
|---|---|---|---|---|
| ob_memtable_snapshot_max_duration_seconds | 600 | SQL collection | 60 seconds | 5 minutes |
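The rule above can be read as a small state machine. The following Python sketch is one plausible interpretation, not the actual alerting engine: it assumes that the alert fires when a sample is strictly greater than the threshold and that it clears only after the value has stayed at or below the threshold for the full "time before clearance". The function name `evaluate` and the sample format are hypothetical.

```python
THRESHOLD_SECONDS = 600     # default threshold from the alert rule
CLEARANCE_SECONDS = 5 * 60  # "time before clearance" from the alert rule


def evaluate(samples, threshold=THRESHOLD_SECONDS, clearance=CLEARANCE_SECONDS):
    """Return [(timestamp, firing)] for time-ordered (timestamp, value) samples."""
    firing = False
    below_since = None  # first timestamp of the current at-or-below-threshold streak
    states = []
    for ts, value in samples:
        if value > threshold:
            firing = True
            below_since = None
        elif firing:
            if below_since is None:
                below_since = ts
            # Assumption: clear once the value has stayed low for `clearance` seconds.
            if ts - below_since >= clearance:
                firing = False
                below_since = None
        states.append((ts, firing))
    return states
```

With one sample every 60 seconds (the detection cycle), an alert raised at time 0 whose value drops at time 60 would clear at time 360 under these assumptions.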
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Based on the expression of the metric | Critical | Tenant |
Alert templates
Alert overview
Template: ${alarm_target} ${alarm_name}
Example: ob_cluster=obcluster:1001,tenant_name=tenant1,svr_ip=192.168.1.1 MemTables that have been frozen for a long time exist in the OceanBase Database tenant.
Alert details
Template: Cluster: ${ob_cluster_name}; Tenant: ${tenant_name}; Host: ${svr_ip} (zone: ${obzone}); Alert: MemTables that have been frozen for a long time exist in the OceanBase Database tenant. The longest freeze time is ${value_shown}, which has exceeded the threshold of ${alarm_threshold} seconds.
Example: Cluster: obcluster; Tenant: tenant1; Host: 192.168.1.1 (zone: zone1); Alert: MemTables that have been frozen for a long time exist in the OceanBase Database tenant. The longest freeze time is 1 hour, which has exceeded the threshold of 600 seconds.
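As an illustration of how the placeholders in the details template map to the example, the sketch below fills the template with Python's `string.Template`, which happens to use the same `${name}` syntax. The rendering engine that the alerting system actually uses is not stated in this document, and the helper name `render_details` is hypothetical.

```python
from string import Template

# The alert details template from this section, split for readability.
DETAILS_TEMPLATE = Template(
    "Cluster: ${ob_cluster_name}; Tenant: ${tenant_name}; "
    "Host: ${svr_ip} (zone: ${obzone}); "
    "Alert: MemTables that have been frozen for a long time exist in the "
    "OceanBase Database tenant. The longest freeze time is ${value_shown}, "
    "which has exceeded the threshold of ${alarm_threshold} seconds."
)


def render_details(fields):
    """Fill the template; raises KeyError if a placeholder has no value."""
    return DETAILS_TEMPLATE.substitute(fields)


message = render_details({
    "ob_cluster_name": "obcluster",
    "tenant_name": "tenant1",
    "svr_ip": "192.168.1.1",
    "obzone": "zone1",
    "value_shown": "1 hour",
    "alarm_threshold": 600,
})
```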
Impact on the system
During a minor compaction in OceanBase Database, the MemTable is first frozen and then dumped to disk to release the MemStore space that it occupies, which keeps data write performance consistently high. If MemTables remain frozen for a long time, the memory space may be exhausted, which leaves insufficient resources and affects the stability of the OBServer node.
Possible causes
A large amount of data is concurrently imported.
The write speed of the MemStore is faster than the minor compaction speed of the system. In this case, error 4030 is reported with the message "Over tenant memory limits".
Solutions
To resolve this problem, identify the cause of the slow minor compaction and increase the minor compaction speed, or reduce the write speed.
Contact the administrator of OceanBase Database to confirm whether the minor compaction speed is as expected and whether it is necessary to increase the minor compaction speed. The following parameters can be configured.
freeze_trigger_percentage: the MemStore usage threshold. The MemTable is frozen when the MemStore usage reaches this threshold.
minor_merge_concurrency: the number of universal minor compaction threads. If the value of this parameter is set to 0, 10 threads are used, as defined in the OceanBase Database kernel.
Reduce the MemStore write speed.
Notice
Reducing the MemStore write speed may affect the SQL response time on the user side. Set the query timeout duration to a proper value.
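The effect of the two parameters listed under Solutions can be sketched in Python. This is a simplified reading of their descriptions in this document, not kernel code. The function names are hypothetical, and the sketch assumes that freeze_trigger_percentage is a percentage of the tenant's MemStore limit.

```python
DEFAULT_MINOR_MERGE_THREADS = 10  # used when minor_merge_concurrency is 0


def should_freeze(memstore_used_bytes, memstore_limit_bytes, freeze_trigger_percentage):
    """True once MemStore usage reaches the freeze trigger threshold.

    Assumption: the percentage is relative to the tenant's MemStore limit.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return memstore_used_bytes * 100 >= memstore_limit_bytes * freeze_trigger_percentage


def effective_minor_merge_threads(minor_merge_concurrency):
    """Return the thread count implied by minor_merge_concurrency (0 means 10)."""
    if minor_merge_concurrency == 0:
        return DEFAULT_MINOR_MERGE_THREADS
    return minor_merge_concurrency
```

Under this reading, a lower freeze_trigger_percentage makes freezes start earlier, and a higher minor_merge_concurrency gives minor compactions more threads to drain frozen MemTables.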