Alert description
Tenant 500 of OceanBase may run into memory allocation issues: once the STORAGE_SHORT_TERM_META_CTX_ID module reaches 50% of the system memory, no more memory is allocated to it. Because SSTables vary in size, estimating this memory from the number of SSTables is inaccurate. Instead, the module's memory is monitored by querying `__all_virtual_memory_info`, and an alert is triggered when it exceeds 30% of the memory of tenant 500.
Alert principle
| Parameter | Value |
|---|---|
| Monitoring metric | tenant500_storage_short_meta_hold_percentage, tenant500_storage_short_meta_hold_gb |
| Metric source | Collected from OceanBase virtual tables: memory usage of the STORAGE_SHORT_TERM module of tenant 500, queried with `select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ sum(hold) as hold, sum(used) as used from __all_virtual_memory_info where tenant_id = 500 and svr_ip = ? and svr_port = ? and ctx_name = 'STORAGE_SHORT_TERM_META_CTX_ID'` |
| Collected metric | ob_tenant500_storage_short_meta_memory_hold_bytes, ob_tenant500_memory_hold_bytes, tenant500_storage_short_meta_hold_gb |
| Monitoring expression | |
| Collection interval | 60 seconds |
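The table does not spell out how the two monitoring metrics are derived from the collected counters. The sketch below shows one plausible derivation, and it rests on two assumptions that this page does not confirm: the percentage is the module's hold divided by tenant 500's total hold (times 100), and "GB" means 1024³ bytes. The function name is hypothetical.

```python
# Minimal sketch. ASSUMPTIONS (not stated in this document): the percentage is
# the module's hold divided by tenant 500's total hold, times 100, and "GB"
# means 1024**3 bytes.
BYTES_PER_GB = 1024 ** 3


def tenant500_short_meta_metrics(module_hold_bytes, tenant_hold_bytes):
    """Return (hold_percentage, hold_gb) for STORAGE_SHORT_TERM_META_CTX_ID.

    module_hold_bytes: value of ob_tenant500_storage_short_meta_memory_hold_bytes
    tenant_hold_bytes: value of ob_tenant500_memory_hold_bytes
    """
    if tenant_hold_bytes <= 0:
        raise ValueError("tenant 500 hold must be positive")
    hold_percentage = module_hold_bytes / tenant_hold_bytes * 100
    hold_gb = module_hold_bytes / BYTES_PER_GB
    return hold_percentage, hold_gb


# Example: the module holds 3.5 GiB out of 10 GiB held by tenant 500.
pct, gb = tenant500_short_meta_metrics(int(3.5 * BYTES_PER_GB), 10 * BYTES_PER_GB)
print(f"{pct:.1f}% ({gb:.1f} GB)")  # 35.0% (3.5 GB)
```

These example numbers reproduce the 35% and 3.5 GiB shown in the alert example later on this page, provided the assumed formula holds.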
Rule information
| Monitoring expression | Description | Default threshold | Detection cycle | Elimination cycle |
|---|---|---|---|---|
| tenant500_storage_short_meta_hold_percentage > 30 and tenant500_storage_short_meta_hold_gb > 0 | High memory usage of the STORAGE_SHORT_TERM module for tenant 500 | 30 | 20 seconds | 5 minutes |
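As a rough illustration, the rule expression in the table reduces to a single predicate over one sample. The sketch below deliberately ignores the detection and elimination cycles, whose exact semantics this page does not define. The function name and the `threshold_pct` parameter are hypothetical. The default of 30 is the documented threshold.

```python
# Sketch of the rule expression above, evaluated on one sample only.
# The detection cycle (20 s) and elimination cycle (5 min) are NOT modeled.
DEFAULT_THRESHOLD_PCT = 30.0


def rule_fires(hold_percentage, hold_gb, threshold_pct=DEFAULT_THRESHOLD_PCT):
    """Evaluate: hold_percentage > threshold and hold_gb > 0."""
    return hold_percentage > threshold_pct and hold_gb > 0


print(rule_fires(35, 3.5))  # True: matches the alert example
print(rule_fires(15, 1.5))  # False: matches the recovery example
print(rule_fires(30, 3.0))  # False: the comparison is strict
```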
Alert information
| Alert trigger method | Alert level | Scope |
|---|---|---|
| Based on the monitoring expression | Severe | Host |
Alert template
Alert overview
- Template: ${alarm_target} ${alarm_name}
- Example: alarm_template_id=0:ob_cluster=TEST-1:host=xxx.xxx.xxx.xxx OceanBase 500 Tenant STORAGE_SHORT_TERM module memory usage is high
Alert details
- Template: Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. The memory usage of the STORAGE_SHORT_TERM_META_CTX_ID module for tenant 500 is ${tenant500_storage_short_meta_hold_percentage_value_zh_cn} (${tenant500_storage_short_meta_hold_gb_value_zh_cn} GB), which exceeds ${tenant500_storage_short_meta_hold_percentage_alarm_threshold} %.
- Example: Cluster: TEST, Host: xxx.xxx.xxx.xxx, Alert: OceanBase 500 Tenant STORAGE_SHORT_TERM module memory usage is high. The memory usage of the STORAGE_SHORT_TERM_META_CTX_ID module for tenant 500 is 35% (3.5 GiB), which exceeds 30%.
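The alert templates use `${name}` placeholders. Python's `string.Template` happens to use the same syntax, which makes it a convenient way to preview a message. The sketch below fills the alert-details template, copied verbatim, with the values from the example. It is only a preview aid. OCP's own rendering logic is not shown here.

```python
from string import Template

# Alert-details template copied verbatim from this section. string.Template
# only shares the ${name} syntax; it is not how OCP renders alerts.
DETAILS = Template(
    "Cluster: ${ob_cluster_name}, Host: ${host}, Alert: ${alarm_name}. "
    "The memory usage of the STORAGE_SHORT_TERM_META_CTX_ID module for tenant 500 is "
    "${tenant500_storage_short_meta_hold_percentage_value_zh_cn} "
    "(${tenant500_storage_short_meta_hold_gb_value_zh_cn} GB), which exceeds "
    "${tenant500_storage_short_meta_hold_percentage_alarm_threshold} %."
)

message = DETAILS.substitute(
    ob_cluster_name="TEST",
    host="xxx.xxx.xxx.xxx",
    alarm_name="OceanBase 500 Tenant STORAGE_SHORT_TERM module memory usage is high",
    tenant500_storage_short_meta_hold_percentage_value_zh_cn="35%",
    tenant500_storage_short_meta_hold_gb_value_zh_cn="3.5",
    tenant500_storage_short_meta_hold_percentage_alarm_threshold="30",
)
print(message)
```

`substitute` raises `KeyError` if any placeholder is left unfilled, which helps catch a misspelled variable name. The template text places a space before the final `%`, so the rendered message ends with "30 %." rather than the "30%." shown in the example.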
Alert recovery
- Template: Alert: ${alarm_name}, OceanBase 500 Tenant STORAGE_SHORT_TERM_META_CTX_ID module memory usage: ${recover_value}, OceanBase 500 Tenant STORAGE_SHORT_TERM_META_CTX_ID module memory size: ${recover_value}
- Example: Alert: OceanBase 500 Tenant STORAGE_SHORT_TERM module memory usage is high, OceanBase 500 Tenant STORAGE_SHORT_TERM_META_CTX_ID module memory usage: 15%, OceanBase 500 Tenant STORAGE_SHORT_TERM_META_CTX_ID module memory size: 1.5 GiB
Impact on the system
TableStore is a fixed-length array. When the number of tables exceeds the high watermark (56), no more SSTables can be added to the array, which may block mini merges.
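To make the fixed-length constraint concrete, here is a hypothetical toy model. It is not OceanBase's TableStore implementation. It only captures the property described above: a preallocated array whose capacity is the high watermark of 56, which rejects additions once it is full.

```python
# Hypothetical illustration only: this is NOT OceanBase's TableStore code.
# It models one property from the text: a fixed-length array whose
# capacity is the high watermark of 56.


class FixedTableStore:
    """Fixed-length array of SSTable names that rejects additions when full."""

    HIGH_WATERMARK = 56

    def __init__(self):
        self._slots = [None] * self.HIGH_WATERMARK  # preallocated, never grows
        self._count = 0

    def __len__(self):
        return self._count

    def try_add(self, sstable_name):
        """Store sstable_name in the next free slot; return False if full.

        Per the text above, being unable to add a new SSTable is what may
        block a mini merge.
        """
        if self._count >= self.HIGH_WATERMARK:
            return False
        self._slots[self._count] = sstable_name
        self._count += 1
        return True


store = FixedTableStore()
added = sum(store.try_add(f"mini_sstable_{i}") for i in range(60))
print(added, len(store))  # 56 56
```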
Possible causes
- Too many partitions on a single node. Query the number of tables on the node:
  `select count(*) from __all_virtual_table_mgr where svr_ip = ? and table_type != 0;`
- Too many SSTables in the partition replicas. Possible causes:
  - `undo_retention` is set to a large value, which causes too many SSTables to be retained on a single node.
  - An index has not been built. If `index_status != 2`, the index table has not been built. If the index is not important, you can try to drop it. Query: `select * from __all_virtual_table where data_table_id = ? and index_status != 2;` For more information about index-related alerts, see OceanBase clusters contain tables that failed to build indexes.
  - Multiple versions must be retained while an index is being built. Query: `select * from __all_acquired_snapshot;`
  - `undo_retention` is not 0, which means that a large amount of multi-version data is retained and SSTables may not be recycled in time. Run `show variables like 'undo_retention';` to query how many seconds a tenant retains multi-version data.
  - Frequent major freezes. A major freeze triggers a minor compaction that stores the data as of `major_freeze_ts` in a minor SSTable, so each major freeze produces at least one minor SSTable. Check the `frozen_version` and `last_merged_version` values in the `__all_zone` table. A large difference between them indicates that many major versions have been generated. Each major merge triggers a forced freeze that generates a minor SSTable, so many unfinished major versions produce many minor SSTables, which may overwhelm the `table_store`.
  - High write pressure causes frequent minor compactions that overwhelm the array.
    - Query `__all_virtual_table_mgr` to check the types of the SSTables in `table_store`. Many minor SSTables whose `table_size` is not 0 point to this cause.
    - Check whether minor DAGs of internal tables are piling up: `select * from __all_virtual_dag_scheduler where svr_ip = ?;` You can also search `observer.log` for the keyword "dump_dag_status" to see the number of minor DAGs at that time.
    - If a large number of minor DAGs are generated, we recommend increasing the number of minor compaction threads:
      - [V4.x] `compaction_mid_thread_score`: the default value 0 corresponds to six background threads.
      - [V2.x/V3.x] `minor_merge_concurrency`: the default value 0 corresponds to ten background threads.
  - [V4.x] `gc_snapshot_version` cannot advance, which prevents tables from being recycled. Search the logs for the keywords "too old" and "update info commit". If `gc_snapshot` cannot advance, `multi_version_start` cannot advance either, and the SSTables cannot be recycled.
  - `minor_compact_trigger` is set to a large value, which prevents mini minor merges from being triggered in time. Query: `show parameters like "minor_compact%";` When the number of mini SSTables in the latest `freeze_info` interval exceeds `minor_compact_trigger`, a mini minor merge is automatically triggered to merge them into one, so a large value delays mini minor merges. Ways to reduce the number of mini SSTables:
    - Mini minor merge: merges multiple mini SSTables in the latest `freeze_info` interval into one. In OceanBase Database V3.1 and later, you can also perform history mini minor merges, which merge multiple mini SSTables in historical `freeze_info` intervals into one.
    - Major merge: recycles mini SSTables. Before the major merge, a `remove_old_table` operation is triggered to recycle multi-version data that is no longer needed.
  - Check whether a minor merge is in progress: `select * from __all_virtual_sys_task_status;` If a partition is undergoing a minor merge, check its start time and whether it has been running for a long time. Use the trace ID to check the execution status on the corresponding server and determine whether the merge is stuck.
  - The number of mini SSTables that triggers a minor merge (`minor_compact_trigger`) defaults to 2. If it is set to a large value, minor merges are triggered later, and when data is imported very quickly, the number of SSTables does not decrease in time. Query: `show parameters like "minor_compact_trigger";`
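Once the diagnostic queries above have returned rows (fetched with any MySQL-compatible client), a few lines of post-processing can flag the worst partitions. The sketch below is hypothetical throughout. It assumes rows arrive as dicts keyed by column name and lets the caller supply the grouping key, because the columns that identify a partition replica differ between versions. The `margin` fraction is an arbitrary illustrative value, not an OceanBase setting.

```python
from collections import Counter

# Hypothetical post-processing of rows returned by the queries above.
# Rows are assumed to be dicts keyed by column name; the caller supplies
# the key that identifies a partition replica in their version.

HIGH_WATERMARK = 56  # TableStore high watermark from "Impact on the system"


def crowded_partitions(table_mgr_rows, key, margin=0.8):
    """Return {partition: sstable_count} for partitions near the watermark.

    Mirrors the `table_type != 0` filter used above. `margin` is an
    arbitrary fraction of HIGH_WATERMARK chosen for illustration.
    """
    counts = Counter(key(row) for row in table_mgr_rows if row["table_type"] != 0)
    limit = HIGH_WATERMARK * margin
    return {part: n for part, n in counts.items() if n >= limit}


def major_version_backlog(frozen_version, last_merged_version):
    """Gap between frozen_version and last_merged_version read from __all_zone.

    Each pending major version forces a freeze that yields a minor SSTable,
    so a large gap points to many extra minor SSTables.
    """
    return frozen_version - last_merged_version


# Made-up sample rows for illustration only.
rows = (
    [{"partition": "p1", "table_type": 2}] * 50
    + [{"partition": "p2", "table_type": 2}] * 10
    + [{"partition": "p1", "table_type": 0}] * 3
)
print(crowded_partitions(rows, key=lambda r: r["partition"]))  # {'p1': 50}
print(major_version_backlog(12, 5))  # 7
```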
Procedure
For versions later than V3.2.0, query the diagnostic virtual table `__all_virtual_compaction_diagnose_info` directly.
The number of major SSTables is determined by the `max_kept_major_version_number` parameter, which defaults to 2. If neither `undo_retention` nor index retention is configured, only 2 major SSTables are retained. Query statement: `show parameters like "max_kept_major_version_number";`
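The procedure depends on the version, so a small helper can assemble the statements to run for a given cluster. The sketch below is hypothetical. The function names are illustrative, only the leading major.minor.patch numbers are parsed, and it assumes that "later than V3.2.0" also covers V3.2.0 itself. Adjust the comparison if that reading is wrong.

```python
import re

# Hypothetical helper that collects the diagnostic statements from this
# section for a given OceanBase version string. Assumptions: only the
# leading major.minor.patch numbers matter, and V3.2.0 itself already
# provides the diagnose table.


def parse_version(version):
    """Extract (major, minor, patch) from strings such as "V3.2.3_CE" or "4.2.1.0"."""
    match = re.match(r"[Vv]?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def diagnostic_queries(version):
    """Return the statements to run, starting with the diagnose table when available."""
    queries = []
    if parse_version(version) >= (3, 2, 0):
        queries.append("select * from __all_virtual_compaction_diagnose_info;")
    queries += [
        'show parameters like "max_kept_major_version_number";',
        'show parameters like "minor_compact_trigger";',
        "select * from __all_virtual_sys_task_status;",
    ]
    return queries


for statement in diagnostic_queries("V3.1.2_CE"):
    print(statement)
```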