obproxy_core_dump

2025-03-26 07:47:21  Updated

Description

This alert is triggered when a core dump of the obproxy process occurs. This alert is automatically cleared 20 minutes later.

Principle

Parameter Value
Metric tenant500_storage_short_meta_hold_percentage,tenant500_storage_short_meta_hold_gb
Source Query OceanBase Database virtual tables.
  • Statement for querying the memory usage of the STORAGE_SHORT_TERM module of the sys500 tenant: select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ sum(hold) as hold from __all_virtual_memory_info where tenant_id = 500 and svr_ip = ? and svr_port = ? and ctx_name = 'STORAGE_SHORT_TERM_META_CTX_ID'
  • Statement for querying the memory usage of the sys500 tenant: select /*+ MONITOR_AGENT READ_CONSISTENCY(WEAK) */ sum(hold) as hold from __all_virtual_memory_info where tenant_id = 500 and svr_ip = ? and svr_port = ? and mod_name <> 'OB_KVSTORE_CACHE_MB'
Collected metric
  • ob_tenant500_storage_short_meta_memory_hold_bytes
  • ob_tenant500_memory_hold_bytes
Metric expression
  • tenant500_storage_short_meta_hold_percentage: 100 * (sum(ob_tenant500_storage_short_meta_memory_hold_bytes{}) by (ob_cluster_name,ob_cluster_id,svr_ip) / sum(ob_tenant500_memory_hold_bytes{}) by (ob_cluster_name,ob_cluster_id,svr_ip))
  • tenant500_storage_short_meta_hold_gb: sum(ob_tenant500_storage_short_meta_memory_hold_bytes{}) by (ob_cluster_name,ob_cluster_id,svr_ip) / 1073741824
Collection cycle 60 seconds
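
The two metric expressions above reduce to simple arithmetic on the collected byte counters. The following is a minimal Python sketch of that arithmetic for a single server. The function name and the sample values are hypothetical, and the inputs are assumed to be already summed per (ob_cluster_name, ob_cluster_id, svr_ip).

```python
BYTES_PER_GB = 1073741824  # same divisor as in the tenant500_storage_short_meta_hold_gb expression


def short_meta_hold_metrics(short_meta_hold_bytes, tenant500_hold_bytes):
    """Return (hold_percentage, hold_gb) for one server.

    short_meta_hold_bytes: value of ob_tenant500_storage_short_meta_memory_hold_bytes
    tenant500_hold_bytes:  value of ob_tenant500_memory_hold_bytes
    """
    if tenant500_hold_bytes <= 0:
        raise ValueError("tenant500_hold_bytes must be positive")
    hold_percentage = 100 * short_meta_hold_bytes / tenant500_hold_bytes
    hold_gb = short_meta_hold_bytes / BYTES_PER_GB
    return hold_percentage, hold_gb


# Hypothetical sample: 3.5 GB held by the module out of 10 GB held by the sys500 tenant.
percentage, gb = short_meta_hold_metrics(3758096384, 10737418240)
print(percentage, gb)
```

With these sample values the result corresponds to the 35% and 3.5 GB figures used in the alert template example.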

Alert rule

Metric expression | Metric description | Default threshold | Detection cycle | Time before clearance
tenant500_storage_short_meta_hold_percentage > 30 and tenant500_storage_short_meta_hold_gb > 0 | The memory usage of the STORAGE_SHORT_TERM module of the sys500 tenant | 30 | 10 seconds | 5 minutes
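
The rule condition can be read as a conjunction of two comparisons. The sketch below, in Python, evaluates it for one sample; the function name is hypothetical, and the detection cycle and clearance timing are not modeled.

```python
DEFAULT_THRESHOLD_PERCENT = 30  # default threshold from the alert rule


def alert_condition_met(hold_percentage, hold_gb, threshold=DEFAULT_THRESHOLD_PERCENT):
    """Evaluate: tenant500_storage_short_meta_hold_percentage > threshold
    and tenant500_storage_short_meta_hold_gb > 0."""
    return hold_percentage > threshold and hold_gb > 0


print(alert_condition_met(35.0, 3.5))  # above the threshold with non-zero usage
print(alert_condition_met(30.0, 3.5))  # exactly at the threshold: the comparison is strict
```

Because the comparison is strict, a value of exactly 30 does not satisfy the condition.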

Alert information

Trigger method | Alert level | Scope
Metric expression | Critical | Host

Alert templates

  • Overview

    • Template: ${alarm_target} ${alarm_name}
    • Example: alarm_template_id=0:ob_cluster=TEST-1:host=xxx.xxx.xxx.xxx. The memory usage of the STORAGE_SHORT_TERM module of the sys500 tenant is high.
  • Details

    • Template: Cluster: ${ob_cluster_name}. Host: ${host}. Alert: ${alarm_name} The memory usage (percentage: ${tenant500_storage_short_meta_hold_percentage_value_zh_cn}, size: ${tenant500_storage_short_meta_hold_gb_value_zh_cn}) of the STORAGE_SHORT_TERM_META_CTX_ID module exceeds ${tenant500_storage_short_meta_hold_percentage_alarm_threshold} %.
    • Example: Cluster: TEST. Host: xxx.xxx.xxx.xxx. Alert: The memory usage of the STORAGE_SHORT_TERM module of the sys500 tenant is high. The memory usage (percentage: 35%, size: 3.5 GB) of the STORAGE_SHORT_TERM_META_CTX_ID module exceeds 30 %.
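
To illustrate how the ${...} placeholders in the templates are filled in, the following Python sketch renders the overview template with the standard library's string.Template, which happens to use the same ${name} syntax. This is only an illustration: the template engine that the alerting system actually uses is not described here, and the render helper and the sample values are hypothetical.

```python
from string import Template


def render(template_text, values):
    """Substitute ${name} placeholders; unknown placeholders are left as-is."""
    return Template(template_text).safe_substitute(values)


overview_template = "${alarm_target} ${alarm_name}"
sample_values = {
    # Hypothetical values, shaped after the overview example in this document.
    "alarm_target": "ob_cluster=TEST-1:host=xxx.xxx.xxx.xxx",
    "alarm_name": "The memory usage of the STORAGE_SHORT_TERM module of the sys500 tenant is high.",
}

print(render(overview_template, sample_values))
```

Using safe_substitute rather than substitute means that a missing variable shows up in the output as an unreplaced placeholder instead of raising an error, which makes an incomplete variable set easy to spot.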

Impact on the system

Table Store is a fixed-length array. When the number of tables exceeds the threshold 56, new SSTables cannot be added to the array. As a result, minor compactions cannot be performed.
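
The capacity constraint can be sketched in Python as follows. The sketch assumes that the array is full once it holds 56 tables; the exact boundary behavior in OceanBase Database is not stated here, and both function names are hypothetical.

```python
TABLE_STORE_LIMIT = 56  # threshold quoted above


def table_store_headroom(table_count, limit=TABLE_STORE_LIMIT):
    """Number of additional SSTables that fit before the limit is reached (never negative)."""
    return max(limit - table_count, 0)


def can_add_sstable(table_count, limit=TABLE_STORE_LIMIT):
    """Assumption: a new SSTable can be added only while the count is below the limit."""
    return table_count < limit


print(table_store_headroom(50), can_add_sstable(50))
print(table_store_headroom(56), can_add_sstable(56))
```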

Possible causes

  • Too many partitions exist on a single server.

    select count(*) from __all_virtual_table_mgr where svr_ip=? and table_type !=0;
    
  • Too many SSTables exist in partition replicas. Possible causes are described as follows:

    • The value of the undo_retention parameter is too large, which results in too many SSTables on a single node.

    • Indexes are not properly created. You can use the index_status !=2 condition to filter incomplete indexes and then drop unimportant indexes.

      select * from __all_virtual_table where data_table_id =? and index_status != 2
      

      For information about index-related alerts, see ob_cluster_exists_index_fail_table.

    • Multiple versions of data are retained for index creation.

      select * from __all_acquired_snapshot; 
      
    • When the value of the undo_retention parameter is not 0, a large amount of multi-version data is stored, and SSTables may not be recycled in time.

      Run the show variables like 'undo_retention' command in the specified tenant to query how long, in seconds, the multi-version data of the tenant is retained.

    • Major compactions are frequently performed.

      A major compaction triggers a minor compaction to store major_freeze_ts data, so each major compaction generates at least one mini SSTable. Check the frozen_version and last_merged_version values in the __all_zone table. A large difference between the two values indicates that major compactions are frequently triggered. Each major compaction triggers a forced freeze that generates a mini SSTable, so if many major compaction versions are still pending, a large number of mini SSTables accumulate and may exhaust the memory of Table Store.

    • Frequent data writes cause frequent minor compactions, which exceeds the array size limit.

      You can query the __all_virtual_table_mgr table to view the types of SSTables in Table Store. If minor compactions are frequently performed, mini SSTables contain a large amount of data, and table_size is not 0.

      Query the internal table to check whether a large number of MINOR DAGs exist: select * from __all_virtual_dag_scheduler where tenant_id =?

      Search the observer.log file with the dump_dag_status keyword to get the number of MINOR DAGs at that time.

      If a large number of MINOR DAGs exist, we recommend that you temporarily increase the number of MINOR threads by specifying the following parameters:

      • In OceanBase Database V4.x, adjust the value of compaction_mid_thread_score. The default value is 0, which corresponds to 6 background threads.
      • In OceanBase Database V2.x and V3.x, adjust the value of minor_merge_concurrency. The default value is 0, which corresponds to 10 background threads.
    • The value of gc_snapshot_version fails to increase.

      Search for "too old" and "update info commit" in the log. If the value of gc_snapshot_version fails to increase, so does that of multi_version_start. As a result, SSTables cannot be recycled.

    • The value of minor_compact_trigger is so large that the MINOR MERGE task cannot be executed in time.

      show parameters like "minor_compact%";
      

      minor_compact_trigger specifies the number of mini SSTables within the latest freeze_info range that triggers a MINOR MERGE task to merge multiple mini SSTables into a minor SSTable. If the value of minor_compact_trigger is too large, the MINOR MERGE task is not triggered in time.

      You can reduce the number of mini SSTables by using the following methods:

      1. Perform a MINOR MERGE task to merge multiple mini SSTables within the latest freeze_info range into a minor SSTable. In OceanBase Database versions later than V3.1, you can also perform HISTORY MINOR MERGE tasks to merge multiple mini SSTables in a historical freeze_info range into a minor SSTable.
      2. Perform a major compaction to recycle mini SSTables.

      Before you perform a major compaction, a remove_old_table operation is triggered to recycle multi-version data that is no longer needed.

    • Query for running MINOR MERGE tasks.

      select * from __all_virtual_sys_task_status;
      

      Check the start time in the trace of the MINOR MERGE task of the corresponding partition. If the task has been running for a long time, log on to the corresponding server and check whether the task is stuck.

      Check the specified number of mini SSTables that triggers a MINOR MERGE task. The default value is 2. If a large number is specified, a MINOR MERGE task cannot be triggered in time when a large amount of data is imported. As a result, the number of SSTables cannot quickly decrease.

      show parameters like "minor_compact_trigger";
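
The frozen_version and last_merged_version comparison described under "Major compactions are frequently performed" can be sketched in Python as follows. The function name is hypothetical, and the two inputs are assumed to have been read from the __all_zone table beforehand. The result is only a lower bound: per the description above, each pending major compaction version contributes at least one mini SSTable.

```python
def pending_major_versions(frozen_version, last_merged_version):
    """Number of major compaction versions that are frozen but not yet merged.

    Each pending version implies at least one mini SSTable produced by a forced freeze,
    so the return value is a lower bound on such mini SSTables.
    """
    if last_merged_version > frozen_version:
        raise ValueError("last_merged_version cannot exceed frozen_version")
    return frozen_version - last_merged_version


# Hypothetical sample values.
print(pending_major_versions(frozen_version=12, last_merged_version=9))
```

A result that stays well above zero over time suggests that merges are falling behind the freezes.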
      

Suggested solutions

  1. If you use OceanBase Database V3.2.0 or later, query the __all_virtual_compaction_diagnose_info table.

  2. The number of retained major SSTables is specified by max_kept_major_version_number. The default value is 2. This means that if you set undo_retention to 0 or no multi-version data is retained for index creation, only two major SSTables are retained by default. The query statement is as follows:

    show parameters like "max_kept_major_version_number";
    
