This topic describes the methods for troubleshooting the cause of exhausted tenant MemStore memory and common scenarios that lead to the issue.
Symptom
The MemStore of an OBServer node stores the incremental data of OceanBase Database tenants. If total_memstore_used (the parameter indicates the memory space occupied by the MemStore) exceeds memstore_limit, the data cannot be written into the MemStore. In this case, you must troubleshoot the issue.
Troubleshooting methods
To troubleshoot exhausted MemStore memory, you can check the status of freezing, minor compaction, and release of the MemStore memory.
Check the status of freezing.
Query the
__all_virtual_tenant_memstore_infotable and check the value of theactive_memstore_usedcolumn.If the value of
active_memstore_usedexceeds that ofmajor_freeze_trigger, the freezing is not executed. You need to check further by performing the following steps:Check the status of the thread that triggers a freezing task.
Search for the key log
====== tenant manager timer task ======in the observer.log file, and check whether the key log is periodically printed.If not, the thread is stuck. You need to scrutinize the thread log or the pstack script to identify the cause of the timeout.
[admin@hostname log]$ grep "tenant manager timer task" observer.logIdentify the cause of the failed freezing task.
If the thread works normally, some partitions cannot be frozen. Execute the following SQL statement.
You can use the returned results to find the non-frozen MemStores and locate the cause by searching for pkey in the log. You can also search for non-frozen log files.
obclient> SELECT * FROM __all_virtual_tenant_memstore_allocator_info WHERE svr_ip='xxx.xxx.xx.xxx' AND tenant_id=xxx AND mt_is_frozen=0 ORDER BY mt_protection_clock limit 10;
Check the status of minor compaction.
In the preceding step, if the value of
total_memstore_usedis greater than that ofmajor_freeze_trigger, and the value ofactive_memstore_usedis less than that ofmajor_freeze_trigger, some frozen MemStores are not released. In this case, you need to check whether the minor compaction is performed on frozen MemStores.Confirm the unreleased MemStores.
Execute the following SQL statement to check the pkeys of the unreleased MemStores:
obclient> SELECT * FROM __all_virtual_tenant_memstore_allocator_info WHERE svr_ip='xxx.xxx.xx.xxx' AND tenant_id=xxx and mt_is_frozen=1 ORDER BY mt_protection_clock limit 10;Query the
__all_virtual_table_mgrtable by using the obtained pkeys and further analyze the cause of unreleased MemStores.Execute the following SQL statements to locate the unreleased MemTables.
Check the MemTable that involves active transactions, whose reference count is not
0, or whose read timestamp of the standby IDC is earlier than the snapshot point.obclient> SELECT * FROM (SELECT A.svr_ip, A.table_id, A.partition_id, A.is_active, A.table_type, A.ref, A.write_ref, A.trx_count, A.base_version, A.multi_version_start, A.snapshot_version, B.min_log_service_ts, B.min_trans_service_ts, B.min_replay_engine_ts FROM (SELECT * FROM __all_virtual_table_mgr WHERE table_type=0 AND is_active =0 ) AS A join (SELECT * FROM __all_virtual_partition_info WHERE svr_ip='xxx.xxx.xx.xxx') AS B ON A.table_id = B.table_id AND A.partition_id=B.partition_idx AND A.svr_ip =B.svr_ip) AS C WHERE C.write_ref >0 OR C.trx_count >0 OR (C.snapshot_version > least(least(min_trans_service_ts, min_replay_engine_ts), min_log_service_ts));Check the log recycling progress in the partitions corresponding to the unreleased MemTables, and check whether the log replay is delayed.
obclient> SELECT * FROM __all_virtual_partition_replay_status WHERE (table_id, partition_idx) IN (SELECT table_id, partition_id FROM (SELECT A.svr_ip, A.table_id, A.partition_id, A.is_active, A.table_type, A.ref, A.write_ref, A.trx_count, A.base_version, A.multi_version_start, A.snapshot_version, B.min_log_service_ts, B.min_trans_service_ts, B.min_replay_engine_ts FROM (SELECT * FROM __all_virtual_table_mgr WHERE table_type=0 AND is_active =0 ) AS A JOIN (SELECT * FROM __all_virtual_partition_info WHERE svr_ip="xxx.xxx.xx.xx") AS B ON a.table_id = b.table_id AND A.partition_id=B.partition_idx AND A.svr_ip =B.svr_ip) AS C WHERE C.write_ref >0 or C.trx_count >0 OR (C.snapshot_version > least(least(min_trans_service_ts, min_replay_engine_ts), min_log_service_ts))) AND svr_ip='xxx.xxx.xx.xxx';
Verify the scheduling of minor compaction.
If a MemStore does not generate any SSTable, it may be because the MemStore does not meet the following scheduling conditions:
The read timestamp of the standby cluster lags behind the
snapshot_versionof the MemStore.The MemStore still involves active transactions. In this case, the value of the
trx_countfield is not 0.Notice
The value of the
trx_countfield affects only OceanBase Database V2.2.x. If you use OceanBase Database V3.x, ignore this field.
To obtain the
snapshot_versionof the MemStore, execute the following statement to query the__all_virtual_table_mgrtable:obclient> SELECT * FROM __all_virtual_table_mgr WHERE svr_ip='xxx.xxx.xx.xxx' AND svr_port=xxx AND table_id=xxx AND partition_id=xxx;Leaderlessness, or slow scheduling or failed execution of partition rebuilding tasks can also result in the lag of the read timestamp of the standby cluster. Active transactions can be obtained by querying the
__all_virtual_trans_stattable and further analyzed by using the log. The read timestamp of the standby cluster is the minimum of the following three in the___all_virtual_partition_infotable:min_log_service_ts,min_trans_service_ts, andmin_replay_engine_ts.obclient> SELECT * FROM __all_virtual_partition_info WHERE svr_ip='xxx.xxx.xx.xxx' AND svr_port=xxx AND table_id=xxx AND partition_idx=xxx;;Verify the process of minor compaction.
If the scheduling conditions are met, the minor compaction is not executed or the execution fails. You can verify the execution process by using key logs. You must replace the pkeys in the following script with the actual pkey fields.
add dag success.*pkey task start process.*pkey task finish process.*pkey dag finish.*pkey
Check the release of MemStores
After the minor compaction of all primary tables and index tables in the MemStores, you can search for the key log
succeed to release memtable.*pkey. If the log exists, exceptions occurred during the release of MemStores. You can query the__all_virtual_table_mgrtable and view therefcolumn to confirm whether the reference count of the unreleased MemStores leaks.obclient> SELECT table_id, partition_id, base_version, snapshot_version FROM __all_virtual_table_mgr WHERE svr_ip='xxx.xxx.xx.xxx' AND table_type=0 except SELECT table_id, partition_idx, base_version, snapshot_version FROM __all_virtual_memstore_info WHERE svr_ip='xxx.xxx.xx.xxx';
Scenarios
Large transactions of OceanBase Database V2.x
OceanBase Database V1.X terminates transactions during minor compactions. To resolve this issue, a data migration mechanism is introduced to OceanBase Database V2.x. The mechanism defines a Frozen Version during the freezing of MemStore. If the commit_version of a transaction is greater than the Frozen Version, the data to be written is migrated to a new active MemStore. Therefore, highly concurrent large write transactions in OceanBase Database V2.x tend to result in high usage or exhaustion of the MemStore memory. OceanBase Database V3.x optimizes the mechanism and resolves the issue.
Therefore, if you use OceanBase Database V2.x and the MemStore memory is exhausted, check the existence of highly concurrent large transactions. Solutions:
Limit the transaction size and the concurrency at the client.
Modify the hidden
_max_trx_sizeparameter. This parameter specifies the maximum allowable write transaction size for a single partition. The default value is 100 MB. If you use OceanBase Database V2.x, we recommend that you do not increase the value. If you must increase the value, contact OceanBase Technical Support.You can calculate the transaction size by using the following formulas:
Maximum number of rows to be inserted = _max_trx_size/(Total length of all columns to be inserted × 2) Maximum number of rows to be updated = _max_trx_size/(Total length of all columns to be updated × 4)Example: Assume that the default value of the
_max_trx_sizeparameter is 100 MB, and you run the following command to create a table:create table test (col1 varchar(512),col2 varchar(512)) ;. The maximum number of rows to be inserted is100,000,000 / ((512 + 512) × 2) = 50,000. If you run the following command to update a table:update test set col1=<new value >, col2=<new value>, the maximum number of rows to be updated is100,000,000 / ((512+512) × 4) = 25,000.
The data write is faster than the minor compaction
In specific scenarios, such as the highly concurrent import of a large amount of data, the data is written into MemStore so fast that the MemStore memory is used up before the minor compaction can be complete. This results in the return of Error 4030: Over tenant memory limits. In this case, you can speed up the minor compaction, or slow down the data write.
Speed up the minor compaction
If the slow minor compaction speed is not caused by the bottleneck of CPU, I/O, or other hardware resources, you can speed up the minor compaction by using the following parameters:
freeze_trigger_percentage: the threshold percentage of MemStore memory. The freezing is triggered when the usage of MemStore memory reaches the threshold.minor_merge_concurrency: the number of worker threads for the minor compaction.mini_merge_concurrency: the number of worker threads for processing the requests whose nested level is 0.For more information about parameters, see Reference Guide (MySQL Mode) and Reference Guide (Oracle Mode).
Notice
The modification of minor compaction parameters may cause major changes. We recommend that you verify the modification in a test environment before you modify the parameters in your production environment.
If you modify the parameters in the production environment, make minor adjustments to observe the change in the compaction speed and the side effects right away. After you increase the minor compaction speed, you can check whether the issue is mitigated.
Slow down data write to the MemStore.
OceanBase Database also provides the MemStore write throttling feature to prevent memory exhaustion. The MemStore write throttling is triggered and the memory allocation is restricted if the MemStore memory usage reaches 90%. You can change the percentage by specifying the
writing_throttling_trigger_percentageparameter. The throttling extends the response time of queries, and thus reduces the data write speed. OceanBase Database regularly checks the MemStore memory usage and disables write throttling when the usage drops to a value lower than the specified threshold. Then, the incremental data processing performance is promptly recovered.OceanBase Database allows you to control the MemStore write throttling by using the following two parameters:
writing_throttling_trigger_percentage: the threshold of MemStore memory usage. The MemStore write throttling is triggered if the MemStore memory usage exceeds the threshold.writing_throttling_maximum_duration: the period of time within which the remaining MemStore memory is allocated after the write throttling is triggered.For more information about the parameters, see Reference Guide (MySQL Mode) and Reference Guide (Oracle Mode).
Notice
After the write throttling is enabled, if the MemStore memory usage reaches the value of the
writing_throttling_trigger_percentageparameter, the execution time of DML statements is expected to extend. To ensure the complete execution of SQL statements, you may set theob_query_timoutandob_trx_timeoutparameters to larger values. However, you can notice that the response time becomes longer.In most cases, you can run the following command to enable the write throttling:
alter system set writing_throttling_trigger_percentage=90;In a cluster where the write speed is greatly different from the minor compaction speed, you can reduce the value of thewriting_throttling_trigger_percentageparameter. Proceed with caution when you modify this parameter in the production environment. If you must modify thewriting_throttling_trigger_percentageparameter in the production environment, we recommend that you perform the modification under the guidance of OceanBase Technical Support.Notice
The preceding two parameters are tenant-level parameters. If you set these parameters in the sys tenant, you need to specify the tenant to which the parameters apply. Otherwise, the parameters apply only to the sys tenant. Example: alter system set writing_throttling_trigger_percentage=90 tenant='oracle';
The MemStore write throttling feature of OceanBase Database ensures that a large number of concurrent DML statements are fully processed on OBServer nodes at a slow speed. However, the OceanBase cluster has a technical limit on processing concurrent writes. Therefore, we recommend that you also verify the concurrency of DML statements at the same time when you modify the MemStore write throttling. If possible, we recommend that you ensure your business by reducing the concurrency of DML statements.