This topic describes how to troubleshoot a slow major compaction and major compaction timeout.
Applicable versions
The solution provided in this topic is applicable to all versions of OceanBase Database.
Troubleshooting logic
To troubleshoot a major compaction exception, you need to check the major compaction parameters of the current cluster and verify the major compaction status. Then, you can analyze different types of major compaction exceptions based on the obtained information and identify the root causes.
Check major compaction parameters
OceanBase Database provides the following major compaction parameters:
enable_manual_merge: specifies whether to enable manual major compaction. Default value:FALSE. You can enable manual major compaction by setting the parameter toTRUE.zone_merge_concurrency: the number of concurrent zones supported in a major compaction. If the parameter is set to0, the system determines the best concurrency based on the deployment status. However, you can keep the default value, which is1. The default value indicates that only one zone is compacted at a time.zone_merge_order: the order of zones in a rotating major compaction. If you do not specify this parameter, the system determines the order.enable_merge_by_turn: specifies whether to enable rotating major compaction. Default value:FALSE. You can enable rotating major compaction by setting the parameter toTRUE.major_freeze_duty_time: the start time of the daily major compaction.enable_auto_leader_switch: specifies whether to enable automatic switchover to the leader. Default value:TRUE.
For more information about the preceding system parameters, see "Parameter reference" in OceanBase Database Reference Guide.
Check major compaction status
In the following example, the value of the frozen_version field is 25, which indicates that the cluster needs to be compacted to version 25. However, the cn-shanghai-e replica is compacted only to version 24.
obclient> SELECT * FROM __all_zone WHERE name = "frozen_version" or name = "last_merged_version";
+----------------------------+----------------------------+---------------+---------------------+-------+------+
| gmt_create | gmt_modified | zone | name | value | info |
+----------------------------+----------------------------+---------------+---------------------+-------+------+
| 2020-12-07 16:01:30.793286 | 2020-12-30 02:00:00.445594 | | frozen_version | 25 | |
| 2020-12-07 16:01:30.793490 | 2020-12-29 02:01:12.449651 | | last_merged_version | 24 | |
| 2020-12-07 16:01:30.794375 | 2020-12-29 02:01:11.880114 | cn-shanghai-e | last_merged_version | 24 | |
| 2020-12-07 16:01:30.795109 | 2020-12-30 02:01:15.563291 | cn-shanghai-f | last_merged_version | 25 | |
| 2020-12-07 16:01:30.795842 | 2020-12-30 02:01:22.694016 | cn-shanghai-g | last_merged_version | 25 | |
+----------------------------+----------------------------+---------------+---------------------+-------+------+
5 rows in set (0.00 sec)
Query the __all_zone table and check the compaction status of the cn-shanghai-e replica. If the status is MERGING, the replica is in the progress of a major compaction.
obclient> SELECT * FROM __all_zone WHERE name = "merge_status";
+----------------------------+----------------------------+---------------+---------------------+---------+------+
| gmt_create | gmt_modified | zone | name | value | info |
+----------------------------+----------------------------+---------------+---------------------+---------+------+
| 2020-12-07 16:01:30.793286 | 2020-12-30 02:00:00.445594 | | merge_status | MERGING | |
| 2020-12-07 16:01:30.794375 | 2020-12-29 02:01:11.880114 | cn-shanghai-e | merge_status | MERGING | |
| 2020-12-07 16:01:30.795109 | 2020-12-30 02:01:15.563291 | cn-shanghai-f | merge_status | IDLE | |
| 2020-12-07 16:01:30.795842 | 2020-12-30 02:01:22.694016 | cn-shanghai-g | merge_status | IDLE | |
+----------------------------+----------------------------+---------------+---------------------+---------+------+
5 rows in set (0.00 sec)
Types of major compaction exceptions
Major compaction exceptions are classified into the following types based on the preceding query results: no major compaction, major compaction timeout, and slow major compaction. The following content describes each type of exception in detail.
No major compaction
The issue of no major compaction is classified into the following types: RootService does not initiate a major compaction on a zone, and no major compaction is initiated on replicas.
No major compaction on a zone
Based on the preceding example, after you confirm that a zone is in the progress of a major compaction, you can query the global_broadcast_version and broadcast_version fields. If the value of the last_merged_version field equals that of the broadcast_version field and lags behind that of the global_broadcast_version field, RootService does not initiate a major compaction on the zone.
obclient> SELECT * FROM __all_zone WHERE name = "global_broadcast_version" or name = "broadcast_version";
+----------------------------+----------------------------+---------------+--------------------------+---------+------+
| gmt_create | gmt_modified | zone | name | value | info |
+----------------------------+----------------------------+---------------+--------------------------+---------+------+
| 2020-12-07 16:01:30.793286 | 2020-12-30 02:00:00.445594 | | global_broadcast_version | 25 | |
| 2020-12-07 16:01:30.794375 | 2020-12-29 02:01:11.880114 | cn-shanghai-e | broadcast_version | 24 | |
| 2020-12-07 16:01:30.795109 | 2020-12-30 02:01:15.563291 | cn-shanghai-f | broadcast_version | 25 | |
| 2020-12-07 16:01:30.795842 | 2020-12-30 02:01:22.694016 | cn-shanghai-g | broadcast_version | 25 | |
+----------------------------+----------------------------+---------------+--------------------------+---------+------+
5 rows in set (0.00 sec)
If RootService does not initiate a major compaction on a zone, perform the following steps to locate the cause:
Check whether an error occurs in the major compaction scheduling thread of RootService.
If yes, the following query returns a value:
grep "daily.*merge.*ret=-" rootservice.logCheck whether the cluster is generating replicas. If yes, the following query returns a value:
obclient> SELECT count(*) FROM __all_virtual_replica_task;
No major compaction on replicas
If the version to be compacted is 25, you can query the meta tables and use data_version! = 25 to filter corresponding replicas.
If you use OceanBase Database V1.X, query the
__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_meta_tabletables.If you use OceanBase Database V2.X and later, query the
__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_virtual_meta_tabletables.
obclient> SELECT * FROM __all_virtual_meta_table WHERE data_version != 25 LIMIT 10;
+-----------+------------------+--------------+---------------+----------+----------------------------+----------------------------+----------+---------+---------------+---------------+------+-----------------------------------------------------------------------------------------------------------+-----------+-----------+--------------+---------------+--------------+-----------------+--------------------+--------------------+-------------+---------+--------------+---------------+-----------------------+------------+--------------------+--------+-----------+--------------------+------------------+
| tenant_id | table_id | partition_id | svr_ip | svr_port | gmt_create | gmt_modified | sql_port | unit_id | partition_cnt | zone | role | member_list | row_count | data_size | data_version | data_checksum | row_checksum | column_checksum | is_original_leader | is_previous_leader | create_time | rebuild | replica_type | required_size | status | is_restore | partition_checksum | quorum | fail_list | recovery_timestamp | memstore_percent |
+-----------+------------------+--------------+---------------+----------+----------------------------+----------------------------+----------+---------+---------------+---------------+------+-----------------------------------------------------------------------------------------------------------+-----------+-----------+--------------+---------------+--------------+-----------------+--------------------+--------------------+-------------+---------+--------------+---------------+-----------------------+------------+--------------------+--------+-----------+--------------------+------------------+
| 1001 | 1100611139463766 | 0 | xxx.xxx.x.xa | xxxx | 2020-12-29 10:34:15.176561 | 2020-12-29 10:34:15.205753 | 2881 | 1001 | 0 | cn-shanghai-e | 1 | xxx.xxx.x.xx:2882:1609209255175464,xxx.xxx.x.xx:2882:1609209255175464,xxx.xxx.x.xx:2882:1609209255175464 | 0 | 0 | 24 | 0 | 0 | | 0 | 1609209255204831 | 0 | 0 | 0 | 0 | REPLICA_STATUS_NORMAL | 0 | 0 | 3 | | 0 | 100 |
+-----------+------------------+--------------+---------------+----------+----------------------------+----------------------------+----------+---------+---------------+---------------+------+-----------------------------------------------------------------------------------------------------------+-----------+-----------+--------------+---------------+--------------+-----------------+--------------------+--------------------+-------------+---------+--------------+---------------+-----------------------+------------+--------------------+--------+-----------+--------------------+------------------+
1 row in set (0.07 sec)
The preceding results show that the table whose pkey is {tid:1100611139463766, partition_id:0} is not compacted on the OBServer node xxx.xxx.x.xa:xxxx.
You can also search for logs in the rootservice.log file to locate the table that blocks the major compaction.
Major compaction timeout
Locate the partition that is not compacted to the specified version.
a. Query the database tables
If the version to be compacted is 25, query the meta tables and use
data_version! = 25to filter corresponding replicas.Note
Meta tables are a group of tables. We recommend that you query the
__all_meta_tableor__all_virtual_meta_tabletable first. If you do not find the replicas of earlier data versions, check the meta tables at the upper level. If you use OceanBase Database earlier than V2.0, meta tables include the following tables:__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_meta_table. If you use OceanBase Database V2.0 and later, meta tables include the following tables:__all_virtual_core_meta_table,__all_virtual_core_root_table,__all_root_table, and__all_virtual_meta_table.b. Query in the database backend Search for the
not mergedkeyword in the latestrootservice.logfile on the RootServer.Check whether the partition is in the progress of a major compaction task.
SELECT * FROM __all_virtual_sys_task_status;You can obtain the execution trace from the virtual table, use the trace to query on the corresponding OBServer node, and then check whether the task is stuck.
If the partition is not in the progress of a major compaction task, check whether the snapshot_version of the latest SSTable is larger than the value of the data_version field in the
__all_virtual_freeze_infotable. If not, an exception occurs to the minor compaction.SELECT * FROM __all_virtual_table_mgr WHERE table_id = xxx and partition_id = xxx; SELECT * FROM __all_virtual_freeze_info WHERE data_version = xxx;If you do not obtain any useful information, you can search recent
observer.logfiles for error logs to find clues.
Slow major compaction
If the major compaction is not terminated, you can locate the cause by using the methods for troubleshooting a major compaction timeout. If the major compaction is terminated, perform the following steps:
Query the major compaction statistics of the specified version to filter the partitions that consume the most time.
SELECT /*+ query_timeout(10000000)*/* FROM __all_virtual_partition_compaction_history WHERE merge_type = "major merge" and merge_version = "<merge_version>" ORDER BY (merge_finish_time - merge_start_time) desc limit 5;Note
The
versionfield consists of three parts. The first part ismerge_version, which cannot be 0 and must be replaced with the compacted version. The other two parts can be0.You can query the `__all_zone` table to obtain the compacted version.Query the
__all_virtual_partition_sstable_merge_infotable to view the number of reused macroblocks.The fields in Step 1, such as
occupy_size,macro_block_count,use_old_macro_block_count,rewrite_macro_old_micro_block_count, andrewrite_macro_total_micro_block_count, indicates the data amount of the partition, the total number of macroblocks, the number of reused macroblocks, and the number of rewritten microblocks in the reused macroblocks. If the ratio of the number of reused macroblocks to the total number of macroblocks is low, the major compaction may slow down. This situation may occur in the first major compaction of a new imported table, or as a result of a full major compaction triggered by a large number of random writes.Check whether the start time of the major compaction is reasonable.
SELECT * FROM __all_virtual_partition_sstable_merge_info WHERE table_id = xxx and partition_id = xxx and svr_ip = "xxx" ORDER BY merge_finish_time desc LIMIT 10;Find the partitions with the latest start time of the major compaction and search for compaction records in the
observer.logfile.grep "sstable merge finish.* table_id.* partition_id" observer.log.20211117* | vi -If error messages are found:
a. In OceanBase Database earlier than V3.x, if
-4288(constexpr int OB_MEMTABLE_CANNOT_MINOR_MERGE = -4288;)is reported, the minor compaction fails because of ongoing transactions on the MemTable. You need to troubleshoot the transactions.b. If other errors are reported, troubleshoot the major compaction module.
In a major compaction, modified macroblocks are rewritten at a slow speed, and non-modified macroblocks are reused and flushed to the disk at a high speed. However, in a full major compaction, all macroblocks are rewritten, which may result in a slow speed.
OceanBase Database V3.2 and later allow you to diagnose the cause of a slow major compaction by querying the __all_virtual_compaction_suggestion table.
Troubleshooting procedure
Log on to the OBServer node where the major compaction is in progress, and check whether a disk I/O bottleneck exists.
Run the
iostatcommand in the shell to view the major compaction status on the OBServer node. If the disk usage is 100% and the value ofawaitis high, a hardware exception is possible and you must check the hardware conditions.[root@hostname /]# iostat -x 1 -kIf the
iostatcommand returns a low value ofawaitbut the overall I/O bandwidth is not high, you can run the following command in the shell to check whether throttling is performed:[admin@hostname log]$ grep "iostat" observer.logThe obtained I/O statistics of the OBServer node contain the following two fields:
sys_io_percentandsys_iops_up_limit. Thesys_io_percentfield indicates the current usage of the system I/O, and thesys_iops_up_limitfield indicates the maximum system I/O capacity.If the value of the
sys_iops_up_limitfield is greater than the bandwidth usage indicated byiostat, the I/O margin is sufficient but no more bandwidth can be used due to the CPU bottleneck.If the value of the
sys_iops_up_limitfield is less than the bandwidth usage indicated byiostat, I/O throttling may be performed. You need to increase the value of the sys_iops_up_limit field. OceanBase Database allows you to control I/O throttling by using the following two parameters:sys_bkgd_io_low_percentageandsys_bkgd_io_high_percentage, which specify the lower and upper limits of thesys_io_percentfield. If you set the lower limit to an excessively small value, major compactions can be greatly slowed down.For more information about the preceding parameter, see "System parameters" in OceanBase Database Reference Guide.
If you confirm that a CPU bottleneck exists, modify the
merge_thread_countparameter to increase the number of major compaction threads.The
merge_thread_countparameter specifies the number of threads for the daily major compaction. The default value is0, which indicates that the number of threads is determined by the system. Value range:[0,64].For more information about this parameter, see "System parameters" in OceanBase Database Reference Guide.
If the issue persists after you perform the preceding steps, a slow major compaction may be caused by the writing of a large amount of data in the following scenarios:
A huge amount of incremental data is accumulated after repeated minor compactions. You can reduce the value of the
minor_freeze_timesparameter to trigger major compactions more frequently.The
minor_freeze_timesparameter specifies the number of minor freezes that trigger a major compaction. Default value:5. Value range:[0,65535]. You can set the parameter to0to disable minor freeze.For more information about this parameter, see "System parameters" in OceanBase Database Reference Guide.
Burdensome write amplification. As a major compaction involves the merge of the baseline data with the data of minor compactions, baseline data macroblocks that are associated with incremental data are rewritten. Therefore, extremely discrete writing of incremental data can cause burdensome write amplification. You can mitigate the write amplification by increasing the value of the
merge_thread_countparameter.
Notice
To prevent burdensome write amplification, do not specify a field that can be randomly written as the primary key. For example, you must not use UUID as the primary key in an OceanBase database.