A major compaction compacts all dynamic and static data, which is time-consuming. When the incremental data generated through minor compactions reaches the specified threshold, OceanBase Database performs a major compaction of data of the same major version.
The biggest difference between a minor compaction and a major compaction is that a major compaction compacts all partitions in the cluster with the global static data at a unified snapshot point. A major compaction is a global operation and generates a global snapshot.
Major compaction modes
Major compactions can be classified into the following types based on the data volume:
Full major compaction: All the static data is read and compacted with the dynamic data to generate the final static data. This mode takes a long time and consumes a large amount of I/O and CPU resources.
Incremental major compaction: Only macroblocks modified since the last compaction are compacted. Macroblocks with no changes are reused.
Progressive major compaction: Only a part of the full data is compacted each time, so that all the data is rewritten after several rounds of progressive compaction.
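The difference between the incremental and progressive modes can be illustrated with a minimal Python sketch. It is not OceanBase code: the `Macroblock` class, the `changes` mapping, and both functions are hypothetical names introduced here, and the round-robin slicing in `progressive_rounds` is only one possible way to spread a full rewrite over several rounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macroblock:
    """Hypothetical stand-in for a macroblock of static data."""
    block_id: int
    rows: tuple  # sorted (key, value) pairs


def incremental_major_compaction(blocks, changes):
    """Rewrite only the macroblocks touched by `changes` and reuse the rest.

    blocks  -- list of Macroblock holding the current static data
    changes -- dict mapping block_id to {key: new_value} incremental updates
    Returns (new_blocks, reused_count, rewritten_count).
    """
    new_blocks, reused, rewritten = [], 0, 0
    for block in blocks:
        delta = changes.get(block.block_id)
        if not delta:
            # No modification since the last compaction: reuse the block as-is.
            new_blocks.append(block)
            reused += 1
            continue
        merged = dict(block.rows)
        merged.update(delta)
        new_blocks.append(Macroblock(block.block_id, tuple(sorted(merged.items()))))
        rewritten += 1
    return new_blocks, reused, rewritten


def progressive_rounds(block_count, rounds):
    """Assign block indexes to `rounds` slices so each block is rewritten once."""
    return [list(range(r, block_count, rounds)) for r in range(rounds)]


blocks = [
    Macroblock(0, (("a", 1),)),
    Macroblock(1, (("b", 2),)),
    Macroblock(2, (("c", 3),)),
]
new_blocks, reused, rewritten = incremental_major_compaction(blocks, {1: {"b": 20}})
```

In this sketch a full major compaction corresponds to rewriting every block regardless of `changes`, which is why it costs the most I/O and CPU.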
Additionally, major compactions can be performed for zones in the cluster in rotation to ensure that the zones not being compacted can still properly provide services externally.
The following table describes the parameters for configuring a major compaction. You can configure them based on your business needs.
| Parameter | Description |
|---|---|
| enable_merge_by_turn | Specifies whether to enable rotating major compaction. |
| zone_merge_concurrency | The number of zones that are compacted at the same time during a rotating major compaction. The value can be: 0 (the number of zones is controlled by the system); 1 (one zone is compacted each time); 2 (two zones in a cluster are compacted each time). |
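To show how the two parameters interact, the following Python sketch groups zones into the batches that would be compacted one after another. The function `rotation_batches` is hypothetical, and treating the value 0 as "compact all zones in one batch" is an assumption made only for illustration; the policy that the system actually applies for 0 is not described here.

```python
def rotation_batches(zones, zone_merge_concurrency):
    """Group zones into batches that are compacted one batch at a time.

    Assumption: 0 ("controlled by the system") is modeled as a single batch
    containing every zone. The real system policy may differ.
    """
    if zone_merge_concurrency < 0:
        raise ValueError("zone_merge_concurrency must be >= 0")
    size = zone_merge_concurrency or max(len(zones), 1)
    return [zones[i:i + size] for i in range(0, len(zones), size)]


zones = ["zone1", "zone2", "zone3"]
one_at_a_time = rotation_batches(zones, 1)
two_at_a_time = rotation_batches(zones, 2)
```

While one batch is being compacted, the zones in the remaining batches continue to serve requests, which is the purpose of rotating major compaction.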
Major compaction status
You can query the __all_zone table for the major compaction status, which is reported for the cluster as a whole and for each zone. A major compaction can be in one of the following states:
IDLE: No major compaction is in progress. When the values of last_merged_version and frozen_version are the same, the major compaction is completed.
MERGING: The major compaction is in progress.
TIMEOUT: The major compaction has run longer than the specified threshold. After the timeout, the major compaction continues with a TIMEOUT flag.
ERROR: An error has occurred during the compaction and must be handled with priority.
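The states above can be interpreted with a small helper such as the following Python sketch. It does not query the database: the `zones` mapping and both function names are hypothetical, and the input is assumed to have already been read from the __all_zone table.

```python
def zone_compaction_done(merge_status, last_merged_version, frozen_version):
    """A zone is done when it is IDLE and has caught up with the frozen version."""
    return merge_status == "IDLE" and last_merged_version == frozen_version


def summarize(zones, frozen_version):
    """Classify zones by compaction state.

    zones -- dict mapping a zone name to (merge_status, last_merged_version)
    """
    summary = {"done": [], "pending": [], "needs_attention": []}
    for name, (status, merged) in sorted(zones.items()):
        if status in ("ERROR", "TIMEOUT"):
            # ERROR must be handled first; TIMEOUT keeps running but is flagged.
            summary["needs_attention"].append(name)
        elif zone_compaction_done(status, merged, frozen_version):
            summary["done"].append(name)
        else:
            summary["pending"].append(name)
    return summary


report = summarize(
    {"zone1": ("IDLE", 5), "zone2": ("MERGING", 4), "zone3": ("TIMEOUT", 4)},
    frozen_version=5,
)
```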
Compression algorithms
OceanBase Database does not write small amounts of data to the disk in real time. Instead, data is written to the disk in batches during major compactions. This means that data can be compressed before it is written, which improves disk space utilization. The compression ratio and CPU consumption vary with the compression algorithm and method, so choose them based on your business needs.
You can specify the default_compress_func parameter to set the compression algorithm. The default value is zstd_1.0. Supported values are none, lz4_1.0, snappy_1.0, zlib_1.0, and zstd_1.0.
Note
A higher compression ratio saves more disk space but reduces performance. For example, ZSTD consumes less disk space than LZ4 but takes more time to compress data and results in a longer response time for queries that involve I/O.
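The trade-off between compression ratio and CPU time can be observed with a short Python experiment. This sketch uses the zlib module from the Python standard library at different compression levels only as a stand-in; it does not measure the codecs inside OceanBase Database, and the sample data and the `measure` function are made up for illustration.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed_size, seconds) for one zlib compression level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # The round trip must be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Repetitive, row-like sample data (hypothetical).
sample = b"".join(b"order_id=%d,status=PAID;" % i for i in range(20000))

results = {level: measure(sample, level) for level in (1, 6, 9)}
for level, (size, seconds) in results.items():
    ratio = size / len(sample)
    print(f"level {level}: {size} bytes ({ratio:.1%} of original), {seconds * 1000:.1f} ms")
```

Running the same kind of comparison on representative business data is a reasonable way to decide whether the extra disk savings of a stronger algorithm justify the additional CPU time.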
OceanBase Database allows you to specify a compression algorithm when you create a data table.