Minor compaction
When the memory usage of a MemTable reaches a specified threshold, the data in the MemTable must be written to disk to free memory. This process is called a minor compaction. Before the minor compaction, OceanBase Database freezes the active MemTable to prevent new writes to it and creates a new MemTable to receive writes. This process is called a minor freeze.
When OceanBase Database performs a minor compaction on the frozen MemTable, it scans the rows in the MemTable and stores them in an SSTable. If a row is modified repeatedly by different transactions, the SSTable may contain multiple versions of that row.
Each partition replica can independently freeze its active MemTable and persist the data to disk once the MemTable reaches the threshold. The SSTable that stores the frozen data is compacted only with incremental data of the same major version, not with the global static data. Because the incremental data is much smaller than the global static data, minor compactions complete quickly.
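The freeze-then-persist sequence described above can be sketched as a toy model. This is a minimal illustration, not OceanBase code: the `MemTable` and `Partition` classes, the `freeze_threshold` argument, and the use of a row-version count in place of real memory usage are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MemTable:
    """Toy in-memory table: key -> list of (version, value), oldest first."""
    rows: dict = field(default_factory=dict)
    frozen: bool = False

    def write(self, key, version, value):
        if self.frozen:
            raise RuntimeError("frozen MemTable rejects new writes")
        self.rows.setdefault(key, []).append((version, value))

    def size(self):
        # Assumption: count stored row versions as a stand-in for memory usage.
        return sum(len(versions) for versions in self.rows.values())


class Partition:
    """Hypothetical partition replica with one active MemTable."""

    def __init__(self, freeze_threshold):
        self.freeze_threshold = freeze_threshold
        self.active = MemTable()
        self.minor_sstables = []

    def write(self, key, version, value):
        self.active.write(key, version, value)
        if self.active.size() >= self.freeze_threshold:
            self.minor_freeze_and_compact()

    def minor_freeze_and_compact(self):
        # Minor freeze: stop writes to the old MemTable and open a new one.
        frozen, self.active = self.active, MemTable()
        frozen.frozen = True
        # Minor compaction: scan the frozen rows in key order and persist
        # every version, so one key may appear several times in the SSTable.
        sstable = [
            (key, version, value)
            for key in sorted(frozen.rows)
            for version, value in frozen.rows[key]
        ]
        self.minor_sstables.append(sstable)


partition = Partition(freeze_threshold=3)
partition.write("k1", 1, "a")
partition.write("k1", 2, "b")  # same row modified by a second transaction
partition.write("k2", 3, "c")  # third row version reaches the threshold
print(partition.minor_sstables[0])
# [('k1', 1, 'a'), ('k1', 2, 'b'), ('k2', 3, 'c')]
```

Both versions of `k1` survive in the resulting SSTable, which matches the point that a minor compaction can persist several versions of the same row.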
Minor compaction triggering
A minor compaction can be triggered both automatically and manually.
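When the memory usage of the MemTable reaches the freeze threshold, a minor compaction is automatically triggered. The threshold is the freeze_trigger_percentage share of the tenant's MemStore limit, which is in turn set by memstore_limit_percentage. You can also run the following command to manually trigger a minor compaction.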
ALTER SYSTEM MINOR FREEZE [zone] | [server_list] | [tenant_list] | [replica]
tenant_list:
TENANT [=] (tenant_name_list)
tenant_name_list:
tenant_name [, tenant_name ...]
replica:
PARTITION_ID [=] 'partition_idx%partition_count@table_id'
server_list:
SERVER [=] ip_port_list
Examples:
Cluster-level minor compaction
obclient> ALTER SYSTEM MINOR FREEZE;
Server-level minor compaction
obclient> ALTER SYSTEM MINOR FREEZE SERVER='10.10.10.1:2882';
Tenant-level minor compaction
obclient> ALTER SYSTEM MINOR FREEZE TENANT='prod_tenant';
Replica-level minor compaction
obclient> ALTER SYSTEM MINOR FREEZE PARTITION_ID = '8%1@1099511627933';
Note that although you can manually trigger a minor freeze for a single partition, this operation may not effectively release the memory when this partition shares the same memory block with other partitions. To effectively release the memory of the MemTable of a tenant, perform a tenant-level minor freeze.
Major compaction
A major compaction compacts all dynamic and static data, so it takes longer to complete than a minor compaction. When the incremental data generated through minor compactions reaches the specified threshold, OceanBase Database performs a major compaction of data of the same major version.
The biggest difference between a minor compaction and a major compaction is that a major compaction compacts all partitions in the cluster with the global static data at a unified snapshot point. A major compaction is a global operation and generates a global snapshot.
The comparison of a minor compaction and a major compaction is shown in the following table.
| Minor compaction | Major compaction |
|---|---|
| A partition- or tenant-level operation that only persists a MemTable. | A global operation that generates a global snapshot. |
| Each tenant of each OBServer independently performs the minor freeze operation on the MemTable. The primary and secondary partitions can be inconsistent. | The primary and secondary partitions must be consistent when their MemTables are frozen. Data consistency is checked during the major compaction. |
| Involves data rows of different versions. | Involves data rows at the snapshot point. |
| A minor compaction compacts the incremental data with the minor SSTable of the same major version and generates a new minor SSTable. It involves only the incremental data, so rows that are ultimately deleted must carry a special tag. | A major compaction compacts MemTables and SSTables of the current major version with the full static data of an earlier version and generates a new set of full data. |
The following section describes different types of major compactions.
Full major compaction
Full major compaction is a compaction mode of OceanBase Database that is similar to the major compaction process of HBase and RocksDB. As the name implies, a full major compaction reads all the current baseline data and compacts it with the incremental data. It then writes the compacted data to disk as the new baseline data. In this process, all data is rewritten. A full major compaction consumes significant disk I/O and space. Therefore, unless necessary or explicitly requested by the DBA, OceanBase Database does not perform full major compactions proactively.
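The idea of rewriting the whole baseline from the old baseline plus the incremental data can be sketched as follows. This is a simplified model under stated assumptions: `full_major_compaction`, the dict-based baseline, the `(key, version, value)` increment tuples, and the use of `None` as a delete marker are hypothetical and do not reflect the actual storage format.

```python
def full_major_compaction(baseline, increments, snapshot_version):
    """Return a new baseline holding the latest visible value of each key.

    baseline:   dict key -> value (static data of the previous major version)
    increments: list of (key, version, value); value None marks a deletion
    """
    # Every baseline row is read and written again, even if it is unchanged.
    new_baseline = dict(baseline)
    # Apply increments in version order so later versions win.
    for key, version, value in sorted(increments, key=lambda row: row[1]):
        if version > snapshot_version:
            continue  # not visible at the snapshot point
        if value is None:
            new_baseline.pop(key, None)
        else:
            new_baseline[key] = value
    return new_baseline


baseline = {"a": 1, "b": 2}
increments = [("a", 5, 10), ("b", 6, None), ("c", 7, 3), ("a", 9, 99)]
print(full_major_compaction(baseline, increments, snapshot_version=8))
# {'a': 10, 'c': 3}
```

The update to `a` at version 9 is ignored because it falls after the snapshot point, while the deletion of `b` removes the row from the new baseline.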
OceanBase Database usually performs full major compactions after DDL operations such as column type modification. DDL modifications take effect in real time and do not block reads or writes or affect Paxos synchronization among multiple replicas. DDL modifications on the stored data are performed during the major compaction when all data is rewritten.
Incremental compaction
Incremental compaction is another compaction mode of OceanBase Database. In most cases, not all macroblocks need to be modified when a major compaction is required. If a macroblock has no incremental modifications, it can be reused directly instead of being rewritten. This compaction mode is called incremental compaction. Compared with a full major compaction, an incremental major compaction rewrites only the macroblocks that have been modified. This mode greatly reduces the major compaction workload. Therefore, it is the default major compaction mode of OceanBase Database.
Furthermore, in many cases, not all microblocks in a macroblock are modified. During a major compaction, if some rows of a macroblock are modified, OceanBase Database checks every microblock and directly copies data of unmodified microblocks to the new macroblock. This design eliminates a series of operations, such as row parsing, encoding rule selection, row encoding, and column checksum computation, on unmodified microblocks. The microblock-level incremental compaction further reduces the compaction time.
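The two levels of reuse described above can be sketched as a planning function. This is an illustrative model only: `plan_incremental_compaction`, the representation of a macroblock as a list of microblocks, and the action labels `reuse`, `copy`, and `rewrite` are assumptions made for this example.

```python
def plan_incremental_compaction(macroblocks, modified_keys):
    """Decide per macroblock whether to reuse it or build a new one.

    macroblocks:   list of macroblocks, each a list of microblocks,
                   each microblock a tuple of row keys
    modified_keys: set of row keys touched by incremental data
    """
    plan = []
    for macroblock in macroblocks:
        actions = [
            "rewrite" if modified_keys & set(microblock) else "copy"
            for microblock in macroblock
        ]
        if "rewrite" not in actions:
            # No row changed: the existing macroblock is reused as is.
            plan.append("reuse")
        else:
            # A new macroblock is written; untouched microblocks are copied
            # without parsing or re-encoding their rows.
            plan.append(actions)
    return plan


macroblocks = [
    [("a", "b"), ("c", "d")],
    [("e", "f"), ("g", "h")],
]
plan = plan_incremental_compaction(macroblocks, modified_keys={"g"})
print(plan)  # ['reuse', ['copy', 'rewrite']]
```

Only the microblock containing `g` is re-encoded; its neighbor in the same macroblock is copied, and the first macroblock is not touched at all.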
Progressive compaction
After some DDL operations, such as adding a column, dropping a column, or changing the compression algorithm, the stored data may need to be rewritten. OceanBase Database does not rewrite data immediately after DDL operations. Instead, it rewrites data during major compactions. An incremental major compaction does not rewrite unmodified data. Therefore, OceanBase Database uses progressive major compaction to spread the rewriting across multiple major compactions, each of which rewrites only part of the data.
You can run the following command to control the number of progressive major compaction rounds for a table:
ALTER TABLE [table_name] SET progressive_merge_num=?
Example:
obclient> ALTER TABLE mytest SET progressive_merge_num=60;
In this example, the number of progressive compaction rounds of the mytest table is set to 60. After a column is added or dropped, each of the next 60 major compactions rewrites 1/60 of the data. In other words, the data is completely rewritten after 60 rounds of major compactions.
The default value of the progressive_merge_num parameter is 0. If you do not set it to another value, OceanBase Database performs 100 rounds of progressive compactions after a DDL operation that needs to rewrite the data. If you set the parameter to 1, OceanBase Database will forcibly perform full major compactions after DDL operations.
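The arithmetic behind these settings can be made concrete with a short sketch. It assumes, for illustration only, that the rewrite work is divided evenly by macroblock count; the helper names `effective_rounds` and `blocks_per_round` are hypothetical.

```python
import math

# Rounds used when progressive_merge_num is 0, as stated in the text above.
DEFAULT_ROUNDS = 100


def effective_rounds(progressive_merge_num):
    """Number of major compactions over which the rewrite is spread."""
    if progressive_merge_num == 0:
        return DEFAULT_ROUNDS
    return progressive_merge_num


def blocks_per_round(total_macroblocks, progressive_merge_num):
    """Macroblocks rewritten per round, assuming an even split."""
    rounds = effective_rounds(progressive_merge_num)
    return math.ceil(total_macroblocks / rounds)


print(blocks_per_round(6000, 60))  # 100
print(blocks_per_round(6000, 0))   # 60
print(blocks_per_round(6000, 1))   # 6000
```

For a table of 6,000 macroblocks, a value of 60 rewrites 100 macroblocks per round, the default of 0 behaves like 100 rounds, and a value of 1 rewrites everything in a single round, which is equivalent to a full major compaction.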
Rotating compaction
Generally, OceanBase Database performs major compactions during off-peak hours, but not all business systems have off-peak hours. Each major compaction consumes significant CPU and I/O resources. If a large number of business requests arrive during a major compaction, the business is affected. To avoid this impact, OceanBase Database takes advantage of its multi-replica distributed architecture and introduces the rotating major compaction mechanism.
Generally, OceanBase Database keeps three data replicas. While one replica undergoes a major compaction, OceanBase Database routes its query traffic to the replicas that are not being compacted. This mechanism ensures that the daily major compaction does not affect business queries. After the major compaction of this replica completes, OceanBase Database routes the query traffic back to it and proceeds to compact the next replica. This mechanism is called rotating compaction.
To avoid response time (RT) fluctuation caused by a cold cache after a traffic switchover, OceanBase Database preloads the cache before it switches traffic. You can control the preloading time through parameter settings.
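The rotation order can be sketched as a simple schedule. This is a conceptual model rather than the actual scheduler: the function `rotating_compaction_schedule` and the replica names `z1`, `z2`, and `z3` are hypothetical, and cache preloading is represented only by a comment.

```python
def rotating_compaction_schedule(replicas):
    """Return one (compacting_replica, serving_replicas) pair per step."""
    schedule = []
    for target in replicas:
        # Query traffic moves to every replica that is not being compacted.
        serving = [replica for replica in replicas if replica != target]
        # Assumption: the cache of `target` is preloaded before traffic
        # is routed back to it and the next replica starts compacting.
        schedule.append((target, serving))
    return schedule


for target, serving in rotating_compaction_schedule(["z1", "z2", "z3"]):
    print(f"compacting {target}, queries served by {serving}")
```

At every step at least two of the three replicas keep serving queries, which is why the compaction stays invisible to the business traffic in this model.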
Major compaction triggering methods
Three major compaction triggering methods are supported: automatic triggering, scheduled triggering, and manual triggering.
When the number of minor freezes of a tenant in the cluster exceeds the limit, OceanBase Database automatically triggers a major compaction of the entire cluster.
You can set parameters to trigger major compactions regularly during off-peak hours every day.
obclient> ALTER SYSTEM SET major_freeze_duty_time = '02:00';
You can also manually trigger major compactions by running the following O&M command.
obclient> ALTER SYSTEM MAJOR FREEZE;