Beyond High Availability: How OceanBase Keeps Data Correct

Jackie Qu
Published on June 25, 2026. Updated on June 25, 2026.
Key Takeaways

· OceanBase combines physical verification and logical verification to detect both block-level corruption and cross-replica or base-table/index inconsistencies.

· By embedding verification into RedoLog, SSTable access, replication, background inspection, and LSM-based major compaction, OceanBase keeps data correctness checks continuous rather than occasional.

· This design is especially valuable for large-scale, multi-replica database systems where silent data corruption, replica divergence, and index inconsistency can remain hidden while services still appear available.

Introduction: The Service Is Online — But Is the Data Still Correct?

In 2021, Meta published "Silent Data Corruptions at Scale," an 18-month study across hundreds of thousands of machines. The study identified hundreds of CPUs affected by silent errors and concluded that Silent Data Corruption, or SDC, is a systemic issue across hardware generations, not an occasional failure of isolated components.

That same year, Google published "Cores That Don't Count," noting that although mercurial cores are extremely rare, they appear often enough at fleet scale to become a distinct operational problem.

Together, these reports from hyperscale infrastructure operators point to the same conclusion: silent data corruption is not merely a low-probability event. At scale, it becomes a question of when it will happen — and whether the system can detect it in time.

The danger lies in the word silent. A disk may return corrupted data without reporting an I/O error. Software defects, compaction processes, or index-maintenance anomalies may also cause logical divergence between a base table and its indexes. In these cases, the database process usually does not crash, and the service still appears online. This highlights a crucial distinction: high availability keeps the system operational; data correctness keeps query results trustworthy. They are separate engineering goals.

Traditional high-availability mechanisms, such as replica failover, are not designed to detect these issues, let alone correct them. OceanBase takes a different approach: beyond multi-replica deployment and Paxos-based high availability, it embeds systematic data verification throughout the data lifecycle.

This article walks through four parts of that design: physical integrity checks, logical consistency checks, architectural support for continuous verification, and recovery after an issue is detected.

Physical Integrity: Multi-Layer Checksum Protection

Physical integrity addresses the binary correctness of data during storage and transmission. Its goal is straightforward: even if hardware, storage media, or the data transfer path fails, the system can detect content changes in time.

The basic logic of physical verification is simple: when data is written, the system records a checksum; when the data is later read or replicated, the system recalculates the checksum and compares the result. If the two checksum values differ, the data may have changed during storage or transmission.
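
The write-then-verify cycle described above can be sketched in a few lines. This is a minimal illustration, not OceanBase's code: CRC32 from Python's standard `zlib` module stands in for whatever checksum algorithm the engine actually uses, and `write_block` and `read_block` are hypothetical names.

```python
import zlib


def write_block(payload: bytes) -> tuple[int, bytes]:
    """Compute a checksum at write time and store it next to the payload."""
    return zlib.crc32(payload), payload


def read_block(stored_checksum: int, payload: bytes) -> bytes:
    """Recompute the checksum at read time and refuse mismatching data."""
    if zlib.crc32(payload) != stored_checksum:
        raise ValueError("checksum mismatch: data changed after it was written")
    return payload


checksum, data = write_block(b"order:1001,amount:250")
intact = read_block(checksum, data)  # passes: nothing changed

# Flip one bit to imitate silent corruption on disk or in transit.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
try:
    read_block(checksum, corrupted)
    detected = False
except ValueError:
    detected = True
```

The important property is that the reader never trusts the payload on its own: the storage device reported no error for the flipped bit, and the mismatch is the only signal that something changed.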

OceanBase applies physical verification at both the RedoLog layer and the SSTable storage layer.

1. RedoLog: Checksum Protection on the Durability Path

RedoLog is a core part of the transaction durability path. When a transaction commits, OceanBase generates RedoLog records, replicates them across replicas through Paxos, and eventually persists them to disk. Each log record contains a checksum in its header. Verification is performed at several critical points:

  • Log generation: When a RedoLog record is generated, its checksum is calculated and recorded as the baseline for subsequent verification.
  • Network transmission: When logs are synchronized across replicas, the receiving replica verifies the checksum to ensure the transmitted data is correct.
  • Log replay: When RedoLog records are applied to memory or disk, the system verifies their checksums again to prevent corrupted logs from affecting data consistency.

This prevents corrupted log records from being accepted and replayed by other replicas. Checksum verification contains physical corruption at the detection point instead of allowing it to spread through log replication.
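
As a rough sketch of the three verification points, the example below stores a checksum in a record header and re-verifies it whenever a record is decoded, which happens both on receipt and before replay. The header layout (`log id` plus CRC32), the `key=value` payload format, and the function names are all assumptions made for illustration; the article does not describe OceanBase's actual log format.

```python
import struct
import zlib

# Hypothetical header: log id (unsigned 64-bit) + payload CRC32 (unsigned 32-bit).
HEADER = struct.Struct("<QI")


def encode_record(log_id: int, payload: bytes) -> bytes:
    """Log generation: compute the checksum once and store it in the header."""
    return HEADER.pack(log_id, zlib.crc32(payload)) + payload


def decode_record(raw: bytes) -> tuple[int, bytes]:
    """Receipt or replay: recompute the checksum and compare with the header."""
    log_id, expected = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != expected:
        raise ValueError(f"log {log_id}: checksum mismatch")
    return log_id, payload


def replay(raw_records, state: dict) -> dict:
    """Apply 'key=value' payloads, verifying each record before it is applied."""
    for raw in raw_records:
        _, payload = decode_record(raw)
        key, _, value = payload.partition(b"=")
        state[key.decode()] = value.decode()
    return state


records = [encode_record(1, b"a=1"), encode_record(2, b"b=2")]
state = replay(records, {})

# Damage the last payload byte of the second record, as a faulty link might.
damaged = bytearray(records[1])
damaged[-1] ^= 0xFF
try:
    replay([records[0], bytes(damaged)], {})
    rejected = False
except ValueError:
    rejected = True
```

Because the damaged record is rejected at decode time, it never reaches the state machine, which mirrors the containment behavior described above.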

2. Storage Layer: Integrity Checks at Multiple Storage Levels

OceanBase's storage engine is built on an LSM-Tree design, where incoming writes are first buffered in MemTables and later flushed to immutable SSTables. Because SSTables are read, written, merged, and replicated at different granularities, OceanBase maintains checksum information at multiple storage levels.

  • Macroblocks, fixed at 2 MB: The basic unit of write I/O. Each macroblock header records a checksum.
  • Microblocks, approximately 16 KB: The basic unit of read I/O. Each microblock header records a checksum.
  • SSTables and partitions: Higher-level data-organization boundaries that also maintain checksum information.

This gives each storage granularity its own integrity-protection mechanism.
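
One way to picture nested checksums is shown below: each microblock carries its own checksum, and the enclosing macroblock carries a checksum over all of its microblock payloads. The class names and the choice of a running CRC32 are assumptions for this sketch, not a description of OceanBase's on-disk format.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroBlock:
    payload: bytes
    checksum: int


@dataclass(frozen=True)
class MacroBlock:
    micro_blocks: tuple
    checksum: int


def macro_checksum(micro_blocks) -> int:
    """Running CRC32 over every microblock payload, in order."""
    crc = 0
    for micro in micro_blocks:
        crc = zlib.crc32(micro.payload, crc)
    return crc


def build_macro(payloads) -> MacroBlock:
    micros = tuple(MicroBlock(p, zlib.crc32(p)) for p in payloads)
    return MacroBlock(micros, macro_checksum(micros))


def verify_micro(micro: MicroBlock) -> bool:
    return zlib.crc32(micro.payload) == micro.checksum


def verify_macro(macro: MacroBlock) -> bool:
    return macro_checksum(macro.micro_blocks) == macro.checksum


macro = build_macro([b"row-1", b"row-2", b"row-3"])

# Simulate corruption of the second microblock's payload only.
bad = MicroBlock(b"row-X", macro.micro_blocks[1].checksum)
tampered = MacroBlock(
    (macro.micro_blocks[0], bad, macro.micro_blocks[2]), macro.checksum
)

micro_ok = [verify_micro(m) for m in tampered.micro_blocks]
macro_ok = verify_macro(tampered)
```

A point read that touches only the first microblock can verify just that block cheaply, while a full macroblock read or copy can verify the whole unit; the fine-grained checksums also narrow down which part is damaged.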

3. Full-Path Coverage: Verification Across the Data Flow

Physical verification does not rely on a single centralized scan. Instead, it is embedded in critical stages of the data flow, allowing corruption to be detected before it moves further through the system.

  • Data writes: When data is written into macroblocks during minor or major compaction, OceanBase verifies the written data immediately. This helps catch silent errors on the write path, where a write may appear to succeed even though the data has been silently altered.
  • Data reads: The system verifies each microblock against the checksum recorded in its header. Every microblock accessed by a user query must pass verification before it can be processed further, ensuring that corrupted data is not returned to the application.
  • Data replication: During migration, backup, and similar scenarios, the destination verifies the integrity of source data before writing it into macroblocks. This prevents corrupted data from being propagated to other nodes or backup media.
  • Data at rest: A background inspection thread periodically scans all macroblocks and verifies their checksums. This is especially important for cold data, such as historical partitions and archived tables, which may remain untouched for long periods. If verification were triggered only by foreground reads, latent corruption in such data might never be discovered.
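
The difference between read-time verification and background inspection can be sketched with a toy block store. `BlockStore`, its methods, and the block names are hypothetical; the point is only that a scrub pass visits blocks that no query has touched.

```python
import zlib


class BlockStore:
    """Toy store that keeps a CRC32 next to every block (illustrative only)."""

    def __init__(self):
        self._blocks = {}  # block_id -> (checksum, payload)

    def write(self, block_id: str, payload: bytes) -> None:
        self._blocks[block_id] = (zlib.crc32(payload), payload)

    def read(self, block_id: str) -> bytes:
        """Foreground read: verifies only the block being accessed."""
        checksum, payload = self._blocks[block_id]
        if zlib.crc32(payload) != checksum:
            raise ValueError(f"{block_id}: checksum mismatch")
        return payload

    def corrupt(self, block_id: str) -> None:
        """Test helper: flip one bit without updating the stored checksum."""
        checksum, payload = self._blocks[block_id]
        self._blocks[block_id] = (checksum, bytes([payload[0] ^ 1]) + payload[1:])

    def scrub(self) -> list:
        """Background inspection: verify every block, hot or cold."""
        return sorted(
            block_id
            for block_id, (checksum, payload) in self._blocks.items()
            if zlib.crc32(payload) != checksum
        )


store = BlockStore()
store.write("hot-today", b"recent orders")
store.write("cold-2019", b"archived orders")
store.corrupt("cold-2019")

hot = store.read("hot-today")  # foreground traffic sees no problem
suspects = store.scrub()       # the scrub pass finds the cold block
```

In this toy run, queries against the hot block succeed indefinitely, and only the scrub reports the damaged cold block.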

Logical Consistency: Cross-Replica Verification

Physical integrity is only part of the story. A piece of data may be physically intact and still be logically wrong.

1. The Limits of Physical Verification

Physical verification can answer the question: Has this copy of data been physically corrupted? But it cannot answer whether replicas or data structures are logically consistent.

For example, a microblock may pass checksum verification, which only proves that this specific copy is intact at the storage level. It does not tell us:

  • Whether the three replicas contain exactly the same data at the same logical version;
  • Whether corresponding column values in a base table and its index match.

Such issues may stem from software defects, index-maintenance anomalies, or concurrency-control problems. They are logical inconsistencies and do not necessarily leave any trace at the physical layer.

2. How Logical Verification Works

Physical verification focuses on the storage integrity of a single copy of data and can be performed independently on each replica. Logical verification is different: it must compare the data states of multiple replicas at the same logical point in time. In a system with continuous writes, replicas may be at different write positions, making a direct comparison of their current states meaningless.

OceanBase solves this by using daily major compaction as a natural verification point. All replicas generate baseline data from the same globally consistent snapshot version. Once compaction completes, OceanBase compares data checksums across replicas. If a mismatch is detected, compaction is paused immediately and an alert is triggered. OceanBase also compares checksums between base-table columns and index columns to detect logical divergence between the two.

Because logical verification reuses data that has already been fully read and written during compaction, it does not require a separate full scan.
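
The role of the shared snapshot version can be illustrated with a small model in which each replica is a list of `(commit_version, key, value)` writes. The data model and function names are assumptions for this sketch, and SHA-256 is used only because it is convenient in Python.

```python
import hashlib


def baseline_at(writes, snapshot_version: int) -> dict:
    """Latest value per key among writes committed at or before the snapshot."""
    visible = {}
    for version, key, value in sorted(writes):
        if version <= snapshot_version:
            visible[key] = value
    return visible


def baseline_checksum(writes, snapshot_version: int) -> str:
    digest = hashlib.sha256()
    for key, value in sorted(baseline_at(writes, snapshot_version).items()):
        digest.update(f"{key}={value};".encode())
    return digest.hexdigest()


leader = [(1, "k1", "a"), (2, "k2", "b"), (3, "k1", "c")]
lagging = [(1, "k1", "a"), (2, "k2", "b")]    # has not applied version 3 yet
divergent = [(1, "k1", "a"), (2, "k2", "B")]  # wrong value at version 2

same_at_snapshot = baseline_checksum(leader, 2) == baseline_checksum(lagging, 2)
same_at_latest = baseline_checksum(leader, 3) == baseline_checksum(lagging, 3)
divergence_found = baseline_checksum(leader, 2) != baseline_checksum(divergent, 2)
```

Comparing the leader and the lagging replica at their latest states reports a difference that is only replication lag. At snapshot version 2 the two agree, while the genuinely divergent replica still stands out.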

3. Base Tables and Indexes: Consistency Across Structures

In addition to cross-replica consistency, OceanBase compares column-level checksums between base tables and indexes. This verification can detect incorrect mappings between a local index and its base table within the same partition, inaccurate cross-partition mappings in a global index, and duplicate key values in a unique index.

If left undetected, these issues may cause queries to return incorrect results through an index path, or cause the same data to produce different answers depending on whether it is accessed through the base table or an index. Such errors are often harder to notice and harder to troubleshoot than an outage.
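
A simplified picture of a column-level comparison follows. The article does not say how OceanBase computes column checksums, so this sketch assumes an order-independent sum of per-value CRC32s, which is needed because an index stores the same values in a different order than the base table. The table layout and function names are hypothetical.

```python
import zlib

MASK = (1 << 64) - 1


def column_checksum(values) -> int:
    """Order-independent checksum: sum of per-value CRC32s modulo 2**64."""
    total = 0
    for value in values:
        total = (total + zlib.crc32(repr(value).encode())) & MASK
    return total


def index_matches_base(base_rows, index_entries) -> bool:
    """Compare the columns shared by the base table and the index."""
    return (
        column_checksum(r[1] for r in base_rows)
        == column_checksum(e[0] for e in index_entries)
        and column_checksum(r[0] for r in base_rows)
        == column_checksum(e[1] for e in index_entries)
    )


# Base table rows: (order_id, customer_id, amount)
base_rows = [(1, 42, 250), (2, 7, 90), (3, 42, 15)]

# Index entries on customer_id: (customer_id, order_id), sorted by customer_id
good_index = sorted((customer_id, order_id) for order_id, customer_id, _ in base_rows)
stale_index = [(7, 2), (42, 1), (43, 3)]  # order 3 points at the wrong customer

good = index_matches_base(base_rows, good_index)
stale = index_matches_base(base_rows, stale_index)
```

A per-column sum like this catches missing, extra, or altered values. It would not catch every possible mis-pairing between columns, so it should be read as an illustration of the idea rather than a complete check.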

4. Multi-Replica Architecture as a Consistency Reference

In OceanBase's verification system, multiple replicas provide more than high availability. They also serve as a reference for logical verification: when replicas are compared at the same globally consistent snapshot, majority-consistent results can help identify divergence.
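
The idea of using the majority as a reference can be sketched as a small helper that only reports which replicas disagree. The function name and the replica labels are hypothetical, and the sketch deliberately performs no repair, since a majority match narrows down the suspect but does not explain the cause.

```python
from collections import Counter


def find_divergent(checksums: dict) -> tuple:
    """Return (majority_checksum, divergent_replicas) for one snapshot.

    If no checksum is held by a strict majority, every replica is suspect
    and the majority value is reported as None.
    """
    if not checksums:
        raise ValueError("at least one replica checksum is required")
    value, votes = Counter(checksums.values()).most_common(1)[0]
    if votes <= len(checksums) // 2:
        return None, sorted(checksums)
    return value, sorted(r for r, c in checksums.items() if c != value)


one_bad = find_divergent({"r1": "abc", "r2": "abc", "r3": "xyz"})
no_majority = find_divergent({"r1": "abc", "r2": "def", "r3": "xyz"})
```

With three replicas, a single outlier is identified by name; when all three disagree, the helper refuses to pick a reference.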

When logical inconsistency is detected, OceanBase pauses compaction, triggers detailed alerts, and writes diagnostic information to system views. The specific view names depend on the target OceanBase version and should be confirmed in the corresponding documentation. Existing SSTables continue serving read requests, so business traffic can continue without immediate interruption. At the same time, the inconsistent data is prevented from being written into a new compaction result.

OceanBase does not automatically repair logical inconsistencies. Operations teams must intervene, identify the root cause, and perform controlled remediation.

Continuous Verification: Architecture Determines Cost

1. The Structural Challenge of Traditional Architectures

High-quality data verification is not free. Frequent full verification usually requires substantial data reads, CPU overhead, and resource contention with business I/O.

In a traditional B+Tree architecture, data is updated in place, leaving no natural point at which the system rewrites the full dataset. Full verification therefore has to run as a separate scan, competing with business I/O for resources. As a result, verification often becomes a low-frequency scheduled or manual task. Coverage is limited, and data errors may remain latent for long periods between verification runs.

2. The Structural Advantage of LSM-Tree

OceanBase's LSM-Tree-based storage engine has a natural advantage here. An LSM-Tree combines incremental writes with periodic compaction: new writes first enter an in-memory MemTable and are later flushed into immutable SSTables. Multiple SSTables are then compacted into newer, more compact SSTables.

This design creates natural verification points:

  • Verification is embedded in data processing: In traditional architectures, full verification requires a dedicated scan. In an LSM-based architecture, physical verification is built into macroblock read and write paths, while logical verification reuses the globally consistent snapshot generated during compaction. Verification becomes part of data processing rather than an extra operation.
  • SSTables are immutable: The new baseline generated by compaction is stable, so verification does not race with ongoing modifications.

This does not make verification cost-free. Checksum calculation still consumes CPU resources, and result comparison requires scheduling and metadata management. However, compared with independent full scans in traditional architectures, the LSM-Tree architecture embeds verification into necessary data-processing workflows. This makes high-quality verification closer to a built-in system behavior than a separately scheduled operations task.
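
The cost argument can be made concrete with a toy compaction that computes a checksum in the same pass that writes the new baseline. The representation of an SSTable as a Python dict and the function name `compact` are simplifications assumed for this sketch.

```python
import zlib


def compact(sstables):
    """Merge SSTables (oldest first) into one baseline and checksum it in one pass.

    Each SSTable is modeled as a dict of key -> value; newer tables override
    older ones. The checksum is a by-product of writing the merged rows.
    """
    merged = {}
    for table in sstables:
        merged.update(table)

    baseline = []
    crc = 0
    for key in sorted(merged):
        baseline.append((key, merged[key]))
        crc = zlib.crc32(f"{key}={merged[key]};".encode(), crc)
    return tuple(baseline), crc


older = {"k1": "a", "k2": "b"}
newer = {"k1": "c", "k3": "d"}

baseline_r1, checksum_r1 = compact([older, newer])
baseline_r2, checksum_r2 = compact([dict(older), dict(newer)])  # second replica
```

No extra read of the data is needed to obtain `checksum_r1`: the rows were already in hand because compaction had to write them. Two replicas that compact the same inputs end up with the same checksum, which is what the cross-replica comparison relies on.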

Closed-Loop Strategy: Differentiated Recovery for Different Problems

Detecting a problem is only the first step; how the system responds matters just as much. OceanBase applies different recovery strategies to physical corruption and logical inconsistency, depending on whether a trustworthy reference copy exists and whether the root cause is understood.

1. Physical Corruption: Replica-Level Rebuild

When a physical checksum fails and the other replicas pass verification, the corruption is limited to a single replica. In this case:

  • The data reference is clear: healthy replicas contain trustworthy data.
  • The repair path is clear: corrupted data can be rebuilt from healthy replicas.

OceanBase can asynchronously rebuild a corrupted replica from healthy replicas through its replica-rebuild mechanism. During the rebuild process, foreground reads and writes continue to be served by healthy replicas, so business traffic is not interrupted. The specific operational commands should be confirmed in the official operations manual for the target OceanBase version.

If multiple replicas suffer physical corruption at the same time and the damage exceeds replica redundancy, backup-based recovery becomes the final fallback. Detailed backup procedures are outside the scope of this article.

2. Logical Inconsistency: Preserving Evidence and Requiring Human Analysis

When logical verification detects checksum mismatches across replicas, or between a base table and an index, the situation is more complex. The root cause may be a software defect, a configuration issue, or another unknown factor.

OceanBase's strategy emphasizes preserving evidence and avoiding premature or incorrect repair:

  • Automatic pause: Compaction for the affected partitions is automatically paused, preventing incorrect data from being written into a new baseline version.
  • Detailed records: Error information is written to system views, including key diagnostic fields such as tablet ID, base-table and index checksums, and partition information.
  • Serving continuity: Existing SSTables continue serving read requests, helping avoid immediate service interruption.
  • Human intervention: Operations and R&D teams work together to analyze the root cause and determine whether it is a software defect, configuration issue, or another abnormal condition.

The core principle is this: before the root cause is clear, preserving evidence is more important than rushing into repair. Premature repair may hide the underlying software defect, or even turn a traceable logical error into a silent issue that can no longer be reproduced.

3. The Engineering Philosophy Behind Tiered Handling

The following table summarizes OceanBase's differentiated handling strategy:

| Problem Type | Handling Strategy | Reason |
| --- | --- | --- |
| Hardware failure/physical corruption | Recover through standardized procedures | The problem boundary is clear, and healthy replicas provide a reliable reference. |
| Software issue/logical inconsistency | Preserve evidence and wait for analysis | Automated repair before root-cause analysis may break the audit trail and obscure the root cause. |
| Simultaneous corruption across multiple replicas | Recover from backup | The damage exceeds replica-level redundancy. |

Automation fits deterministic scenarios, while uncertain scenarios call for human judgment. This is a conscious engineering trade-off between doing more and doing the right thing.
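
The tiered policy can be summarized as a small decision function. This is a sketch of the reasoning rather than of any OceanBase interface: the names are hypothetical, and the rule that "damage exceeds replica redundancy" means "no healthy replica remains" is an assumption made here for simplicity.

```python
from enum import Enum


class Action(Enum):
    REBUILD_FROM_REPLICA = "rebuild the corrupted replica from a healthy one"
    RESTORE_FROM_BACKUP = "recover from backup"
    PAUSE_AND_ESCALATE = "pause compaction, keep evidence, involve engineers"


def choose_action(problem: str, corrupted_replicas: int, total_replicas: int) -> Action:
    """Map a detected problem to a handling strategy.

    problem is "physical" or "logical". Assumption: a physical problem can be
    rebuilt as long as at least one healthy replica remains.
    """
    if problem == "logical":
        # Root cause unknown: automated repair could destroy evidence.
        return Action.PAUSE_AND_ESCALATE
    if problem != "physical":
        raise ValueError(f"unknown problem type: {problem!r}")
    if corrupted_replicas < total_replicas:
        return Action.REBUILD_FROM_REPLICA
    return Action.RESTORE_FROM_BACKUP


single_replica = choose_action("physical", 1, 3)
all_replicas = choose_action("physical", 3, 3)
index_mismatch = choose_action("logical", 0, 3)
```

Only the physical branch leads to an automated path, because it is the only case where a trusted reference copy is known to exist.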

4. Example Scenarios

The following two scenarios illustrate the complete path from detection to handling and verification.

Scenario A: Silent Corruption in a Cold Data Partition

A physical verification closed loop

Consider a payment system with a historical transaction partition that is rarely queried after end-of-day processing. The partition has three replicas.

  1. Detection: At 2:00 a.m., a background inspection task scans the partition and detects a microblock checksum failure on Replica 2, while Replica 1 and Replica 3 pass verification. A checksum error alert is triggered through the OceanBase Cloud Platform (OCP).
  2. Handling: After confirming the alert, the DBA triggers replica rebuild. OceanBase asynchronously pulls correct data from Replica 1 or Replica 3 and rebuilds the corrupted tablet on Replica 2. Business reads and writes continue on the healthy replicas, with no impact on foreground traffic.
  3. Verification: After the rebuild completes, background inspection scans the tablet again and the physical checksum passes. As an additional consistency check, after the next tenant-level major compaction, logical verification confirms that column checksums across replicas are consistent. The business-side end-of-day reconciliation task can provide optional additional confirmation.

Scenario B: Index and Base-Table Checksum Mismatch After Compaction

A logical verification closed loop

Consider an order database where the orders table has a secondary index on customer_id. The tenant's daily major compaction runs during off-peak hours.

  1. Detection: During logical verification after compaction, OceanBase detects a column-checksum mismatch between the index column and the corresponding base-table column on a tablet. Compaction is automatically paused, a checksum error alert is triggered through OCP, and diagnostic information such as tablet ID, table IDs, and checksums is recorded.
  2. Handling: To preserve evidence, operations staff do not manually overwrite or delete data on either side. The R&D team analyzes the alert, version number, and compaction logs, checking recent index DDL changes, possible overlap between concurrent writes and compaction, and known defects. After the root cause is identified and fixed, controlled compaction is triggered again according to the official procedure.
  3. Verification: After recompaction completes, logical verification passes. Column checksums are consistent across replicas, the index-column checksum matches the base-table checksum, and no new records appear in the verification-error view. As optional sampling confirmation, business queries for the same customer_id return consistent result sets through both the base-table scan and the index path.

Summary: Availability Keeps the System Running; Correctness Makes the Data Trustworthy

Reliability has two dimensions: service continuity and data correctness. Both are essential. A system that stays online while silently returning incorrect data may be even more dangerous than one that occasionally goes down, because at least an outage is visible.

OceanBase unifies these two goals within the same architecture:

  • Multi-replica deployment and Paxos keep services online.
  • Physical and logical verification keep data trustworthy.
  • The LSM architecture makes full-data verification a default part of compaction, rather than a separately budgeted operations task.
  • Differentiated recovery strategies extend detection into repair and verification, forming a complete closed loop.

Returning to the question at the beginning: the service is online, but is the data still correct?

This should not be a question that humans must periodically answer by hand. It should be a promise that the system continuously fulfills as it runs.

Availability and correctness do not have to be a trade-off.
