Replica concept
A replica is a copy of data in OceanBase Database's storage engine. By replicating the data of a tenant across multiple nodes, OceanBase Database lets you scale out horizontally and provides advanced disaster recovery capabilities.
A data partition is a logical container for organizing and storing data. It is also the primary mechanism for scaling out horizontally. OceanBase Database replicates multiple copies of a data partition based on the locality attribute of the tenant to provide higher disaster recovery capabilities.
You can decompose a table or index into multiple, smaller, and more manageable parts based on some rules. Each of these parts is a data partition and is a separate object with its own name and optional storage characteristics.
Note
OceanBase Database is renowned for its multi-replica architecture, which is built on the Paxos protocol. The high availability of the multi-replica architecture is the foundation of the overall high availability of OceanBase Database. In this context, a replica means a copy of data on a different node. However, data in OceanBase Database is stored in different entities, such as data partitions, log streams, units, and tenants. Therefore, a replica can refer to different database entities depending on the context.
Replicas
Replicas improve the availability and fault tolerance of OceanBase Database. Replicas can be deployed across different geographical locations to cope with network failures or data center failures.
OceanBase Database replicates data across multiple replicas through partition replication and log synchronization to prevent data loss and ensure that the database services are lossless even if some replicas fail.
Types of replicas
The storage engine of OceanBase Database adopts a layered LSM-tree structure. Data in this structure is divided into two parts: baseline data and incremental data.
Baseline data is data that is written to the disk and persisted. Once baseline data is generated, it will not be modified. Baseline data is stored in SSTables.
Incremental data is data that is stored in memory. When you write data, it is written to the incremental data first, and then persisted to the baseline data through redo logs. Redo logs also serve as commit logs (referred to as clogs) and ensure the transactionality of data persistence.
Multiple redundant copies (for example, three in a geo-distributed architecture with three IDCs and five in a geo-distributed architecture with five IDCs) of data are distributed across multiple nodes. When a transaction is committed, redo logs on multiple nodes are synchronized through the Paxos protocol to ensure majority vote, thereby maintaining data consistency among replicas.
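The majority rule behind this can be made concrete with a little arithmetic. The following is a minimal Python sketch, not OceanBase Database code: the function names are illustrative assumptions, and only replicas that take part in voting are counted.

```python
def majority(voting_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return voting_replicas // 2 + 1


def tolerated_failures(voting_replicas: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return voting_replicas - majority(voting_replicas)


def is_committed(acks: int, voting_replicas: int) -> bool:
    """A redo log entry counts as committed once a majority has persisted it."""
    return acks >= majority(voting_replicas)


for n in (3, 5):
    print(n, majority(n), tolerated_failures(n))
```

Under these assumptions, three replicas need two acknowledgements and tolerate one failure, while five replicas need three acknowledgements and tolerate two failures.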
OceanBase Database supports full-featured replicas and read-only replicas in the current version. Full-featured replicas, also known as primary replicas, are named FULL and referred to as F. They store complete data, including redo logs, MemTables, and SSTables. Read-only replicas are named READONLY and referred to as R. They provide only read capabilities and do not participate in elections or log voting, nor can they become leaders of log streams.
Full-featured replicas have the concept of a role, which applies to data partitions. The roles are leader and follower. Leaders provide write services and strong-consistency read services. They can also provide weak-consistency read services. Followers provide only weak-consistency read services. In the case of leader failure, followers can quickly switch to leaders.
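The role rules above can be summarized as a small lookup. This is a hedged Python sketch for illustration only: the `Replica` class, the node names, and the service labels are hypothetical and do not correspond to any OceanBase Database API.

```python
from dataclasses import dataclass

# Services each role can provide, as described in the paragraph above.
SERVICES_BY_ROLE = {
    "LEADER": {"write", "strong_read", "weak_read"},
    "FOLLOWER": {"weak_read"},
}


@dataclass(frozen=True)
class Replica:
    node: str  # hypothetical node name
    role: str  # "LEADER" or "FOLLOWER"


def eligible_replicas(replicas, service):
    """Return the replicas whose role allows the requested service."""
    return [r for r in replicas if service in SERVICES_BY_ROLE[r.role]]


replicas = [
    Replica("node1", "LEADER"),
    Replica("node2", "FOLLOWER"),
    Replica("node3", "FOLLOWER"),
]
```

With this model, `eligible_replicas(replicas, "write")` returns only the leader, while `eligible_replicas(replicas, "weak_read")` returns all three replicas.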
Log stream introduction
Log stream concepts
A log stream is an entity automatically created and managed by OceanBase Database. It represents a collection of data, including multiple data partitions together with the transaction logs and transaction management structures that operate on that data. The redo log module, which is implemented based on the Paxos protocol, synchronizes logs across replicas to ensure data consistency and achieve high availability. The TxCtxMgr transaction management structure ensures that modifications on all data partitions in a log stream can be atomically committed within the log stream. For a transaction spanning multiple log streams, an optimized two-phase commit protocol is used to ensure the atomicity of the commit. As a participant in a distributed transaction, a log stream takes part in committing the transaction, or in rolling it back if the transaction fails.

The log stream is a concept introduced in OceanBase Database V4.0. In OceanBase Database V3.x, the basic unit of transaction commit is a partition, whereas OceanBase Database V4.x uses a log stream as the basic unit of transaction commit. This change brings significant benefits in terms of resources, performance, and features.
OceanBase Database V3.x uses a partition as the basic unit of transaction commit. The atomicity of modifications within a partition is guaranteed by the write-ahead log (WAL) mechanism of that partition. Each partition participates in the two-phase commit as a separate participant.
OceanBase Database V4.x uses a log stream as the basic unit of transaction commit. The atomicity of modifications within a log stream is guaranteed by the write-ahead log (WAL) mechanism of that log stream. Each log stream participates in the two-phase commit as a separate participant.
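To illustrate what it means for a log stream to be a participant, here is a minimal Python sketch of the textbook two-phase commit protocol. It is not the optimized variant that OceanBase Database uses, and the `LogStream` class, its methods, and the `can_prepare` flag are hypothetical names chosen for this example.

```python
class LogStream:
    """Toy participant that votes in phase 1 and then commits or rolls back."""

    def __init__(self, ls_id, can_prepare=True):
        self.ls_id = ls_id
        self.can_prepare = can_prepare  # simulates a failure when False
        self.state = "INIT"

    def prepare(self):
        self.state = "PREPARED" if self.can_prepare else "ABORTED"
        return self.can_prepare

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Commit on all participants, or roll back on all of them."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

In this sketch, a transaction that touches many partitions in the same log stream involves only one participant, which is one way to read the resource and performance benefits mentioned above.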
Broadcast log stream
Starting from OceanBase Database V4.2.0, the concept of a broadcast log stream is introduced. When the first replicated table is created for a tenant, a special log stream, known as a broadcast log stream, is automatically created for the tenant. Then, any new replicated table created for the tenant will be created in the broadcast log stream. Unlike a normal log stream, a broadcast log stream is automatically deployed with a replica on each OBServer node in the tenant to ensure strong consistency for reads from any OBServer node in ideal conditions.
Generally, the more replicas participate in voting, the longer it takes to reach a majority. If a tenant has multiple OBServer nodes, the replicas on all of these nodes do not need to participate in voting. Therefore, a broadcast log stream deploys read-only (R) replicas on non-voting OBServer nodes and full-featured (F) replicas on the OBServer nodes that participate in voting.
The differences between a broadcast log stream and a normal log stream in terms of replica deployment are as follows:
In a normal log stream, each zone can have only one replica, and the type of the replica must match the type specified in the locality.
In a broadcast log stream, in addition to the replica of the type specified in the locality for each zone, an R replica is deployed on each server in the zone that has unit resources of the tenant. Zones without specified replica types in the locality do not have any replicas.
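The two deployment rules can be compared with a small model. The Python sketch below is illustrative only: `place_replicas`, the zone names, and the server names are hypothetical, and it assumes that the replica specified in the locality lands on the first listed server of each zone, which the text does not state.

```python
def place_replicas(locality, unit_servers, broadcast=False):
    """Return {server: replica_type} for a single log stream.

    locality:     {zone: "F" or "R"} as specified for the tenant
    unit_servers: {zone: [servers that hold unit resources of the tenant]}
    """
    placement = {}
    for zone, replica_type in locality.items():
        servers = unit_servers.get(zone, [])
        if not servers:
            continue
        # Assumption: the locality replica goes to the first listed server.
        placement[servers[0]] = replica_type
        if broadcast:
            # Every other server with tenant units gets a read-only replica.
            for server in servers[1:]:
                placement[server] = "R"
    return placement


locality = {"z1": "F", "z2": "F", "z3": "F"}
unit_servers = {
    "z1": ["a1", "a2"],
    "z2": ["b1", "b2"],
    "z3": ["c1", "c2"],
    "z4": ["d1"],  # z4 is not in the locality, so it gets no replica
}
normal = place_replicas(locality, unit_servers)
broadcast = place_replicas(locality, unit_servers, broadcast=True)
```

Here `normal` holds one F replica per zone in the locality, while `broadcast` additionally holds an R replica on `a2`, `b2`, and `c2`. Neither placement includes `d1`.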
The limitations on broadcast log streams are as follows:
The sys tenant and all meta tenants do not have broadcast log streams and do not support creating replicated tables.
Each user tenant can have at most one broadcast log stream.
Replica attributes cannot be converted between a broadcast log stream and a normal log stream.
Broadcast log streams can only be deleted when the corresponding tenant is deleted.
Query basic information about log streams
You can query the DBA_OB_LS view for basic information about all log streams in the current tenant, such as the status and log progress. For example:
Query information about normal log streams
Both the sys tenant and user tenants can view the basic information about the log streams in the current tenant. The following example shows the execution of the query in the sys tenant. The sys tenant has only one log stream, which is log stream 1.
SELECT * FROM oceanbase.DBA_OB_LS LIMIT 10;
The result is as follows.
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN | DROP_SCN | SYNC_SCN | READABLE_SCN | FLAG      |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
|     1 | NORMAL | sa128_obv4_2;sa128_obv4_1,sa128_obv4_3 |             0 |           0 |       NULL |     NULL |     NULL |         NULL |           |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
1 row in set
Query information about broadcast log streams
Only user tenants can view the information about broadcast log streams. The sys tenant does not have broadcast log streams. The following example shows the execution of the query in a user tenant. The result shows the broadcast log stream information in the user tenant, where replicated tables are created.
SELECT * FROM oceanbase.DBA_OB_LS WHERE flag LIKE "%DUPLICATE%";
The result is as follows.
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN          | DROP_SCN | SYNC_SCN            | READABLE_SCN        | FLAG      |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
|  1003 | NORMAL | z1;z2        |             0 |           0 | 1683267390195713284 |     NULL | 1683337744205408139 | 1683337744205408139 | DUPLICATE |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
View the location and role information of log streams
Log streams contain location information that indicates the nodes on which they are distributed. You can query the MEMBER_LIST and LEARNER_LIST columns in the oceanbase.DBA_OB_LS_LOCATIONS view for the distribution of full-featured replicas and read-only replicas, respectively. Data partitions no longer have individual location information. Instead, they inherit the location information of the log stream to which they belong. Log streams can be migrated and replicated to different nodes for performance balancing and disaster recovery.
Log streams contain role information that indicates whether they are leaders or followers. You can query the ROLE column in the oceanbase.DBA_OB_LS_LOCATIONS view for the role information of log streams. Data partitions no longer have individual role information. Instead, they inherit the role of the log stream to which they belong. Log stream roles are elected based on the election protocol.
For more information about the oceanbase.DBA_OB_LS_LOCATIONS view, see DBA_OB_LS_LOCATIONS.
View the mappings between data partitions and log streams
You can query the DBA_OB_TABLE_LOCATIONS view for the mappings between data partitions and log streams in the current tenant. Each replica of each data partition is represented as a record in the view, which provides basic information about the data partition and the log stream to which it belongs.
For more information about the oceanbase.DBA_OB_TABLE_LOCATIONS view, see DBA_OB_TABLE_LOCATIONS.