Replica concept
A replica is a copy of data stored by the storage engine of OceanBase Database, and multiple replicas of the same data reside on different nodes. Here, "data" is meant at the user level, that is, the tables and indexes that users work with.
Inside OceanBase Database, data is organized into data partitions, which enable horizontal scaling and provide advanced disaster recovery capabilities. Each data partition is stored redundantly on multiple nodes according to the locality attribute of the tenant.
Partitioning divides a table or an index into multiple smaller, more manageable parts based on specific rules, and each of these parts is a data partition. Each data partition is an independent object with its own name and, optionally, its own storage characteristics.
Note
OceanBase Database is known for its multi-replica architecture built on the Paxos protocol, and the high availability of this architecture is the foundation of the database's high availability. In this context, a replica is a copy of data on a different node. Data in OceanBase Database can, however, be carried by different entities, such as data partitions, log streams, units, and tenants. The term "replica" usually means a data partition replica, but in other contexts it may refer to a replica of one of these other entities.
Replicas
Replicas improve the availability and fault tolerance of OceanBase Database. They can be placed in different geographic locations so that the database survives network or data center failures.
OceanBase Database replicates data across multiple replicas through partition replication and log synchronization to prevent data loss and ensure that the database remains available even if some replicas fail.
Types of replicas
The storage engine of OceanBase Database adopts a layered LSM-tree structure. Data in this structure is divided into two parts: baseline data and incremental data.
Baseline data is data that has been written to disk and persisted. Once generated, baseline data is never modified. It is stored in SSTables.
Incremental data is data held in memory. All writes go to incremental data first, which is stored in MemTables. To ensure the durability of transactions, modifications are also recorded in redo logs (also known as CommitLogs or clogs).
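The split between immutable baseline data and in-memory incremental data can be illustrated with a toy two-level store. This is a simplified sketch, not OceanBase's storage engine: the class `TinyLSM` and its methods are hypothetical, and it omits redo logging, multi-versioning, and compaction. It only shows that writes never touch existing SSTables and that reads let newer incremental data override baseline data.

```python
from typing import Dict, List, Optional


class TinyLSM:
    """Hypothetical toy LSM store: one mutable MemTable over immutable SSTables."""

    def __init__(self) -> None:
        self.memtable: Dict[str, str] = {}        # incremental data (in memory)
        self.sstables: List[Dict[str, str]] = []  # persisted data, oldest first

    def put(self, key: str, value: str) -> None:
        # Writes go only to the MemTable; existing SSTables are never modified.
        self.memtable[key] = value

    def get(self, key: str) -> Optional[str]:
        # Newer data wins: check the MemTable first, then SSTables newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def flush(self) -> None:
        # Freeze the MemTable into a new immutable SSTable and start a fresh one.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
```

For example, after `put("k", "v1")`, `flush()`, and `put("k", "v2")`, a read of `"k"` returns `"v2"` from the MemTable while the first SSTable still holds `"v1"` unchanged.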
This data is stored redundantly in multiple replicas (for example, three replicas in a geo-distributed cluster with two IDCs, or five replicas in a geo-distributed cluster with three IDCs). When a transaction is committed, its redo logs are synchronized among the nodes through the Paxos protocol, and the commit succeeds once a majority of the replicas have persisted them. This keeps the replicas consistent.
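The majority rule behind Paxos-based commit can be written out as simple arithmetic. The function names below are hypothetical and only model the counting. They assume that every voting replica, the leader included, counts as one acknowledgment once it has persisted the redo log.

```python
def majority(n_voters: int) -> int:
    """Smallest number of votes that forms a majority of n_voters."""
    if n_voters < 1:
        raise ValueError("need at least one voting replica")
    return n_voters // 2 + 1


def tolerated_failures(n_voters: int) -> int:
    """Voting replicas that can fail while a majority is still reachable."""
    return n_voters - majority(n_voters)


def is_committed(acks: int, n_voters: int) -> bool:
    """A redo log entry counts as committed once a majority has persisted it."""
    return acks >= majority(n_voters)
```

With three replicas, two acknowledgments commit a transaction and one replica may fail; with five replicas, three acknowledgments are needed and two replicas may fail. An even count such as four needs three votes and still tolerates only one failure, which is why odd replica counts are the usual choice.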
In the current version, OceanBase Database supports two replica types: full-featured replicas and read-only replicas. Full-featured replicas, also known as normal replicas, are named FULL (abbreviated F) and store all types of data, including redo logs, MemTables, and SSTables. Read-only replicas are named READONLY (abbreviated R). They serve reads but not writes, so in log stream replication they can act only as followers: they do not vote in leader elections and cannot be elected as log stream leaders.
Full-featured replicas have roles, which apply to data partitions: each full-featured replica is either a leader or a follower. The leader provides write services and strong-consistency read services, and can also provide weak-consistency read services. Followers provide only weak-consistency read services. If the leader fails, a follower can quickly take over as the new leader.
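The capabilities described above can be summarized as a small decision function. This is a hypothetical model, not an OceanBase API: the `Replica` class, the request names, and the string values are illustrative. It follows the rules stated above, including the assumption that a read-only replica, being a follower, serves weak-consistency reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    replica_type: str  # "FULL" (F) or "READONLY" (R)
    role: str          # "LEADER" or "FOLLOWER"

    def __post_init__(self) -> None:
        # A read-only replica can never be elected leader.
        if self.replica_type == "READONLY" and self.role == "LEADER":
            raise ValueError("a READONLY replica cannot be a leader")


def can_serve(replica: Replica, request: str) -> bool:
    """request is one of "write", "strong_read", or "weak_read"."""
    if request in ("write", "strong_read"):
        # Writes and strong-consistency reads go to the leader only.
        return replica.role == "LEADER"
    if request == "weak_read":
        # Leaders, followers, and read-only replicas all serve weak reads.
        return True
    raise ValueError(f"unknown request type: {request}")


def can_vote(replica: Replica) -> bool:
    # Only full-featured replicas take part in voting.
    return replica.replica_type == "FULL"
```

A client that needs strong consistency must therefore be routed to the leader, while weak-consistency reads can be spread across every replica of the data.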
Log stream introduction
Log stream concepts
A log stream is an entity that OceanBase Database creates and manages automatically. It is a collection of data that comprises multiple data partitions together with the transaction logs and transaction management structures for those partitions. The redo log module, built on the Paxos protocol, synchronizes logs among replicas to keep data consistent and achieve high availability. The TxCtxMgr transaction management structure ensures that modifications to all data partitions within a log stream are committed atomically within that log stream. For a transaction that spans multiple log streams, OceanBase Database uses an optimized two-phase commit protocol to guarantee atomicity, with each log stream acting as a participant in the distributed transaction.
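The commit path above can be sketched as follows. A transaction that touches one log stream commits inside it, and a transaction that spans several log streams goes through two-phase commit with one participant per log stream. This sketch uses textbook two-phase commit, not OceanBase's optimized variant. The `LogStream` class and `commit_transaction` function are hypothetical, and real participants would persist prepare and commit records through their Paxos-replicated logs.

```python
from typing import Dict, List


class LogStream:
    """Hypothetical 2PC participant standing in for one log stream."""

    def __init__(self, ls_id: int, healthy: bool = True) -> None:
        self.ls_id = ls_id
        self.healthy = healthy
        self.state: Dict[str, str] = {}  # tx_id -> PREPARED / COMMITTED / ABORTED

    def prepare(self, tx_id: str) -> bool:
        # Vote yes only if this participant can guarantee a later commit.
        if not self.healthy:
            return False
        self.state[tx_id] = "PREPARED"
        return True

    def commit(self, tx_id: str) -> None:
        self.state[tx_id] = "COMMITTED"

    def abort(self, tx_id: str) -> None:
        self.state[tx_id] = "ABORTED"


def commit_transaction(tx_id: str, participants: List[LogStream]) -> str:
    if len(participants) == 1:
        # A single log stream commits atomically on its own, with no 2PC round.
        participants[0].commit(tx_id)
        return "COMMITTED"
    # Phase 1: every participating log stream must vote yes.
    if all(ls.prepare(tx_id) for ls in participants):
        # Phase 2: apply the commit decision on every participant.
        for ls in participants:
            ls.commit(tx_id)
        return "COMMITTED"
    # Any "no" vote aborts the transaction on all participants.
    for ls in participants:
        ls.abort(tx_id)
    return "ABORTED"
```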

Log streams are a new concept introduced in OceanBase Database V4.0. Compared with OceanBase Database V3.x, where the basic unit of transaction commit is a partition, OceanBase Database V4.x uses a log stream as the basic unit of transaction commit. This change brings significant benefits in terms of resources, performance, and features.
In OceanBase Database V3.x, the partition is the basic unit of transaction commit. The write-ahead log (WAL) mechanism within each partition ensures that modifications within the partition are atomic, and each partition participates in two-phase commit as a separate participant.
In OceanBase Database V4.x, the log stream is the basic unit of transaction commit. The WAL mechanism within each log stream ensures that modifications within the log stream are atomic, and each log stream participates in two-phase commit as a single participant.
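One concrete effect of this change is the number of two-phase commit participants. The sketch below counts participants for the same transaction under both models. The partition names and the partition-to-log-stream mapping are hypothetical example data. The sketch shows only the counting and makes no claim about the actual performance of either version.

```python
from typing import Dict, Iterable


def participants_v3(touched_partitions: Iterable[str]) -> int:
    # V3.x model: every modified partition is its own commit participant.
    return len(set(touched_partitions))


def participants_v4(touched_partitions: Iterable[str],
                    partition_to_ls: Dict[str, int]) -> int:
    # V4.x model: partitions are grouped by log stream, one participant each.
    return len({partition_to_ls[p] for p in touched_partitions})
```

With a hypothetical mapping `{"p0": 1001, "p1": 1001, "p2": 1001, "p3": 1002}`, a transaction that modifies all four partitions has four participants in the V3.x model but only two in the V4.x model. If it modifies only `p0` and `p2`, a single log stream is involved and no two-phase commit is needed at all.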
Broadcast log stream
Starting from OceanBase Database V4.2.0, a special kind of log stream called the broadcast log stream is available. When the first replicated table is created in a tenant, a broadcast log stream is automatically created for that tenant, and all subsequent replicated tables of the tenant are created in it. Unlike a normal log stream, a broadcast log stream has a replica on every OBServer node of the tenant, so that, under ideal conditions, strongly consistent reads of replicated tables can be served on any OBServer node.
Generally, the more replicas that vote, the longer it takes to reach a majority. If a tenant has many OBServer nodes, not all of them need to host voting replicas. Therefore, full-featured (FULL) replicas are deployed on the OBServer nodes that participate in voting, and read-only (READONLY) replicas are deployed on the non-voting OBServer nodes.
The differences between a broadcast log stream and a normal log stream in terms of replica deployment are as follows:
In a normal log stream, each zone can have only one replica, and the type of the replica must match the type specified in the locality.
In a broadcast log stream, each zone has the replica of the type specified in the locality, and read-only replicas are additionally deployed on the other servers in the zone that have tenant unit resources. Zones not specified in the locality may have no replicas.
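The two placement rules above can be compared side by side in a toy model. Both functions are hypothetical. They take the locality as a zone-to-replica-type map and the tenant's unit servers as a zone-to-server-list map. Putting the locality replica on the first listed server is a simplifying assumption, because the document does not say which server receives it.

```python
from typing import Dict, List, Tuple

Placement = List[Tuple[str, str, str]]  # (zone, server, replica_type)


def place_normal_replicas(locality: Dict[str, str],
                          unit_servers: Dict[str, List[str]]) -> Placement:
    # Normal log stream: exactly one replica per locality zone, of the locality's type.
    return [(zone, unit_servers[zone][0], rtype)
            for zone, rtype in locality.items() if unit_servers.get(zone)]


def place_broadcast_replicas(locality: Dict[str, str],
                             unit_servers: Dict[str, List[str]]) -> Placement:
    placement: Placement = []
    for zone, servers in unit_servers.items():
        # Zones absent from the locality get no replicas in this model.
        if zone not in locality or not servers:
            continue
        # One replica of the locality's type (assumed: on the first server) ...
        placement.append((zone, servers[0], locality[zone]))
        # ... plus a READONLY replica on every other unit server in the zone.
        for server in servers[1:]:
            placement.append((zone, server, "READONLY"))
    return placement
```

With locality `{"z1": "FULL", "z2": "FULL"}` and unit servers `{"z1": ["s1", "s2"], "z2": ["s3"], "z3": ["s4"]}`, the normal log stream gets F replicas on `s1` and `s3`. The broadcast log stream additionally gets an R replica on `s2`, and zone `z3` gets nothing because it is not in the locality.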
A broadcast log stream has the following limitations:
The sys tenant and all Meta tenants do not have broadcast log streams and do not support creating replicated tables.
Each user tenant can have at most one broadcast log stream.
A log stream cannot be converted from one type to the other, that is, between a normal log stream and a broadcast log stream.
Broadcast log streams can only be deleted when the corresponding tenant is deleted.
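The lifecycle rules above can be captured in a small hypothetical model. `Tenant`, its methods, and `HYPOTHETICAL_BROADCAST_LS_ID` are illustrative names, not OceanBase APIs, and real log stream IDs are assigned by the database. The model creates the broadcast log stream lazily on the first replicated table, reuses it for every later one, rejects sys and Meta tenants, and keeps the log stream when tables are dropped.

```python
from typing import Dict, Optional

# Hypothetical ID for illustration; real log stream IDs are assigned by OceanBase.
HYPOTHETICAL_BROADCAST_LS_ID = 1001


class Tenant:
    """Toy model of the broadcast log stream rules; not an OceanBase API."""

    def __init__(self, name: str, kind: str) -> None:
        self.name = name
        self.kind = kind                             # "sys", "meta", or "user"
        self.broadcast_ls: Optional[int] = None      # at most one per user tenant
        self.replicated_tables: Dict[str, int] = {}  # table name -> log stream ID

    def create_replicated_table(self, table: str) -> int:
        if self.kind != "user":
            raise ValueError(f"{self.kind} tenants do not support replicated tables")
        if self.broadcast_ls is None:
            # The first replicated table triggers creation of the broadcast log stream.
            self.broadcast_ls = HYPOTHETICAL_BROADCAST_LS_ID
        # Every replicated table of the tenant goes into that same log stream.
        self.replicated_tables[table] = self.broadcast_ls
        return self.broadcast_ls

    def drop_replicated_table(self, table: str) -> None:
        # The broadcast log stream outlives its tables; only deleting the
        # tenant would remove it, so broadcast_ls is left untouched here.
        del self.replicated_tables[table]
```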
Query basic information about log streams
You can query the DBA_OB_LS view for the basic information about all log streams in the current tenant, such as the status and log progress. For example:
Query information about normal log streams
Both the sys tenant and user tenants can view the basic information about the log streams in the current tenant. The following example shows the execution of the query in the sys tenant. The sys tenant has only one log stream, which is log stream 1.
SELECT * FROM oceanbase.DBA_OB_LS LIMIT 10;
The result is as follows:
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN | DROP_SCN | SYNC_SCN | READABLE_SCN | FLAG      |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
| 1     | NORMAL | sa128_obv4_2;sa128_obv4_1,sa128_obv4_3 | 0             | 0           | NULL       | NULL     | NULL     | NULL         |           |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+
1 row in set
Query information about broadcast log streams
Only user tenants can view information about broadcast log streams, because the sys tenant does not have broadcast log streams. The following example runs the query in a user tenant. The result shows the user tenant's broadcast log stream, in which the tenant's replicated tables are created.
SELECT * FROM oceanbase.DBA_OB_LS WHERE flag LIKE '%DUPLICATE%';
The result is as follows:
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN          | DROP_SCN | SYNC_SCN            | READABLE_SCN        | FLAG      |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
| 1003  | NORMAL | z1;z2        | 0             | 0           | 1683267390195713284 | NULL     | 1683337744205408139 | 1683337744205408139 | DUPLICATE |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+
View the location and role information of log streams
Log streams contain location information that indicates the nodes on which they are distributed. You can query the MEMBER_LIST and LEARNER_LIST columns of the oceanbase.DBA_OB_LS_LOCATIONS view for the distribution of full-featured replicas and read-only replicas, respectively. Data partitions no longer have their own location information. Instead, they inherit their locations from the log streams to which they belong. Log streams can be migrated or replicated to different nodes for load balancing and disaster recovery.
Each log stream replica also has a role, leader or follower, which you can query in the ROLE column of the oceanbase.DBA_OB_LS_LOCATIONS view. Data partitions no longer have their own role information. Instead, they inherit their roles from the log streams to which they belong. The leader of each log stream is determined by the election protocol.
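The inheritance of locations and roles can be modeled as a two-step lookup: from partition to log stream, then from log stream to its replicas. All names and data below are hypothetical. The member and learner split only roughly mirrors the MEMBER_LIST and LEARNER_LIST columns, whose real format contains more than server names. The migration helper shows why this design is convenient: one change to a log stream's metadata relocates every partition it contains.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LsReplica:
    server: str
    role: str          # "LEADER" or "FOLLOWER"
    replica_type: str  # "FULL" or "READONLY"


# Hypothetical metadata: only log streams record where their replicas live.
ls_replicas: Dict[int, List[LsReplica]] = {
    1001: [
        LsReplica("s1", "LEADER", "FULL"),
        LsReplica("s2", "FOLLOWER", "FULL"),
        LsReplica("s3", "FOLLOWER", "READONLY"),
    ],
}
# Partitions store nothing but the ID of the log stream they belong to.
partition_ls: Dict[str, int] = {"t1.p0": 1001, "t1.p1": 1001}


def partition_locations(partition: str) -> List[Tuple[str, str]]:
    """(server, role) pairs a partition inherits from its log stream."""
    return [(r.server, r.role) for r in ls_replicas[partition_ls[partition]]]


def member_and_learner_lists(ls_id: int) -> Tuple[List[str], List[str]]:
    """Split servers roughly the way MEMBER_LIST and LEARNER_LIST do."""
    replicas = ls_replicas[ls_id]
    members = [r.server for r in replicas if r.replica_type == "FULL"]
    learners = [r.server for r in replicas if r.replica_type == "READONLY"]
    return members, learners


def migrate_replica(ls_id: int, src: str, dst: str) -> None:
    # One metadata change relocates every partition in the log stream.
    for r in ls_replicas[ls_id]:
        if r.server == src:
            r.server = dst
```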
For more information about the oceanbase.DBA_OB_LS_LOCATIONS view, see DBA_OB_LS_LOCATIONS.
View the mappings between data partitions and log streams
You can query the DBA_OB_TABLE_LOCATIONS view for the mappings between data partitions and log streams in the current tenant. Each replica of each data partition is represented as a row in the view, which records the basic information of the data partition and the log stream to which it belongs.
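Because the view has one row per replica of each data partition, a client usually has to collapse those rows to see which partitions belong to which log stream. The helper below is a hypothetical sketch that does this for rows fetched as dictionaries by any MySQL-protocol client. It assumes the view exposes `TABLE_NAME`, `PARTITION_NAME`, and `LS_ID` columns, so check the exact column names against the DBA_OB_TABLE_LOCATIONS reference. It also checks that every replica of a partition reports the same log stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

Row = Mapping[str, object]  # one row of the view, keyed by column name (assumed)


def partitions_by_ls(rows: Iterable[Row]) -> Dict[int, Set[Tuple[str, str]]]:
    """Group (table, partition) pairs under the log stream they belong to.

    Each row describes one replica of one partition, so a partition appears
    once per replica; the set collapses those duplicates.
    """
    groups: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
    owner: Dict[Tuple[str, str], int] = {}
    for row in rows:
        # Non-partitioned tables may report NULL (None) as the partition name.
        key = (str(row["TABLE_NAME"]), str(row["PARTITION_NAME"] or ""))
        ls_id = int(row["LS_ID"])
        # Every replica of a partition should report the same log stream.
        if owner.setdefault(key, ls_id) != ls_id:
            raise ValueError(f"{key} reported in log streams {owner[key]} and {ls_id}")
        groups[ls_id].add(key)
    return dict(groups)
```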
For more information about the oceanbase.DBA_OB_TABLE_LOCATIONS view, see DBA_OB_TABLE_LOCATIONS.