Concepts of replicas
In OceanBase Database, a replica refers to a copy of the same data stored on different nodes. Here, "data" is considered from the user's perspective.
At the database level, OceanBase organizes data into partitions. Each partition is redundantly stored in multiple copies, according to the tenant’s locality settings. This approach delivers strong horizontal scalability and robust disaster recovery.
A data partition is created by dividing a table or index into smaller, more manageable segments based on specific table creation rules. Each partition is an independent object, with its own name and optional storage properties.
Note
OceanBase Database is known for its multi-replica architecture, which is based on the Paxos protocol and serves as the foundation for high availability. In this context, a "replica" is a copy of the same data residing on a different node. OceanBase Database stores data in several kinds of containers, such as data partitions, log streams, units, and tenants. In most cases, "replica" means "data partition replica", but the term can correspond to different database entities depending on the context.
What a replica does
Replicas enhance the availability and fault tolerance of OceanBase Database. Replicas can be located in different geographic regions to address network or data center failures.
OceanBase Database replicates data to multiple replicas through methods such as partition replication and log synchronization. This prevents data loss and allows the database to keep providing lossless services even when a minority of replicas fail.
Types of replicas
OceanBase Database adopts a layered LSM-Tree structure as its storage engine, which splits data into two components: baseline data and incremental data.
Baseline data, persisted to disk and never modified after generation, is referred to as an SSTable.
The incremental data is stored in memory and written to the MemTable. To ensure transactional integrity, RedoLog is used (also known as CommitLog, or CLog).
For redundancy, this data is stored on multiple nodes. For example, a three-center, same-city deployment has three replicas, and a five-center deployment has five replicas. Paxos is used to synchronize the RedoLog across these nodes. A transaction is committed only after its RedoLog has been synchronized to a majority of the nodes, which maintains data consistency among replicas.
The current OceanBase Database version supports the following types of replicas:
Full-featured replicas
A full-featured replica, also known as a regular replica, is named FULL and abbreviated as F. It contains the complete data and all features, including RedoLog, MemTable, and SSTable.
The full-featured replicas of a data partition take one of two roles: Leader or Follower. The Leader provides write services and strongly consistent read services, and can also provide weakly consistent read services. A Follower provides weakly consistent read services. When the Leader fails, a Follower can quickly become the Leader and continue serving users.
The full-featured replica is a mandatory replica type: a tenant must have at least one full-featured replica. For more information, see All-round replicas.
Read-only replicas
The name of a read-only replica is READONLY, abbreviated as R. Unlike full-featured replicas, read-only replicas provide only read capabilities and do not support write operations. They can only act as Follower replicas for log streams and do not participate in elections or voting for log leadership. Read-only replicas cannot become the Leader replica for a log stream.
Read-only replicas are optional. You can deploy them based on actual business needs. For more information, see Read-only replicas.
Columnstore replicas
The name of a columnstore replica is COLUMNSTORE, abbreviated as C. In a columnstore replica of a log stream, all baseline data of user tables is stored in columnar format. User tables here include replicated tables, but exclude index tables, internal tables, and system tables. For example, if a user creates a rowstore table, it is stored in row format on the F replica but in columnar format on the machine where the C replica resides. Like the read-only replica (R), the columnstore replica does not participate in leader election or log voting, and it contains complete SSTables, clog, and MemTable.
Columnstore replicas are optional replicas, typically used in analytical processing (AP) scenarios. They can be deployed based on actual business needs. For more information about columnstore replicas, see Columnstore replicas.
For detailed deployment of columnstore replicas in AP scenarios, see Overview of OceanBase AP deployment.
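The replica types a tenant actually uses in each zone come from its locality. As a minimal sketch, the query below reads that setting from a view. It assumes that the oceanbase.DBA_OB_TENANTS view exposes the TENANT_NAME, TENANT_TYPE, PRIMARY_ZONE, and LOCALITY columns in your version; verify with DESC before relying on it.

```sql
-- Sketch: inspect which replica type each zone holds for a tenant.
-- Assumption: DBA_OB_TENANTS has the columns listed below in this version.
SELECT TENANT_NAME,
       TENANT_TYPE,
       PRIMARY_ZONE,
       LOCALITY        -- replica type per zone, as configured for the tenant
FROM oceanbase.DBA_OB_TENANTS;
```

A zone that does not appear in LOCALITY holds no replica for that tenant.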
Introduction to log streams
What is a log stream?
A log stream is an entity automatically created and managed by OceanBase Database. It represents a collection of data, including several data partitions, and the transaction logs and transaction management structures for these data partitions. The RedoLog module is implemented based on the Paxos protocol, ensuring data consistency across multiple replicas and achieving high availability. The TxCtxMgr is the transaction management structure. Modifications to all data partitions within a log stream can be atomically committed within the log stream. When a transaction spans multiple log streams, OceanBase Database uses its optimized two-phase commit protocol to achieve atomic commitment. Log streams are participants in distributed transactions.

The concept of a log stream was introduced in OceanBase Database V4.0. Compared with OceanBase Database V3.x, the most significant change in OceanBase Database V4.0 is the change in the basic unit of transaction commitment, which brings significant value in terms of resources, performance, and functionality.
In OceanBase Database V3.x, transactions are committed at the partition level. Modifications within a partition are guaranteed to be atomic by the WAL within the partition. Each partition participates in the two-phase commit, and the basic unit of transaction commitment is the partition.
In OceanBase Database V4.x, transactions are committed at the log stream level. Modifications within a log stream are guaranteed to be atomic by the WAL within the log stream. Each log stream participates in the two-phase commit, and the basic unit of transaction commitment is the log stream.
Broadcast log stream
OceanBase Database V4.2.0 introduces the concept of a broadcast log stream. When the first replicated table is created for a tenant, a special log stream called a broadcast log stream is automatically created. Subsequent replicated tables are created in this broadcast log stream. The key difference between a broadcast log stream and a regular log stream is that a broadcast log stream automatically deploys a replica on each OBServer node within the tenant, so that, under ideal conditions, strongly consistent reads can be served from any OBServer node.
Generally, when too many replicas participate in consensus voting, it takes longer to reach a majority decision. In a tenant with many OBServer nodes, it is impractical to have all replicas on every OBServer node participate in voting. Therefore, a broadcast log stream deploys R replicas (read-only replicas) on OBServer nodes that do not need to participate in voting and F replicas (full-featured replicas) on OBServer nodes that do need to participate in voting.
The differences between broadcast log streams and regular log streams in terms of replicas are as follows:
For regular log streams, each zone can have only one replica, and the replica type must match the one specified in the Locality.
For broadcast log streams, each zone has the replica type specified in the Locality, plus an additional read-only replica on each machine with tenant units in the zone. Zones without a specified replica type in the Locality do not have any replicas.
The usage limitations of broadcast log streams are as follows:
The sys tenant and all Meta tenants do not have broadcast log streams and do not support the creation of replicated tables.
Each user tenant can have at most one broadcast log stream.
Broadcast log streams cannot be converted to regular log streams or vice versa.
Broadcast log streams cannot be manually deleted. They are automatically deleted when the tenant is deleted.
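The behavior described in this section can be observed with a short experiment in a user tenant. The sketch below is illustrative rather than authoritative. The table name dup_t1 is hypothetical. It assumes that the DUPLICATE_SCOPE = 'cluster' table option is how a replicated table is declared in your version (MySQL mode), and that DBA_OB_LS_LOCATIONS exposes a REPLICA_TYPE column.

```sql
-- Hypothetical table; assumes DUPLICATE_SCOPE = 'cluster' declares a replicated table.
CREATE TABLE dup_t1 (c1 INT PRIMARY KEY, c2 INT) DUPLICATE_SCOPE = 'cluster';

-- Count the replicas of each type that belong to the broadcast log stream.
-- Expectation from the text above: F replicas on voting nodes,
-- plus R replicas on the remaining nodes that host tenant units.
SELECT l.LS_ID,
       l.REPLICA_TYPE,
       COUNT(*) AS replica_count
FROM oceanbase.DBA_OB_LS_LOCATIONS l
JOIN oceanbase.DBA_OB_LS s
  ON s.LS_ID = l.LS_ID
WHERE s.FLAG LIKE '%DUPLICATE%'
GROUP BY l.LS_ID, l.REPLICA_TYPE;
```

If the tenant has more OBServer nodes than the locality has voting replicas, the READONLY count should grow with the number of nodes while the FULL count stays fixed.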
View basic information about log streams
You can query the DBA_OB_LS view to obtain basic information about all log streams in the current tenant, including the status and log progress. For example:
View information about regular log streams
Both the sys tenant and user tenants can query the DBA_OB_LS view to obtain basic information about their corresponding log streams. The following example is executed in the sys tenant, showing the only log stream (log stream 1) available to the sys tenant.
obclient(root@sys)[oceanbase]> SELECT * FROM oceanbase.DBA_OB_LS limit 1;
The result is as follows.
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE                           | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN | DROP_SCN | SYNC_SCN | READABLE_SCN | FLAG      | UNIT_LIST |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+-----------+
|     1 | NORMAL | sa128_obv4_2;sa128_obv4_1,sa128_obv4_3 |             0 |           0 |       NULL |     NULL |     NULL |         NULL |           |           |
+-------+--------+----------------------------------------+---------------+-------------+------------+----------+----------+--------------+-----------+-----------+
1 row in set
View information about broadcast log streams
Only user tenants can query the DBA_OB_LS view to obtain information about broadcast log streams. The sys tenant does not have any broadcast log streams. The following example is executed in a user tenant, showing the broadcast log stream information for that tenant. Replicated tables are created in this log stream.
obclient(root@mysql001)[oceanbase]> SELECT * FROM oceanbase.DBA_OB_LS WHERE flag LIKE "%DUPLICATE%";
The result is as follows.
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
| LS_ID | STATUS | PRIMARY_ZONE | UNIT_GROUP_ID | LS_GROUP_ID | CREATE_SCN          | DROP_SCN | SYNC_SCN            | READABLE_SCN        | FLAG      | UNIT_LIST |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
|  1003 | NORMAL | z1;z2        |             0 |           0 | 1683267390195713284 |     NULL | 1683337744205408139 | 1683337744205408139 | DUPLICATE |           |
+-------+--------+--------------+---------------+-------------+---------------------+----------+---------------------+---------------------+-----------+-----------+
1 row in set
View the location and role of a log stream
A log stream has location information that records the nodes where it is distributed. You can query the MEMBER_LIST and LEARNER_LIST fields of the oceanbase.DBA_OB_LS_LOCATIONS view to obtain the distribution of full-featured replicas and read-only replicas, respectively. Data partitions no longer have independent location information; instead, their location is determined by the log stream they belong to. OceanBase Database supports migrating and replicating log streams between nodes to achieve performance balancing and disaster recovery.
A log stream has role information that records whether it is a leader or a follower. You can query the ROLE field of the oceanbase.DBA_OB_LS_LOCATIONS view to obtain the role of a log stream. Data partitions no longer have independent role information; instead, their role is determined by the log stream they belong to. The role of a log stream is determined through an election process.
For more information about the oceanbase.DBA_OB_LS_LOCATIONS view, see DBA_OB_LS_LOCATIONS.
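As a minimal sketch of the lookups described above, the following query lists where each log stream replica lives and which role it holds. MEMBER_LIST, LEARNER_LIST, and ROLE are the fields named in this topic. The other column names (LS_ID, ZONE, SVR_IP, SVR_PORT, REPLICA_TYPE) are assumptions about the view's layout and should be checked against the view reference for your version.

```sql
-- One row per log stream replica in the current tenant.
-- MEMBER_LIST: full-featured (voting) replicas; LEARNER_LIST: read-only replicas.
SELECT LS_ID,
       ZONE,
       SVR_IP,
       SVR_PORT,
       ROLE,           -- LEADER or FOLLOWER, decided by election
       REPLICA_TYPE,
       MEMBER_LIST,
       LEARNER_LIST
FROM oceanbase.DBA_OB_LS_LOCATIONS
ORDER BY LS_ID, ZONE;
```

Filtering with WHERE ROLE = 'LEADER' narrows the output to the node that currently serves writes for each log stream.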
View the mapping between data partitions and log streams
You can query the DBA_OB_TABLE_LOCATIONS view to obtain the mapping between data partitions and log streams in the current tenant. Each record in the view corresponds to a replica of a data partition and contains the basic information of the data partition and the log stream it belongs to.
For more information about the oceanbase.DBA_OB_TABLE_LOCATIONS view, see DBA_OB_TABLE_LOCATIONS.
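A sketch of such a lookup follows. The database name test and table name t1 are hypothetical, and the selected column names are assumptions about the view's layout rather than a confirmed schema, so verify them against the view reference before use.

```sql
-- For one table, show each partition replica and the log stream it belongs to.
-- 'test' and 't1' are placeholder names for illustration only.
SELECT DATABASE_NAME,
       TABLE_NAME,
       PARTITION_NAME,
       TABLET_ID,
       LS_ID,          -- log stream that owns this partition
       ZONE,
       SVR_IP,
       ROLE,           -- inherited from the log stream replica
       REPLICA_TYPE
FROM oceanbase.DBA_OB_TABLE_LOCATIONS
WHERE DATABASE_NAME = 'test'
  AND TABLE_NAME = 't1'
ORDER BY PARTITION_NAME, ZONE;
```

Because location and role come from the log stream, all partitions that share an LS_ID should report the same set of servers and the same Leader.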