Overview|V4.3.3| docs|Distributed Database

Overview

Last Updated: 2025-11-27 02:38:06

Concurrency control model

Each transaction performs multiple read and write operations on different data in a database. The simplest concurrency control method is serial execution, in which an operation is not issued until the previous operation has completed and returned a response. However, serial execution cannot meet high-concurrency requirements. Researchers therefore proposed serializability: the operations of multiple transactions can be executed concurrently rather than one after another, as long as the result is the same as that of a serial execution.

Dependencies between transactions can be derived from the read and write operations in the transactions. These dependencies determine the order of the transactions in the equivalent serial execution. Dependencies fall into the following three types:

Write dependency: If Transaction B attempts to modify Data X after Transaction A modifies Data X, Transaction B depends on Transaction A.
Read dependency: If Transaction A attempts to read Data X after Transaction B modifies Data X, Transaction A depends on Transaction B.
Anti-dependency: If Transaction B attempts to modify Data X after Transaction A reads Data X, Transaction B depends on Transaction A.

Serializability defined by these dependencies is known as conflict serializability. An execution is conflict serializable if the dependencies between its transactions do not form a cycle (a loop of dependencies). Conflict serializability can be implemented by using either of two mechanisms: two-phase locking or optimistic locking. Two-phase locking uses exclusive locks to block conflicting modifications by other transactions, and uses deadlock detection to roll back transactions that would form a cycle. Optimistic locking runs a detection phase during commit and rolls back every transaction that may form a cycle.
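The cycle check described above can be sketched as follows. This is a minimal illustration and not the mechanism of any particular database: the schedule format (a list of `(txn, op, item)` tuples) and the function names `dependency_edges` and `is_conflict_serializable` are assumptions made for this example.

```python
from collections import defaultdict

def dependency_edges(schedule):
    """Return the set of (earlier_txn, later_txn) dependencies in a schedule.

    `schedule` is a list of (txn, op, item) tuples in execution order,
    where op is "r" (read) or "w" (write). This format is hypothetical.
    """
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they belong to different transactions,
            # touch the same item, and at least one of them is a write.
            # This covers write, read, and anti-dependencies.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True if the dependency graph of the schedule contains no cycle."""
    graph = defaultdict(set)
    for src, dst in dependency_edges(schedule):
        graph[src].add(dst)
    nodes = {txn for txn, _, _ in schedule}
    state = {}  # node -> "visiting" or "done"

    def has_cycle(node):
        # Depth-first search; reaching a node that is still being visited
        # means the dependencies form a cycle.
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(has_cycle(n) for n in nodes if n not in state)

# A runs completely before B: every dependency points from A to B.
serial_order = [("A", "r", "X"), ("A", "w", "Y"), ("B", "r", "Y"), ("B", "w", "X")]
# Both transactions read first and write afterwards: A -> B and B -> A.
interleaved = [("A", "r", "X"), ("B", "r", "Y"), ("A", "w", "Y"), ("B", "w", "X")]
```

With these inputs, `serial_order` is reported as conflict serializable and `interleaved` is not, because its two anti-dependencies point in opposite directions.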

However, commercial databases rarely use the serializable isolation level because of its high performance cost. Instead, they often allow specific, well-defined anomalies in exchange for better transaction performance and scalability. Common isolation levels include snapshot read (often called snapshot isolation) and read committed. The snapshot read isolation level maintains multiple versions of data, and each transaction reads the data that corresponds to a fixed read version. This can lead to a cycle caused by anti-dependencies. For example, if Transaction A reads Data X of version 1 and modifies Data Y, while Transaction B reads Data Y of version 1 and modifies Data X, write skew occurs. The read committed isolation level allows non-repeatable reads, in which two reads of the same data within one transaction return different results. Designing transaction isolation levels is a matter of balancing performance against usability.
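The write skew example can be made concrete with a small simulation. This is only a sketch: the rule inside `run_on_snapshot` (clear one item only if the other is still set) and all names are hypothetical, chosen so that the anomaly is visible in the final values.

```python
def run_on_snapshot(snapshot, read_key, write_key):
    """Hypothetical transaction: set write_key to 0 only if read_key is still 1.

    Returns the write set instead of applying it, so that the caller
    decides when the transaction commits.
    """
    writes = {}
    if snapshot[read_key] == 1:
        writes[write_key] = 0
    return writes

# Snapshot read: both transactions read version 1 of the data.
committed = {"X": 1, "Y": 1}
writes_a = run_on_snapshot(dict(committed), read_key="X", write_key="Y")
writes_b = run_on_snapshot(dict(committed), read_key="Y", write_key="X")

# The write sets are disjoint, so a write-write conflict check finds nothing
# and both transactions commit.
disjoint = not (writes_a.keys() & writes_b.keys())
committed.update(writes_a)
committed.update(writes_b)

# Serial execution for comparison: B sees the write of A and changes nothing.
serial = {"X": 1, "Y": 1}
serial.update(run_on_snapshot(dict(serial), read_key="X", write_key="Y"))
serial.update(run_on_snapshot(dict(serial), read_key="Y", write_key="X"))
```

Under snapshot read both items end up as 0, a result that neither serial order can produce. In the serial run shown here, X keeps the value 1.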

Concurrency control model of OceanBase Database

OceanBase Database supports two isolation levels: snapshot read and read committed. The isolation levels ensure external consistency in distributed environments.

Multi-version data and transaction table

To prevent reads and writes from blocking each other, OceanBase Database stores multiple versions of data and maintains two global versions: a read version and a commit version. The two versions correspond to the maximum local read timestamp and the maximum transaction commit timestamp in the following figure. For each update, OceanBase Database records a new version in memory.

As shown in the following figure, the memory stores three rows of data: A, B, and C. Each update is maintained by using version (ts), value (val), and transaction ID (txn), and multiple updates are maintained to retain multiple versions of data. The memory also stores a transaction table, which records the ID, status, and version of each transaction. When you start or commit a transaction, a timestamp is obtained from the global timestamp cache as a reference for the read timestamp or commit timestamp.

Concurrency control 1

As shown in the preceding figure, the global timestamp cache maintains two values: the maximum read timestamp of encountered transactions (120) and the maximum commit timestamp of committed transactions (100). The purpose of the two timestamps is described in subsequent sections. In memory, Data A contains committed Data a, whose version is 100 and which was written by Transaction 10. Similarly, Data B contains Data j, written by Transaction 12, and Data C contains Data x, written by Transaction 15. The versions of Data j and Data x have not been determined yet. The transaction table records the transactions and their status. For example, Transaction 15 has entered the two-phase commit (PREPARE) state with version 130.
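The in-memory state described above can be modeled roughly as follows. The class and field names are assumptions for illustration, not OceanBase internals, and only the values named in the text are included. The states assumed here for Transaction 10 (COMMIT) and Transaction 12 (RUNNING) are inferred from the description and are not stated explicitly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    ts: Optional[int]   # version (ts); None while it is undetermined
    val: str            # value (val)
    txn: int            # ID of the transaction that wrote the value (txn)

@dataclass
class TxnEntry:
    status: str             # "RUNNING", "PREPARE", or "COMMIT"
    version: Optional[int]  # local or global commit version, if known

# Each row keeps a list of versions, newest first.
rows = {
    "A": [Version(ts=100, val="a", txn=10)],
    "B": [Version(ts=None, val="j", txn=12)],
    "C": [Version(ts=None, val="x", txn=15)],
}

txn_table = {
    10: TxnEntry("COMMIT", 100),   # assumed: Data a is described as committed
    12: TxnEntry("RUNNING", None), # assumed: has not started committing yet
    15: TxnEntry("PREPARE", 130),
}

timestamp_cache = {"max_read_ts": 120, "max_commit_ts": 100}

def status_and_version(version):
    """Resolve the status and version of a row version.

    If the version has not been written to the row yet, fall back to the
    transaction table.
    """
    entry = txn_table[version.txn]
    ts = version.ts if version.ts is not None else entry.version
    return entry.status, ts
```

For example, `status_and_version(rows["C"][0])` resolves Data x through the transaction table and returns `("PREPARE", 130)`.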

Process commit requests

Distributed transactions in OceanBase Database have three possible states: RUNNING, PREPARE, and COMMIT. Because the status of a transaction cannot be confirmed atomically across partitions in distributed scenarios, the two-phase commit protocol introduces the PREPARE state. OceanBase Database maintains a local commit version (also known as a prepare version) for a transaction in each partition. The global commit version (also known as the commit version) of the transaction is the maximum of the local commit versions of all partitions. The global commit version of a transaction is therefore guaranteed to be greater than or equal to its local commit version in each partition. This guarantee is essential to the concurrency control of read and write requests.

When you commit a transaction, OceanBase Database starts a two-phase commit process. Each participant partition takes its maximum local read timestamp as its local commit timestamp (or local commit version). The first participant partition also obtains the global timestamp from the global timestamp cache and uses the larger of the global timestamp and its maximum local read timestamp as its local commit timestamp. This handles anti-dependencies on a single piece of data: the commit timestamp is greater than all previous read timestamps, so the commit is ordered after those reads in the equivalent serial execution, and the earlier transactions never read the data of this transaction. As shown in the following figure, Transaction 12 enters the commit phase and changes to the PREPARE state. Assuming that the current partition is the first participant partition, its local commit version is set to 150, which is the larger of the maximum local read timestamp (120) and the global timestamp (150).

Concurrency control 2
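The timestamp rules of the commit process can be written down as a short sketch. The function names are hypothetical, and the second partition with a maximum local read timestamp of 160 is an assumption added only to show how the global commit version is derived.

```python
def local_commit_ts(max_local_read_ts, global_ts=None):
    """Local commit (prepare) version of one participant partition.

    Per the description above, only the first participant also fetches the
    global timestamp; other participants pass global_ts=None.
    """
    if global_ts is None:
        return max_local_read_ts
    return max(max_local_read_ts, global_ts)

def global_commit_ts(local_versions):
    """Global commit version: the maximum local commit version."""
    return max(local_versions)

# First participant, values from the text: max(120, 150).
first = local_commit_ts(max_local_read_ts=120, global_ts=150)
# Hypothetical second participant whose maximum local read timestamp is 160.
second = local_commit_ts(max_local_read_ts=160)
commit = global_commit_ts([first, second])
```

Here `first` is 150 and `commit` is 160, so the global commit version is greater than or equal to every local commit version.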

Before the two-phase commit ends, OceanBase Database ensures that the global commit timestamp is greater than or equal to the local commit timestamp. A partition obtains the global commit timestamp when it receives the two-phase commit message. As shown in the following figure, the transaction status is set to COMMIT with timestamp 160, and the timestamp is asynchronously backfilled to the updated data, so that later requests do not need to query the transaction table. In addition, the maximum transaction commit timestamp is updated, which optimizes subsequent read requests, and transactions waiting in the lock queue are woken up.

Concurrency control 3
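The commit step above could look roughly like this. It is a simplified sketch with hypothetical names: the backfill runs synchronously here, whereas the text describes it as asynchronous, and waking up the lock queue is left out.

```python
def apply_commit(txn_id, commit_ts, txn_table, rows, cache):
    """Record the global commit timestamp of a transaction."""
    # 1. Mark the transaction as committed in the transaction table.
    txn_table[txn_id] = ("COMMIT", commit_ts)
    # 2. Backfill the timestamp into the versions written by the transaction,
    #    so that later requests do not need to query the transaction table.
    for versions in rows.values():
        for version in versions:
            if version["txn"] == txn_id and version["ts"] is None:
                version["ts"] = commit_ts
    # 3. Advance the maximum transaction commit timestamp.
    cache["max_commit_ts"] = max(cache["max_commit_ts"], commit_ts)

# Values from the text: Transaction 12 prepared with 150 and commits with 160.
rows = {"B": [{"ts": None, "val": "j", "txn": 12}]}
txn_table = {12: ("PREPARE", 150)}
cache = {"max_commit_ts": 100}
apply_commit(12, 160, txn_table, rows, cache)
```

After the call, Data j carries version 160 and the maximum transaction commit timestamp is 160.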

Process write requests

When you write data to OceanBase Database, the database modifies data based on the two-phase locking protocol to enforce write dependencies. When you initiate a write request to a row and a version of the data in that row belongs to an active transaction, OceanBase Database places the request in the lock manager to wait. OceanBase Database maintains the waiting queue in the lock manager and wakes up the write request when the lock is released or the wait times out. As shown in the following figure, a write request is initiated to modify Data B. However, Data B is being modified by Transaction 12, an active transaction. Therefore, OceanBase Database places the write request in the lock queue, where it waits to be woken up by Transaction 12.

Concurrency control 4
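A minimal model of the waiting queue is sketched below. `RowLockManager` and its methods are hypothetical names, Transaction 20 is an invented waiter, and the timeout path mentioned in the text is omitted.

```python
from collections import defaultdict, deque

class RowLockManager:
    """Toy row lock manager with one waiting queue per row."""

    def __init__(self):
        self.owner = {}                    # row -> transaction holding the lock
        self.waiters = defaultdict(deque)  # row -> queued transaction IDs

    def try_lock(self, row, txn):
        """Lock the row, or enqueue txn if an active transaction holds it."""
        holder = self.owner.get(row)
        if holder is None or holder == txn:
            self.owner[row] = txn
            return True
        self.waiters[row].append(txn)
        return False

    def release(self, row, txn):
        """Release the lock when txn ends and wake up the next waiter.

        Returns the ID of the woken transaction, or None.
        """
        if self.owner.get(row) != txn:
            return None
        del self.owner[row]
        if self.waiters[row]:
            woken = self.waiters[row].popleft()
            self.owner[row] = woken
            return woken
        return None

locks = RowLockManager()
granted_12 = locks.try_lock("B", 12)   # Transaction 12 is modifying Data B
granted_20 = locks.try_lock("B", 20)   # hypothetical writer has to wait
woken = locks.release("B", 12)         # Transaction 12 ends and wakes the waiter
```

In this run the second request is queued and is woken up only after Transaction 12 releases the row.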

To prevent cycles caused by anti-dependencies and read dependencies, and to avoid lost updates, OceanBase Database performs a check after a write operation locks a row at the snapshot read isolation level: it compares the read timestamp of the write operation with the maximum transaction commit timestamp maintained for the row. If the read timestamp is less than the maximum transaction commit timestamp, OceanBase Database rolls back the transaction. For example, if the read timestamp of the write operation is 100 and Transaction 12 has committed with timestamp 160, the transaction that issued the write operation is rolled back with a TRANSACTION_SET_VIOLATION error.

How the read timestamp of a write operation is determined depends on the isolation level:

At the read committed isolation level, the read timestamp of a write operation is the timestamp obtained at the start of the statement. If a TRANSACTION_SET_VIOLATION error is returned, OceanBase Database retries the statement with a new read timestamp instead of rolling back the entire transaction.
At the snapshot read isolation level, the read timestamp of a write operation is the timestamp obtained at the start of the transaction. If a TRANSACTION_SET_VIOLATION error is returned, OceanBase Database rolls back the entire transaction to avoid lost updates.
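The difference between the two isolation levels can be sketched as follows. The exception class, the function names, and the `max_retries` limit are assumptions for this example. The text does not say how many times a statement is retried.

```python
class TransactionSetViolation(Exception):
    """Stand-in for the TRANSACTION_SET_VIOLATION error."""

def check_write(read_ts, row_max_commit_ts):
    """Check performed after the write operation has locked the row."""
    if read_ts < row_max_commit_ts:
        raise TransactionSetViolation(
            f"read timestamp {read_ts} < max commit timestamp {row_max_commit_ts}"
        )

def write_read_committed(next_read_ts, row_max_commit_ts, max_retries=3):
    """Read committed: retry the statement with a fresh read timestamp."""
    for _ in range(max_retries):
        read_ts = next_read_ts()  # timestamp taken at the start of the statement
        try:
            check_write(read_ts, row_max_commit_ts)
        except TransactionSetViolation:
            continue              # redo the statement, keep the transaction
        return read_ts
    raise TransactionSetViolation("statement retries exhausted")

def write_snapshot(txn_start_ts, row_max_commit_ts):
    """Snapshot read: the timestamp is fixed, so the error reaches the caller."""
    check_write(txn_start_ts, row_max_commit_ts)

# Values from the text: read timestamp 100, row committed with timestamp 160.
timestamps = iter([100, 170])  # 170 is a hypothetical later timestamp
rc_read_ts = write_read_committed(lambda: next(timestamps), 160)

try:
    write_snapshot(100, 160)
    snapshot_outcome = "written"
except TransactionSetViolation:
    snapshot_outcome = "rolled back"
```

The read committed statement succeeds on its second attempt with read timestamp 170, while the snapshot read transaction is rolled back.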

Process read requests

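OceanBase Database reads data based on a read version and updates the maximum local read timestamp on each read. Together with the guarantee that the global commit version of a transaction is greater than or equal to each of its local commit versions, this allows OceanBase Database to process read requests correctly in distributed scenarios.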

Read request processing depends on the state of the transaction that wrote the data. When a read request encounters data of a transaction in the COMMIT or ABORT state, OceanBase Database determines whether to read the data based on the global commit timestamp and the transaction status. As shown in the following figure, Read Request r1 attempts to read data by using the read version 90. Based on the snapshot read strategy, OceanBase Database reads Data b, whose version is 80.

When a read request encounters data of a transaction in the RUNNING state, the read has already increased the maximum local read timestamp. The RUNNING transaction will therefore enter the two-phase commit state with a larger local commit timestamp, which means that its data can be skipped. As shown in the following figure, Read Request r2 attempts to read data by using the read version 130. Because the maximum local read timestamp increases, OceanBase Database skips Transaction 12, which has not entered the two-phase commit state, and reads Data b whose version is 100.

When a read request encounters data of a transaction in the PREPARE state, OceanBase Database skips the data if the local commit timestamp of the transaction is greater than the read timestamp. If the local commit timestamp of the transaction is less than the read timestamp, OceanBase Database cannot yet determine whether the global commit timestamp or the read timestamp is larger, because the global commit timestamp may be greater than the local commit timestamp. In this case, OceanBase Database waits (lock for read). As shown in the following figure, Read Request r3 attempts to read data by using the read version 140. As the maximum local read timestamp increases, OceanBase Database waits until the transaction in the PREPARE state with the version 130 enters the COMMIT state, and then compares the updated maximum transaction commit timestamp with the read timestamp 140.

Concurrency control 5
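The three read cases can be summarized in one decision function. This is a sketch under stated assumptions: the names are hypothetical, a committed version is treated as visible when its commit timestamp is less than or equal to the read timestamp, and a PREPARE version whose local commit timestamp equals the read timestamp is treated as a wait. The text does not specify either boundary case.

```python
def observe_read(max_local_read_ts, read_ts):
    """Each read advances the maximum local read timestamp if needed."""
    return max(max_local_read_ts, read_ts)

def read_decision(status, version, read_ts):
    """Return "read", "skip", or "wait" for one version of a row.

    version is the global commit timestamp for COMMIT, the local commit
    timestamp for PREPARE, and None for RUNNING or ABORT.
    """
    if status == "ABORT":
        return "skip"
    if status == "COMMIT":
        # Assumed visibility rule for snapshot reads.
        return "read" if version <= read_ts else "skip"
    if status == "RUNNING":
        # The transaction will commit with a timestamp above the maximum
        # local read timestamp, which this read has already advanced.
        return "skip"
    if status == "PREPARE":
        if version > read_ts:
            # The global commit timestamp can only be larger still.
            return "skip"
        # The global commit timestamp may end up on either side of read_ts.
        return "wait"
    raise ValueError(f"unknown status: {status}")

r1 = read_decision("COMMIT", 80, read_ts=90)      # Data b of version 80
r2 = read_decision("RUNNING", None, read_ts=130)  # data of Transaction 12
r3 = read_decision("PREPARE", 130, read_ts=140)   # prepared with version 130
new_max_read_ts = observe_read(120, 140)
```

With the values from the text, r1 reads the committed version, r2 skips the data of the RUNNING transaction, and r3 waits for the PREPARE transaction to finish.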