Concurrency control model
Each transaction involves multiple read/write operations on different data in a database. The simplest concurrency control method is serial execution. In serial execution, a process does not trigger an operation until the previous process completes its operation (receives a response). However, this method does not meet the requirement of high concurrency. Therefore, scholars proposed a serializable method. This method can be used to perform multiple operations of a transaction in parallel (not as a series) and achieve the same results as serial execution.
You can create dependencies between transactions based on read/write operations in transactions. The dependencies determine the sequence of the transactions in serial execution. For example, if Transaction B depends on Transaction A, Transaction A must be executed after Transaction B.
Write dependency: If Transaction B attempts to modify Data X after Transaction A modifies Data X, Transaction B depends on Transaction A.
Read dependency: If Transaction A attempts to read Data X after Transaction B modifies Data X, Transaction A depends on Transaction B.
Anti-dependency: If Transaction B attempts to modify Data X after Transaction A reads Data X, Transaction B depends on Transaction A.
Serializability that is defined by conflicts is known as conflict serializability. You can easily analyze conflict serializability by using the preceding conflict mechanism. Conflicts between transactions can be serialized if they are not circular. Conflict serializability can be implemented by using the following mechanisms: two-phase locking and optimistic locking. Two-phase locking uses locks to limit conflicting modifications of transactions. The mechanism supports deadlock detection to roll back circular transactions and prevent loops. Optimistic locking rolls back all possible circular transactions in the detection phase during a commit to prevent loops.
However, few commercial databases support the serializable isolation level. The preceding implementation mechanisms affect database performance in a significant way. Therefore, specific acceptable circularization conditions are usually allowed to expose exceptions and improve the performance and scalability of transactions. The snapshot read and read committed isolation levels are common concurrency control methods that allow exceptions. The snapshot read isolation level depends on multiple versions of data. It allows you to read data of each version based on a fixed read version. As a result, different versions of data in a transaction cause a loop due to their anti-dependencies. For example, Transaction A reads Data X of version 1 and modifies it to generate Data Y of version 2, and Transaction B reads Data Y of version 1 and modifies it to generate Data X of version 2. In this case, Transaction A and Transaction B form a loop. This exception is usually referred to as write skew. The exception is exposed by the snapshot read isolation level. The read committed isolation level exposes the non-repeatable read exception in which two read results of a transaction are different. The balance between performance and semantic usability is essential to designing transaction isolation levels.
Concurrency control model of OceanBase Database
OceanBase Database supports two isolation levels: snapshot read and read committed. The isolation levels ensure external consistency based on distributed semantics.
Multi-version data and transaction table
To prevent mutual exclusion between reads and writes, OceanBase Database is designed to store multiple versions of data and maintain two global versions for each transaction: a read version and a commit version. The two versions correspond to the maximum local read timestamp and maximum commit timestamp in the following figure. In addition, OceanBase Database records a new version in the memory for each update to prevent mutual exclusion between reads and writes.
As shown in the following figure, the memory stores three rows of data: A, B, and C. Each update is maintained by using version (ts), value (val), and transaction ID (txn), and multiple updates are maintained to retain multiple versions of data. The memory also stores a transaction table, which records the ID, status, and version of each transaction. When you start or commit a transaction, the global timestamp cache obtains a timestamp as a reference for the read timestamp or commit timestamp.

As shown in the preceding figure, the global timestamp cache maintains the maximum read timestamp of encountered transactions and the maximum commit timestamp of committed transactions: 120 and 100 respectively. The purpose of the two timestamps is described in subsequent sections. In the memory, Data A contains committed Data a whose version is 100, and Data a corresponds to Transaction 10. In a similar way, Data B contains Data j whose version has not been determined, and Data j corresponds to Transaction 12. Data C contains Data x whose version has not been determined, and Data x corresponds to Transaction 15. The transaction table records transactions and their status. For example, Transaction 15 enters the two-phase commit state with data of version 130.
Process commit requests
Distributed transactions in OceanBase Database have three possible states: RUNNING, PREPARE, and COMMIT. Transaction status cannot be atomically confirmed in distributed scenarios. Therefore, the PREPARE phase is added to the two-phase commit. OceanBase Database maintains a local commit version (also known as a prepare version) for a transaction in each partition. The global commit version (also known as a commit version) of the transaction is determined by the maximum local commit version among all partitions. In each partition, it is guaranteed that the global commit version of the transaction is greater than or equal to the local commit version. This guarantee is essential to concurrency control of read/write requests.
When you commit a transaction, OceanBase Database starts a two-phase commit process and obtains the maximum read timestamp of each participant partition as the local commit timestamp. This guarantee ensures single-value anti-dependencies. Based on the guarantee, the commit timestamp is greater than all previous read timestamps. Therefore, the commit is executed after the reads in serial execution. As shown in the following figure, Transaction 12 enters the commit phase and changes into the PREPARE state. The local transaction version is set to the larger one of the maximum local read timestamp (120) and the global timestamp (150).

Before the two-phase commit ends, OceanBase Database ensures that the global commit timestamp is greater than or equal to the local commit timestamp. OceanBase Database can obtain the global commit timestamp after receipt of the two-phase commit message. In this way, you do not need to query the transaction table later. In addition, the maximum transaction commit timestamp is updated to support subsequent read request optimization and wake up the corresponding transaction in the lock queue. As shown in the following figure, the backfill status is COMMIT, the timestamp is 160, and the timestamp is asynchronously backfilled to updated data. In this way, you do not need to query the transaction table later. In addition, the maximum transaction commit timestamp is updated to support subsequent read request optimization and wake up the corresponding transaction in the lock queue.

Process write requests
When you write data to OceanBase Database, the database modifies data based on the two-phase lock protocol to ensure a write dependency. When you initiate a write request to a row, OceanBase Database puts the request to the lock manager to wait for processing if multiple versions of data in the row are involved in an active transaction. OceanBase Database maintains the waiting queue in the lock manager and wakes up the write request by using the lock or timeout mechanism. As shown in the following figure, a write request is initiated to modify Data B. However, Data B is being modified by Transaction 12, an active transaction. Therefore, OceanBase Database puts the write request to the lock queue to wait for wake-up by Transaction 12.

The snapshot read isolation level is designed to prevent the circularization of anti-dependencies and read dependencies. The circularization causes lost updates. Even if locking succeeds after data writing or wake-up, OceanBase Database compares the read timestamp with the maximum transaction commit timestamp maintained for the row, and rolls back the transaction if the read timestamp is less than the maximum transaction commit timestamp. As shown in the preceding figure, if the read timestamp of the write operation is 100 and Transaction 12 is committed with timestamp 160, the transaction corresponding to the write operation is rolled back with the TRANSACTION_SET_VIOLATION error reported.
Process read requests
OceanBase Database allows you to read data based on a read version. In a read operation, the maximum local read timestamp is replaced with the read version. The preceding guarantee allows OceanBase Database to gracefully process read requests in distributed scenarios.
When you initiate a read request to a transaction in the COMMIT or ABORT state, OceanBase Database determines whether to read data based on the global commit timestamp and transaction status. As shown in the following figure, the read request r1 attempts to read data based on the read version 90. Based on the snapshot read strategy, OceanBase Database reads data “b” whose version is 80.
When you initiate a read request to a transaction in the RUNNING state, the maximum local read timestamp increases. Therefore, the transaction in the RUNNING state enters the two-phase commit state with a larger local timestamp. Based on the preceding guarantee and the concept of snapshot read, OceanBase Database can safely skip the data. As shown in the following figure, the read request r2 attempts to read data based on the read version 130. This increases the maximum transaction read timestamp and ensures that the transaction is committed with a local or global commit version greater than 130. Then, OceanBase Database skips Transaction 12 that has not entered the two-phase commit state and reads data “b” whose version is 100.
When you initiate a read request to a transaction in the PREPARE state, the preceding guarantee allows OceanBase Database to skip the transaction if its local timestamp is greater than the read timestamp. However, if the local timestamp of the transaction is less than the read timestamp, OceanBase Database cannot determine the relationship between the global commit timestamp of the local timestamp and the read timestamp. Therefore, OceanBase Database gracefully waits upon the transaction, which is called lock for read. The two-phase commit process is supposed to end within a short time in OceanBase Database. As shown in the following figure, the read request r3 attempts to read data based on the read version 140. This increases the maximum transaction read timestamp and ensures that the transaction is committed with a global or local commit version greater than 140. OceanBase Database waits until the transaction enters the two-phase commit state and the local commit timestamp becomes 130, and then determines the relationship between the global commit timestamp and the read timestamp 140.
