Based on the multiversion concurrency control (MVCC) and row lock, OceanBase Database implements the concurrency control logic, lock-free read, and mutual exclusion (mutex) for writes. As a result, read-read, read-write, and write-read operations do not block each other, which greatly improves the concurrency capability of the system. The read-read concurrency does not modify the data. Therefore, the data correctness problem does not exist. In terms of the design of concurrency control, the following section discusses three situations: read-write, write-read, and write-write.
Transaction version
Statement snapshot
At the read committed (RC) isolation level, each statement can read the latest data before the statement is executed. The system needs to retrieve the latest snapshot version before a statement is executed. This version is called a statement snapshot.
Transaction snapshot
At the serializable isolation level, each statement in a transaction can only see the data before the transaction is started. This requires the statement to retrieve the latest snapshot version before the transaction is started. This version is called a transaction snapshot. At the repeatable read (RR) isolation level, OceanBase Database also uses transaction snapshots.
Commit version
During transaction commit, a version for the modified data is required. This version is called the transaction commit version. If the Global Timestamp Service (GTS) is enabled, the commit version can be retrieved and confirmed by the GTS leader. If GTS is disabled, the servers where the data is located jointly decide the commit version.
Consistency read
OceanBase Database supports both strong- and weak-consistency reads. The strong-consistency read requires reading data on the leader based on the snapshot information, while the weak consistency read allows reading data of an earlier version.
Strong-consistency read
For cross-IDC query statements, OceanBase Database needs to retrieve the globally consistent snapshots. Otherwise, the consistency of strong reads cannot be ensured. In this case, make sure that GTS is enabled.
For standalone queries, the system does not need to rely on GTS. After the system retrieves the latest commit version of the partition to be read, it can take the latest version as the read snapshot to ensure the consistency of strong reads.
Weak-consistency read
You can use the weak read function by:
- Adding hint to the query statement
Select /*+read_consistency(weak)*/ * from test where c1=1;
Setting system variables
Session-level settings:
set @@ob_read_consistency=2, where 1 = FROZEN, 2 = WEAK, and 3 = STRONG;User-level settings:
set global ob_read_consistency=2;OceanBase Database maintains the latest tenant-level version that is readable and secure. This version determines the statement snapshot of weak read statements. Weak reads are meaningless at the non-RC isolation level, so OceanBase Database only supports weak reads at the RC isolation level.
Unless otherwise specified, the concurrency control discussed in the following section refers to strong-consistency reads.
Concurrent write/read
For a row of data, the read-write transaction and read-only transaction are concurrently run. This row is being modified by the read-write transaction and has not been committed when a read-only transaction is in progress. This is called concurrent write/read.
In this scenario, the following logic is used to ensure the data correctness in OceanBase Database:
During the commit of a read-write transaction (T1), the commit version of T1 is unknown before it is committed. At this time, if a read-only transaction (T2) reads the row concurrently, is it possible to read the modifications of T1? This differs in different situations. If the commit version of T1 is earlier than the read version of T2, T2 needs to read the modifications of T1. Otherwise, T2 does not need to read the modifications of T1.
Specifically, if T1 fails to receive the commit from the user at this time, the read version of T2 is earlier than the commit version of T1. Therefore, T2 does not need to wait for the commit of T1. However, if T1 is in the commit process at this time, T2 can decide whether to read the data modified by T1 after T1 verifies the commit version. Here, the waiting process is transparent to the user, so "writes still do not block reads".
Concurrent read/write
For a row of data, the read-write transaction and read-only transaction are run concurrently. Before the read-write transaction operates on the row, the read-only transaction has already performed the read operation on the row. This is called concurrent read/write. In this scenario, the following logic is used to ensure the data correctness in OceanBase Database:
When T2 is reading data, T1 is still running and the client has not issued a commit version. Therefore, the read version of T1 must be earlier than the commit version of T2, which means that T1 does not need to read the modifications of T2. T1 and T2 do not wait for the commit of each other.
Concurrent write/write
OceanBase achieves concurrent write/write by adding a mutex to each row. In other words, if a transaction on a row has not been committed, the following transaction that needs to perform operations on the same row must wait. In the concurrent write/write scenario, the lost update problem needs to be solved. Assume that transactions T1 and T2 concurrently operate on the same row (R1) in a single-partition table (Table A), and the column to be updated has a local index. Execution procedure:
The statements of T1 and T2 start to execute. They retrieve the same statement snapshot versions. Suppose that the snapshot versions are both 100.
T1 obtains the row lock before T2 and performs the update operation during its execution. T2 encounters the row lock conflict and retries until the row lock is released.
T1 releases the row lock after the commit.
How is T2 run? If T2 directly holds the row lock and modifies the data, the data consistency problem will occur, because T2 does not see the modifications of T1. Instead, T2 directly overwrites the data. To solve the problem, the system double-checks the data consistency after T2 obtains the row lock of R1. If the system finds a consistency problem, it solves the problem in different ways based on different isolation levels:
At the RC isolation level, the system retries to execute the statement of T2.
At the serializable isolation level, the statement of T2 cannot be retried and the following error information will be returned to the user:
ORA-08177: Cannot serialize access for this transaction
When OceanBase returns the ORA-08177 error, you can make the following decisions based on your business requirements: