The intra-tenant routing feature allows ODP to select an appropriate node to execute SQL statements after obtaining the server list of the tenant.
During intra-tenant routing, ODP functions like a common proxy such as HAProxy, and selects a server from the tenant server list to implement a one-to-one correspondence between the client connection and the server connection. This achieves simple implementation, but cannot meet performance and high availability requirements. Therefore, routing affects connection management.
Intra-tenant routing is the most complex part of the routing feature for the consideration of better performance and higher availability. This topic describes leader routing, follower routing, and tenant server routing, information caching, routing strategies, transaction routing, and common issues.
Leader routing
A distributed system usually adopts the multi-replica mechanism for disaster recovery and high availability. To ensure data consistency between replicas, the Paxos or Raft algorithm is often used. In practice, a special replica is present, which usually has the latest data and controls the synchronization of data among the replicas. This replica is called the leader, and other replicas are collectively called followers.
OceanBase Database has only one leader. Therefore, the leader routing strategy is to send SQL statements to this replica. In this example, the select c1 from t1 statement is used to describe two conditions that must be met for leader routing:
The SQL statement operates on (for example, queries, inserts into, updates, or deletes from) an entity table, such as the t1 table in the preceding example.
The request reads the latest data. In other words, the request is a strong read.
In ODP logs, the keyword for leader routing is ROUTE_TYPE_LEADER. To implement leader routing, ODP needs to know the ID and location of the partition to be accessed. In OceanBase Database, a table can have one or multiple partitions.
Single-partition table
The table has only one partition. In this case, ODP can obtain the replica location information from the table name.
Multiple-partition table
The table has multiple partitions. ODP needs to calculate the partition ID based on the table name and partitioning key in the SQL statement, and then obtain the replica location information. Replica information includes both the leader information and follower information. Leader routing uses only the leader information.
Multi-partition routing involves partitioning methods (such as Hash, Range, and List partitioning), partitioning key types (such as number and varchar), partitioning algorithms (such as Hash), and type conversion (if the value type in the SQL statement differs from the type of the partitioning key). Therefore, the implementation is relatively complex. For a subpartitioned table, routing involves the following 10 steps:
Parse the SQL statement to get the table name.
Access an internal table of OceanBase Database based on the table name and check whether the table is a partitioned table.
Parse the column expression, such as c1=1, in the SQL statement.
Access an internal table of OceanBase Database to obtain the information about the partitioned table.
Access an internal table of OceanBase Database to obtain the partition information.
Calculate the
partition_idvalue of the partition based on the column expression.Access an internal table of OceanBase Database to obtain the subpartition information.
Calculate the
partition_idvalue of the subpartition based on the column expression.Calculate the final
partition_idvalue.Access an internal table of OceanBase Database to get location information corresponding to the
partition_idvalue.
Follower routing
Similar to leader routing, follower routing also must meet two conditions:
The SQL statement queries an entity table, such as the t1 table in
select c1 from t1.The request does require the latest data. In other words, the request is a weak read.
These two conditions are different from the conditions for leader routing.
For the first condition, follower routing supports only query statements and does not support other statements. This is also the implementation requirement of the Paxos algorithm.
For the second condition, the weak-read setting
ob_read_consistency=weakmust be explicitly specified by using a hint, in the session, or in another way.
For follower routing, the SQL statement can be sent either to the leader or a follower. Therefore, follower routing has more choices. For more information about the selection from multiple replicas during routing, see Routing strategies.
Read/Write splitting is an important feature related with follower routing. Read/Write splitting on requests can reduce the pressure on the leader. ODP also supports read/write splitting. We are constantly improving the feature and have helped customers solve performance issues in various scenarios, for example, OceanBase Cloud.
Tenant server routing
Sometimes, ODP cannot obtain the leader or follower. In this case, ODP can select a tenant server, which is tenant server routing. Typical scenarios of tenant server routing are as follows:
The SQL statement itself does not contain a table name, for example,
select 1.The server where the leader or follower is located is faulty.
Due to its limitations, ODP fails to obtain the table name from a complex SQL statement, or cannot take a replica route.
Through tenant server routing, ODP sends the SQL statement to a server where the tenant is located, thereby ensuring proper functioning. As with follower routing, tenant server routing also involves selection from multiple replicas, and therefore requires routing strategies.
Information caching
For leader routing, follower routing, and tenant server routing, ODP needs to query replica routing information from the sys tenant. To improve performance and reduce pressure on the sys tenant, ODP caches routing information.
Data freshness is crucial for cached information. Cached information of the sys tenant can be created and updated regularly by accessing internal tables. However, ODP does not adopt this mechanism to cache the information for replica routing. Otherwise, ODP will have to pull replica routing information for too many SQL statements, incurring great pressure on the sys tenant.
Number of SQL statements = Number of replicas × Number of ODPs.
OceanBase Database supports up to hundreds of thousands and even millions of partitions. Therefore, data freshness of cached information is a big challenge for ODP. If expired cached information is used, the route is inaccurate. The following describes how ODP ensures data freshness of cached information.
To view the cached information of a table, log on to the ODP with the root@proxysys account and run the show proxyroute command:
MySQL [(none)]> show proxyroute like 'ob1.hudson tt1 test sbtest1'\G
*************************** 1. row ***************************
cluster_name: ob1.hudson
tenant_name: tt1
database_name: test
table_name: sbtest1
state: AVAIL
partition_num: 1
replica_num: 3
table_id: 1101710651081698
cluster_version: 2
schema_version: 1649196335597728
from_rslist: N
create_time: 2022-04-07 12:41:16
last_valid_time: 2022-04-07 12:41:16
last_access_time: 2022-04-07 12:41:16
last_update_time: 1970-01-01 08:00:00
expire_time: 2022-04-12 12:48:42
relative_expire_time: 2022-04-07 12:40:41
server addr: server[0]=xx.xx.xx.xx:xx,leader,FULL; server[1]=xx.xx.xx.xx:xx,follower,FULL; server[2]=xx.xx.xx.xx:xx,follower,FULL;
This example shows the important information in the cache, including the cluster name, tenant name, database name, table name, number of partitions, number of replicas, time information, address information, and cache status. The cache status information is important for the refresh mechanism, because ODP implements caching strategies by modifying the status information. The cached information can be in one of the following states:
BUILDING: The cached information is being created. ODP needs to wait until the cached information is created before using it.AVAIL: The cached information is in a normal state and can be directly used.DIRTY: The cached information is invalid and inaccurate.UPDATING: The invalid cached information is being updated.DELETED: The cached information has been deleted and cannot be used. It will be cleared later.
ODP refreshes the cache status by modifying the cache status, to ensure data freshness. The following describes the cache refresh mechanism in terms of creation, eviction, and refresh.
Cache creation: When ODP accesses a partition for the first time, it obtains the routing information by querying the
__all_virtual_proxy_schematable of the sys tenant. The specified table name is the actual table name. The table cache is different from the tenant routing information. The cache status changes toAVAILafter the cache is created.Cache eviction: ODP changes the cache status to
DIRTYwhen the OBServer returns the routing inaccuracy error, which is indicated in theis_partition_hitfield carried in the OK packet.Cache refresh: When the status of the cached information changes to
DIRTY, the expired cached information is evicted and the cache is re-created or the cached information is updated.
Note
At present, cache eviction is implemented based on the feedback packet of the OBServer. Therefore, the cache status is not perceived in real time. The cached information can only be refreshed upon an incorrect route feedback. This may cause issues.
Transaction routing
The preceding section describes the routing strategy for a single SQL statement. Some features, such as the transaction feature, involve one or more SQL statements. In this case, transaction routing is required.
OceanBase Database Proxy (ODP) supports two routing modes for transactions. In one mode, ODP routes the statements in a transaction to the same OBServer node for execution. In the other mode, ODP routes the statements in a transaction to different OBServer nodes and synchronizes the transaction status on these nodes to implement the distributed execution of the transaction.
Note
The OceanBase 2.0 protocol is used to synchronize the transaction status.
Select nodes for transaction routing
Two roles are involved in transaction routing: coordinator and participant. The coordinator node is the node that starts the transaction. It runs non-DML statements that affect the transaction status, such as BEGIN, START TRANSACTION, COMMIT, and ROLLBACK. The participant nodes run DML statements that do not affect the transaction status.
ODP comprises a simple SQL parser that parses whether an SQL statement is a DML statement. For a DML statement, ODP routes the statement to proper nodes based on table routing or logical data center (LDC)-based routing. The routed-to nodes are participant nodes of the transaction. The coordinator node runs the first statement in the transaction.
Note
For more information about LDC-based routing, see Routing strategies.
Configure transaction routing
Transaction routing is enabled by default. You can run the following statements to query whether transaction routing is enabled and configure transaction routing.
Log on to the sys tenant as the root user from a client.
Check that the OceanBase 2.0 protocol is enabled.
obclient> SHOW PROXYCONFIG LIKE enable_ob_protocol_v2; obclient> ALTER PROXYCONFIG SET enable_ob_protocol_v2=True;Configure transaction routing for ODP.
obclient> SHOW PROXYCONFIG LIKE enable_transaction_internal_routing; obclient> ALTER PROXYCONFIG SET enable_transaction_internal_routing=True;
Common issues
When ODP implements intra-tenant routing, the following issues may occur:
Failed to get the table name for leader routing.
The SQL statement is too complex. Currently, ODP cannot identify all SQL statements.
The SQL statement is too long. The ODP buffer for SQL statements is 4 KB in size. An excessively long SQL statement may not be fully parsed.
Partition calculation failed for leader routing.
ODP does not support the calculation of multiple partitioning keys, such as range(c1,c2).
The SQL statement does not contain a partitioning key expression, or ODP failed to extract the partitioning key expression.
ODP cannot handle the partitioning key expression, such as c1=now(). ODP does not support the now() function.
Expired cached information is used.
ODP does not have an active refresh mechanism for cached information .
The OBServer does not return feedback for routing.
A configuration error occurred.
If the routing strategy is not set to
FOLLOWER_FIRST, weak data reading requests are sent to the leader in follower routing.The LDC routing information is not configured or the information is incorrectly configured, resulting in routing across IDCs or cities.