Intra-tenant routing|V4.1.0|OceanBase Database Proxy| docs|Distributed Database

The intra-tenant routing feature allows ODP to select an appropriate node to execute SQL statements after obtaining the server list of the tenant.

During intra-tenant routing, ODP functions like a common proxy such as HAProxy, and selects a server from the tenant server list to implement a one-to-one correspondence between the client connection and the server connection. This achieves simple implementation, but cannot meet performance and high availability requirements. Therefore, routing affects connection management.

Intra-tenant routing is the most complex part of the routing feature for the consideration of better performance and higher availability. This topic describes leader routing, follower routing, and tenant server routing, information caching, routing strategies, transaction routing, and common issues.

Leader routing

A distributed system usually adopts the multi-replica mechanism for disaster recovery and high availability. To ensure data consistency between replicas, the Paxos or Raft algorithm is often used. In practice, a special replica is present, which usually has the latest data and controls the synchronization of data among the replicas. This replica is called the leader, and other replicas are collectively called followers.

OceanBase Database has only one leader. Therefore, the leader routing strategy is to send SQL statements to this replica. In this example, the select c1 from t1 statement is used to describe two conditions that must be met for leader routing:

The SQL statement operates on (for example, queries, inserts into, updates, or deletes from) an entity table, such as the t1 table in the preceding example.
The request reads the latest data. In other words, the request is a strong read.

In ODP logs, the keyword for leader routing is ROUTE_TYPE_LEADER. To implement leader routing, ODP needs to know the ID and location of the partition to be accessed. In OceanBase Database, a table can have one or multiple partitions.

Single-partition table

The table has only one partition. In this case, ODP can obtain the replica location information from the table name.
Multiple-partition table

The table has multiple partitions. ODP needs to calculate the partition ID based on the table name and partitioning key in the SQL statement, and then obtain the replica location information. Replica information includes both the leader information and follower information. Leader routing uses only the leader information.

Multi-partition routing involves partitioning methods (such as Hash, Range, and List partitioning), partitioning key types (such as number and varchar), partitioning algorithms (such as Hash), and type conversion (if the value type in the SQL statement differs from the type of the partitioning key). Therefore, the implementation is relatively complex. For a subpartitioned table, routing involves the following 10 steps:

Parse the SQL statement to get the table name.
Access an internal table of OceanBase Database based on the table name and check whether the table is a partitioned table.
Parse the column expression, such as c1=1, in the SQL statement.
Access an internal table of OceanBase Database to obtain the information about the partitioned table.
Access an internal table of OceanBase Database to obtain the partition information.
Calculate the partition_id value of the partition based on the column expression.
Access an internal table of OceanBase Database to obtain the subpartition information.
Calculate the partition_id value of the subpartition based on the column expression.
Calculate the final partition_id value.
Access an internal table of OceanBase Database to get location information corresponding to the partition_id value.

Follower routing

Similar to leader routing, follower routing also must meet two conditions:

The SQL statement queries an entity table, such as the t1 table in select c1 from t1.
The request does require the latest data. In other words, the request is a weak read.

These two conditions are different from the conditions for leader routing.

For the first condition, follower routing supports only query statements and does not support other statements. This is also the implementation requirement of the Paxos algorithm.
For the second condition, the weak-read setting ob_read_consistency=weak must be explicitly specified by using a hint, in the session, or in another way.

For follower routing, the SQL statement can be sent either to the leader or a follower. Therefore, follower routing has more choices. For more information about the selection from multiple replicas during routing, see Routing strategies.

Read/Write splitting is an important feature related with follower routing. Read/Write splitting on requests can reduce the pressure on the leader. ODP also supports read/write splitting. We are constantly improving the feature and have helped customers solve performance issues in various scenarios, for example, OceanBase Cloud.

Tenant server routing

Sometimes, ODP cannot obtain the leader or follower. In this case, ODP can select a tenant server, which is tenant server routing. Typical scenarios of tenant server routing are as follows:

The SQL statement itself does not contain a table name, for example, select 1.
The server where the leader or follower is located is faulty.
Due to its limitations, ODP fails to obtain the table name from a complex SQL statement, or cannot take a replica route.

Through tenant server routing, ODP sends the SQL statement to a server where the tenant is located, thereby ensuring proper functioning. As with follower routing, tenant server routing also involves selection from multiple replicas, and therefore requires routing strategies.

Information caching

For leader routing, follower routing, and tenant server routing, ODP needs to query replica routing information from the sys tenant. To improve performance and reduce pressure on the sys tenant, ODP caches routing information.

Data freshness is crucial for cached information. Cached information of the sys tenant can be created and updated regularly by accessing internal tables. However, ODP does not adopt this mechanism to cache the information for replica routing. Otherwise, ODP will have to pull replica routing information for too many SQL statements, incurring great pressure on the sys tenant.

Number of SQL statements = Number of replicas × Number of ODPs.

OceanBase Database supports up to hundreds of thousands and even millions of partitions. Therefore, data freshness of cached information is a big challenge for ODP. If expired cached information is used, the route is inaccurate. The following describes how ODP ensures data freshness of cached information.

To view the cached information of a table, log on to the ODP with the root@proxysys account and run the show proxyroute command:

MySQL [(none)]> show proxyroute like 'ob1.hudson tt1 test sbtest1'\G
*************************** 1. row ***************************
        cluster_name: ob1.hudson
         tenant_name: tt1
       database_name: test
          table_name: sbtest1
               state: AVAIL
       partition_num: 1
         replica_num: 3
            table_id: 1101710651081698
     cluster_version: 2
      schema_version: 1649196335597728
         from_rslist: N
         create_time: 2022-04-07 12:41:16
     last_valid_time: 2022-04-07 12:41:16
    last_access_time: 2022-04-07 12:41:16
    last_update_time: 1970-01-01 08:00:00
         expire_time: 2022-04-12 12:48:42
relative_expire_time: 2022-04-07 12:40:41
         server addr: server[0]=xx.xx.xx.xx:xx,leader,FULL; server[1]=xx.xx.xx.xx:xx,follower,FULL; server[2]=xx.xx.xx.xx:xx,follower,FULL;

This example shows the important information in the cache, including the cluster name, tenant name, database name, table name, number of partitions, number of replicas, time information, address information, and cache status. The cache status information is important for the refresh mechanism, because ODP implements caching strategies by modifying the status information. The cached information can be in one of the following states:

BUILDING: The cached information is being created. ODP needs to wait until the cached information is created before using it.
AVAIL: The cached information is in a normal state and can be directly used.
DIRTY: The cached information is invalid and inaccurate.
UPDATING: The invalid cached information is being updated.
DELETED: The cached information has been deleted and cannot be used. It will be cleared later.

ODP refreshes the cache status by modifying the cache status, to ensure data freshness. The following describes the cache refresh mechanism in terms of creation, eviction, and refresh.

Cache creation: When ODP accesses a partition for the first time, it obtains the routing information by querying the __all_virtual_proxy_schema table of the sys tenant. The specified table name is the actual table name. The table cache is different from the tenant routing information. The cache status changes to AVAIL after the cache is created.
Cache eviction: ODP changes the cache status to DIRTY when the OBServer returns the routing inaccuracy error, which is indicated in the is_partition_hit field carried in the OK packet.
Cache refresh: When the status of the cached information changes to DIRTY, the expired cached information is evicted and the cache is re-created or the cached information is updated.

Note

At present, cache eviction is implemented based on the feedback packet of the OBServer. Therefore, the cache status is not perceived in real time. The cached information can only be refreshed upon an incorrect route feedback. This may cause issues.

Transaction routing

The preceding section describes the routing strategy for a single SQL statement. Some features, such as the transaction feature, involve one or more SQL statements. In this case, transaction routing is required.

OceanBase Database Proxy (ODP) supports two routing modes for transactions. In one mode, ODP routes the statements in a transaction to the same OBServer node for execution. In the other mode, ODP routes the statements in a transaction to different OBServer nodes and synchronizes the transaction status on these nodes to implement the distributed execution of the transaction.

Note

The OceanBase 2.0 protocol is used to synchronize the transaction status.

Select nodes for transaction routing

Two roles are involved in transaction routing: coordinator and participant. The coordinator node is the node that starts the transaction. It runs non-DML statements that affect the transaction status, such as BEGIN, START TRANSACTION, COMMIT, and ROLLBACK. The participant nodes run DML statements that do not affect the transaction status.

ODP comprises a simple SQL parser that parses whether an SQL statement is a DML statement. For a DML statement, ODP routes the statement to proper nodes based on table routing or logical data center (LDC)-based routing. The routed-to nodes are participant nodes of the transaction. The coordinator node runs the first statement in the transaction.

Note

For more information about LDC-based routing, see Routing strategies.

Configure transaction routing

Transaction routing is enabled by default. You can run the following statements to query whether transaction routing is enabled and configure transaction routing.

Log on to the sys tenant as the root user from a client.

Check that the OceanBase 2.0 protocol is enabled.

obclient> SHOW PROXYCONFIG LIKE enable_ob_protocol_v2;
obclient> ALTER PROXYCONFIG SET enable_ob_protocol_v2=True;

Configure transaction routing for ODP.

obclient> SHOW PROXYCONFIG LIKE enable_transaction_internal_routing;
obclient> ALTER PROXYCONFIG SET enable_transaction_internal_routing=True;

Common issues

When ODP implements intra-tenant routing, the following issues may occur:

Failed to get the table name for leader routing.
- The SQL statement is too complex. Currently, ODP cannot identify all SQL statements.
- The SQL statement is too long. The ODP buffer for SQL statements is 4 KB in size. An excessively long SQL statement may not be fully parsed.
Partition calculation failed for leader routing.
- ODP does not support the calculation of multiple partitioning keys, such as range(c1,c2).
- The SQL statement does not contain a partitioning key expression, or ODP failed to extract the partitioning key expression.
- ODP cannot handle the partitioning key expression, such as c1=now(). ODP does not support the now() function.
Expired cached information is used.
- ODP does not have an active refresh mechanism for cached information .
- The OBServer does not return feedback for routing.
A configuration error occurred.
- If the routing strategy is not set to FOLLOWER_FIRST, weak data reading requests are sent to the leader in follower routing.
- The LDC routing information is not configured or the information is incorrectly configured, resulting in routing across IDCs or cities.