Introduction to OBProxy: Modules and Features


Photo by Jordan Harrison on Unsplash

OceanBase Database Proxy (OBProxy) is a proxy service dedicated to OceanBase Database. It shields users from the complex backend of distributed OBServer clusters, so they can access a distributed database as easily as a standalone one. SQL statements from a user are first sent to OBProxy nodes, which select suitable OBServers, forward the statements to them, and then return the results to the user.

In this article, we will share the details about the modules and features of OBProxy to give you a closer look at what it is, what it does, and how to use it.

The Architecture of OBProxy


OBProxy Architecture

In the figure above, you can see a business application on top of three OBProxies. In a real deployment, a load balancer, such as an F5 load balancer, is usually deployed between the application and the OBProxies to distribute statements among them. The OBProxies then forward the statements to OBServers. The OceanBase cluster shown in the figure consists of six OBServers. Because OBProxies know how data is distributed across OBServers, they can forward each SQL statement to the exact OBServer where the requested data is located. This is more efficient than forwarding the statement to a node that does not hold the requested data. For example, in the figure above, the data of tables t1, t2, and t3 is stored in partitions P1, P2, and P3 respectively; the leader replicas are colored in red, and the follower replicas are in blue. If the user sends an “INSERT INTO t1” statement, OBProxy can forward the statement directly to the OBServer that hosts the leader replica of P1 in IDC2.

Why should OBProxy send an SQL statement to the node where the requested data is located?

This is because by doing so, the execution plan of that SQL statement can be executed locally without remote procedure calls (RPCs), which improves the performance. In a production environment, OBProxy also considers the geographic locations of OBServers to avoid forwarding requests across IDCs or cities. OBProxy supports many routing strategies. We will introduce them in future articles.
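As a rough illustration of leader-based forwarding, the sketch below keeps a table-to-replica map and returns the server that hosts the leader replica. The class LeaderRouter, its Replica type, and the server names are hypothetical and exist only for this example; OBProxy's real location cache and routing strategies are considerably more involved.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: route a statement to the server hosting the leader replica.
final class LeaderRouter {
    // One replica of a table's partition: the hosting server and its role.
    static final class Replica {
        final String server;
        final boolean leader;

        Replica(String server, boolean leader) {
            this.server = server;
            this.leader = leader;
        }
    }

    // Simplified stand-in for a location cache: table name -> replicas.
    private final Map<String, List<Replica>> locationCache = new HashMap<>();

    void put(String table, Replica... replicas) {
        locationCache.put(table, Arrays.asList(replicas));
    }

    // Returns the leader's server, or null if the table is unknown or has no leader.
    String route(String table) {
        List<Replica> replicas = locationCache.get(table);
        if (replicas == null) {
            return null;
        }
        for (Replica r : replicas) {
            if (r.leader) {
                return r.server;
            }
        }
        return null;
    }
}
```

With t1 registered as having a follower on one server and its leader on another, route("t1") returns the leader's server, so the statement can be executed locally there.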

How to use OBProxy?

You can use the database service once you complete the deployment of the OceanBase cluster (with OBProxies in it).

Assume that you need to access a database by using JDBC.

final String URL = "jdbc:mysql://127.0.0.1:2883/test?useSSL=false&useServerPrepStmts=true";

When you create a connection, you must first initialize the connection information. The sample URL above contains the IP address, port number, database name, and connection attributes. Whether you connect through OBProxy or directly to an OBServer, only the IP address and port number differ; everything else is the same. Once the connection is established, OBProxy is completely transparent to the user.
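To make the point about host and port concrete, here is a minimal sketch that builds both URLs from the same template. The class and method names are made up for this example, and port 2881 for a direct OBServer connection is an assumption for illustration; use whatever SQL port your OBServer is configured with. Opening a real connection also requires a MySQL-compatible JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: the same URL template serves OBProxy and direct connections.
final class ObConnectionUrls {
    // Builds a MySQL-protocol JDBC URL; only host and port vary between endpoints.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useSSL=false&useServerPrepStmts=true";
    }

    // Opens a connection with the standard JDBC API (needs a driver at runtime).
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Through OBProxy, using the port from the sample URL above.
        String viaProxy = jdbcUrl("127.0.0.1", 2883, "test");
        // Directly to an OBServer; 2881 is an assumed port for illustration.
        String direct = jdbcUrl("127.0.0.1", 2881, "test");
        System.out.println(viaProxy);
        System.out.println(direct);
    }
}
```

The application code that uses the returned Connection does not change between the two cases.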

With OBProxy, you do not need to pay attention to the distributed database architecture, which makes things easy. To be more specific, OBProxy offers you the following benefits:

1. OBProxy is compatible with the MySQL protocol. You can use standard MySQL drivers.

2. When switching from a MySQL database to an OceanBase database, you do not need to modify the code for database access.

3. OBProxy shields you from complex backend events in your distributed system, such as server switchover, server crashes, allocation of tenant resource units, and daily major compaction, and keeps client connections stable.

OBProxy Modules and Features

Now, let’s take a closer look at the modules of OBProxy to help you understand its features from a systemic point of view. We classify the features of OBProxy into three layers, as shown in the figure below.


OBProxy Modules and Features

The foundation layer

The foundation layer supports the upper layer by implementing network communication, a basic framework such as a thread management framework, and the basic tool library.

The network communication library supports the TCP protocol, SSL protocol, and RDMA communication, and encapsulates easy-to-use APIs for the upper layer. The asynchronous thread framework creates and manages threads, and schedules tasks. The basic tool library encapsulates essential capabilities to provide useful APIs for coding.

The business layer

As the most complex layer of OBProxy, the business layer provides basic capabilities related to the database service.

The business layer supports many protocols, such as the MySQL protocol, Oracle protocol, and proprietary OceanBase protocol. This makes OceanBase products highly compatible with other systems and facilitates the development of more powerful features.

The connection management feature handles connections between the client and server and provides advanced capabilities such as connection timeout and troubleshooting.

The SQL parsing feature allows OBProxy to identify SQL semantics and extract key routing information such as table names and partitioning keys from SQL statements.
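The idea of pulling routing information out of a statement can be sketched with a deliberately naive regular expression. This is not how OBProxy parses SQL (it uses a real parser); TableNameExtractor is a hypothetical name, and the pattern only handles a simple table name following FROM, INTO, or UPDATE.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract the first table name from a simple SQL statement.
final class TableNameExtractor {
    // Matches an identifier that follows FROM, INTO, or UPDATE (case-insensitive).
    private static final Pattern TABLE_AFTER_KEYWORD = Pattern.compile(
            "\\b(?:FROM|INTO|UPDATE)\\s+([A-Za-z_][A-Za-z0-9_]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first table name found, or empty if the statement names no table.
    static Optional<String> firstTable(String sql) {
        Matcher m = TABLE_AFTER_KEYWORD.matcher(sql);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A production parser also has to deal with quoting, schemas, subqueries, comments, and partitioning key values, none of which this sketch attempts.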

The data routing feature allows OBProxy to forward a request to the OBServer which ensures the highest execution efficiency. Accurate routing plays a key role in achieving high performance.

The disaster recovery and high availability feature allows OBProxy to detect faulty OBServers in time and to connect to another OBServer if a faulty one is selected.
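One simple way to picture this behavior is a set of servers currently considered faulty, consulted before a candidate is used. The FailoverSelector class below is a hypothetical sketch under that assumption; how OBProxy actually detects and tracks faulty OBServers is not described in this article.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: skip servers marked faulty and fall back to the next candidate.
final class FailoverSelector {
    private final Set<String> faulty = new HashSet<>();

    void markFaulty(String server) {
        faulty.add(server);
    }

    void markHealthy(String server) {
        faulty.remove(server);
    }

    // Candidates are given in preference order; the first healthy one wins.
    Optional<String> select(List<String> candidates) {
        for (String server : candidates) {
            if (!faulty.contains(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }
}
```

If the preferred server is marked faulty, select returns the next candidate; once the server is marked healthy again, it is chosen as before.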

The transaction status management feature allows OBProxy to manage the status of transactions in a connection. The transaction status affects the routing and forwarding performance of OBProxy.
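To show how transaction status can influence routing, the sketch below assumes a simple policy: once a transaction starts, every statement in it goes to the same server until COMMIT or ROLLBACK. Both the policy and the TransactionAwareSession class are assumptions made for illustration, and the keyword checks are intentionally simplistic.

```java
import java.util.Locale;

// Hypothetical sketch: keep all statements of one transaction on the same server.
final class TransactionAwareSession {
    // Server bound to the open transaction, or null when no transaction is open.
    private String pinnedServer;

    // routedServer is the server a data router would pick for this statement alone.
    String serverFor(String sql, String routedServer) {
        String s = sql.trim().toUpperCase(Locale.ROOT);
        if (pinnedServer != null) {
            String target = pinnedServer;
            // The statement that ends the transaction still goes to the pinned server.
            if (s.startsWith("COMMIT") || s.startsWith("ROLLBACK")) {
                pinnedServer = null;
            }
            return target;
        }
        if (s.startsWith("BEGIN") || s.startsWith("START TRANSACTION")) {
            pinnedServer = routedServer;
        }
        return routedServer;
    }
}
```

Outside a transaction, each statement simply follows the router's choice.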

The product layer

As OBProxy matures, we have encapsulated some of its capabilities into products in proxy mode and SDK mode to provide services for external systems. Sharding is also supported by OBProxy based on the LDC architecture of Ant Group. We are also working on more useful features to make the product better.

How OBProxy Works

Now let’s talk about how OBProxy processes an SQL statement.


How OBProxy Processes an SQL Statement

Below are the major steps that OBProxy takes to process an SQL statement:

1. Establishes a TCP connection to a client and processes socket read/write events by using epoll, which is implemented in the network communication library.

2. Reads the byte stream from the TCP connection, saves it to the buffer, and parses a message based on the MySQL protocol. OBProxy parses the header first and then decides whether to parse the rest of the message.

3. Reads the SQL statement from the message and parses the statement.

4. Finds the table data distribution information by using the table name and the table partition information stored in the location cache, and selects the OBServer that hosts the requested data.

5. Finds the connection to the target OBServer in the database connection pool, and performs the disaster recovery management check on the target OBServer.

6. Uses the selected connection to interact with the target OBServer based on the high-performance forwarding framework.

7. Processes the data received from the OBServer at the protocol layer and returns the data to the client.

This process does not cover the disaster recovery operations performed when an exception occurs; for those, refer to the workflow shown in the figure above. OBProxy also runs many important background tasks, which will be introduced in a later article.
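Steps 2 and 3 can be illustrated with the MySQL wire format, in which each packet starts with a 4-byte header (a 3-byte little-endian payload length followed by a 1-byte sequence ID) and a COM_QUERY payload starts with the command byte 0x03 followed by the SQL text. The MySqlPacket class is a hypothetical sketch, not OBProxy code; it assumes the buffer holds one complete packet, that query attributes are not negotiated, and that the statement is UTF-8 encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read a MySQL packet header, then extract a COM_QUERY statement.
final class MySqlPacket {
    static final int HEADER_LENGTH = 4;
    static final int COM_QUERY = 0x03;

    // Payload length is stored in the first 3 bytes, little-endian.
    static int payloadLength(byte[] buf) {
        return (buf[0] & 0xFF) | ((buf[1] & 0xFF) << 8) | ((buf[2] & 0xFF) << 16);
    }

    // The fourth header byte is the packet sequence ID.
    static int sequenceId(byte[] buf) {
        return buf[3] & 0xFF;
    }

    // Returns the SQL text if buf holds a complete COM_QUERY packet, otherwise null.
    static String queryText(byte[] buf) {
        if (buf.length < HEADER_LENGTH) {
            return null; // header not fully received yet
        }
        int len = payloadLength(buf);
        if (len < 1 || buf.length < HEADER_LENGTH + len) {
            return null; // payload empty or not fully received yet
        }
        if ((buf[HEADER_LENGTH] & 0xFF) != COM_QUERY) {
            return null; // some other command; nothing to parse as SQL
        }
        return new String(buf, HEADER_LENGTH + 1, len - 1, StandardCharsets.UTF_8);
    }
}
```

Reading the header first is what lets a proxy decide cheaply whether the rest of the message needs to be parsed at all.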

Key OBProxy features

Hopefully, by now you are familiar with the modules and features of OBProxy and what it does for the execution of an SQL statement.

Now, let’s highlight the key features of OBProxy below:

1. High-performance routing

OBProxy is a key component in the data access procedure. We have adopted a multi-thread asynchronous framework for OBProxy and designed it to transparently route requests in a streamlined manner. At the same time, we have significantly optimized the code of critical paths and ensured that OBProxy consumes only minimal server resources.

2. Protocol support

OBProxy supports a variety of protocols, such as the MySQL protocol, Oracle protocol, and proprietary OceanBase protocol. We are working to enhance the OceanBase protocol so that it can support more powerful features.

3. Connection management

OBProxy shields users from backend issues and keeps connections with the client stable. This matters because users are not disturbed by connection errors.

4. Data routing

As a factor in database performance and high availability, data routing is closely related to the deployment architecture and data distribution, and greatly affects the execution of SQL statements. Correct routing is a great concern in the database industry.

5. Sharding

The sharding capability plays a key role in existing financial cloud solutions. OBProxy developed in C shows better sharding performance.

What’s Next?

Although OBProxy originated as an internal product at Ant Group, it now serves many customers outside Ant Group. Along the way, OBProxy has faced a lot of challenges, such as incompatibility in certain scenarios. However, the product has gradually matured over several rounds of iteration. In view of customer needs and pending issues, we will focus on the following priorities:

1. Basic capabilities.

We will adapt OBProxy to the database kernel and its new features based on customer needs and OBServer feature iteration, with the aim of constantly enhancing the stability and performance of the OBProxy kernel.

2. Platform adaptation.

We will enhance the compatibility of OBProxy with more platforms, such as Kubernetes, Docker, cloud platforms, and ARM-based servers, and improve user experience by supporting platform capabilities through in-depth platform adaptation.

3. Ecosystem integration.

On the one hand, we will adapt OBProxy to existing open-source projects, for example, by providing monitoring data for SkyWalking. On the other hand, we will allow open-source projects to interconnect with OceanBase Database. In this way, OceanBase Database can be better applied in the open-source community.

4. Productization.

We will provide customers with full-fledged solutions based on the characteristics of OBProxy, and optimize the documentation to facilitate the usage of OBProxy.

5. Driver.

We will adapt OBProxy to more drivers in other languages, and provide the product in proxy mode and SDK mode for a better user experience.

Facing plenty of opportunities and challenges in the future, we will keep upgrading OBProxy together with the kernel of the OceanBase Database to provide user-friendly products, documents, and services.

We hope this article helps you better understand OBProxy. You may refer to its documentation if you want to know more. There will be more articles explaining OBProxy in detail. Stay tuned!

