Scenarios
In database applications, when the same SQL statements are frequently executed with different parameters, batch execution can be achieved through optimization strategies. This process requires collaboration between the client and the OBServer. On the client side, JDBC parameters can be adjusted to send multiple SQL statements to the OBServer at once. On the server side, the OBServer rewrites these batched SQL statements to generate corresponding batch execution plans. While optimizing one side can provide some performance improvements, optimizing both the client and the OBServer can yield even better results, significantly enhancing system performance and response speed.
This optimization method significantly improves performance in the following ways:
The JDBC driver can send multiple SQL statements at once, reducing network request overhead.
The OBServer rewrites SQL statements containing multiple queries to generate batch execution plans, reducing the overhead of parameterization, plan cache lookup, and building and tearing down SQL execution contexts, from multiple instances to just one.
Batch execution plans allow the OBServer storage engine to optimize batch operations, such as batch writing data to a memtable and batch checking row locks and primary key conflicts.
This topic explains the relevant concepts, configuration methods for different usage scenarios, and how to verify the practical application of batch execution optimization.
Audience
- Solution Architects
- Database Administrators
Key concepts
Multi Queries
Multi Queries is a protocol for sending multiple SQL statements in one batch through JDBC in MySQL mode. Multiple SQL statements are concatenated with semicolons (";"). This reduces the number of communication rounds between the client and the server, and improves performance to some extent. Although you can concatenate multiple SQL statements, such as the following example, the OBServer node executes each SQL statement individually:
-- Although multi queries is used, these statements are not executed in batches because they operate on different tables.
update t1 set b=2 where a=1;
update t2 set b=3 where a=2;
delete from t1 where a=3;
The following example shows how to execute multiple statements in one batch:
-- Multiple update statements on the same table
update t1 set b=1 where a=1;
update t1 set b=2 where a=2;
update t1 set b=3 where a=3;
-- Multiple insert statements on the same table
insert into t1 select * from t2 where t2.c1 = 1;
insert into t1 select * from t2 where t2.c1 = 2;
-- Multiple delete statements
delete from t1 where a = 1;
delete from t1 where a = 2;
delete from t1 where a = 3;
If you use a prepared statement (PreparedStatement, PS), the syntax is as follows:
prepare_stmt; // defines the SQL statement template
addBatch(); // binds parameters
...
addBatch();
executeBatch(); // executes all the batch parameters
Array Binding
Array binding is a protocol for sending multiple SQL statements in one batch through JDBC in Oracle mode. It is used only in the PS protocol. The JDBC driver splits SQL statements and parameters, groups parameters of the same column type into arrays, and sends the arrays to the OBServer node. The usage is similar to that in the PS protocol in MySQL mode. This section does not elaborate on the details.
Multi Values Insert
You can use the insert into t1 values(x), (y), (z)...; statement to insert multiple records. This statement itself supports batch execution and therefore offers better performance.
How to configure
Configure OBServer
The configuration of OBServer is the same in both MySQL mode and Oracle mode. The following configurations are related to batch execution optimization:
ALTER SYSTEM SET ob_enable_batched_multi_statement = true;SET GLOBAL _nlj_batching_enabled = true;
ob_enable_batched_multi_statement is a tenant-level configuration item. After you set ob_enable_batched_multi_statement, OBServer can generate execution plans for batch SQL statements sent through multi-queries or array binding.
_nlj_batching_enabled is a tenant variable. After you set _nlj_batching_enabled, the Nested Loop Join optimization for group rescan is enabled. Although this optimization is not directly related to SQL batch execution, it is often used in the generated batch execution plans. Therefore, we recommend that you enable it.
You must execute these two configurations under the tenant.
Configure JDBC
When an application connects to a database using a database URL, you can set the corresponding JDBC parameters, for example:
conn=jdbc:oceanbase://xxx.xxx.xxx.xxx:2883/?rewriteBatchedStatements=TRUE&allowMultiQueries=TRUE&useLocalSessionState=TRUE&useUnicode=TRUE&characterEncoding=utf-8&socketTimeout=3000000&connectTimeout=60000&user=xxx@tenant&password=****
OceanBase Database determines the running mode (MySQL or Oracle) based on the tenant name when the JDBC driver is connected.
Note
When you use OceanBase Cloud, even if you configure the same JDBC parameters, the connection string may vary slightly. For JDBC connection configuration examples in a cloud environment, see Obtain connection parameters for OceanBase Cloud and Obtain connection parameters for Alibaba Cloud.
Make sure to adjust the connection information based on the specific details provided by your cloud service to ensure a successful connection to OceanBase Database.
Configure parameters for MySQL mode
In MySQL mode, the following parameters are recommended:
conn=jdbc:oceanbase://xxx.xxx.xxx.xxx:2883/?allowMultiQueries=TRUE&rewriteBatchedStatements=TRUE&useLocalSessionState=TRUE&useUnicode=TRUE&characterEncoding=utf-8&socketTimeout=3000000&connectTimeout=60000&user=xxx@tenant&password=****
- allowMultiQueries: By default, this parameter is set to
FALSE. It must be set toTRUEto allow multiple statements connected by semicolons. This setting enables batch optimization for scenarios using the text protocol. - rewriteBatchedStatements: By default, this parameter is set to
FALSE. If using the PS protocol, it must be set toTRUEto rewrite multiple statements into a semicolon-separated format during the execution ofexecuteBatch().
In MySQL mode, it is recommended to set both parameters to TRUE to enable batch optimization for both text and PS protocols.
Special note: After enabling rewriteBatchedStatements, the JDBC driver will rewrite multiple multi-query insert statements (whether using the text protocol or PS protocol) into a single Multi Values Insert statement.
Note
If unexpected errors occur after enabling allowMultiQueries and rewriteBatchedStatements, you can try disabling these settings and contact OceanBase Technical Support to verify the cause.
Configure parameters for Oracle mode
In Oracle mode, the following parameters are recommended:
conn=jdbc:oceanbase://xxx.xxx.xxx.xxx:2883/?useArrayBinding=TRUE&useServerPrepStmts=TRUE&user=xxx@tenant&password=****
- useArrayBinding: By default, this parameter is set to
FALSE. It must be set toTRUE, and when this setting is enabled, auto-commit must be disabled (autoCommit set to FALSE) to enable the array binding protocol, which only applies to the PS protocol. WhenexecuteBatch()is called, parameters are sent to the OBServer as an array. - useServerPrepStmts: By default, this parameter is set to
FALSE. Setting this parameter toTRUEenables PrepareStatement on the server side. The array binding protocol only takes effect when this parameter is set.
In Oracle mode, it is recommended to set both parameters to TRUE.
Note
The rewriteBatchedStatements and useServerPrepStmts parameters are mutually exclusive and can only be set to one of the two values. Therefore, they cannot be set simultaneously.
How do I check whether batch optimization is enabled?
In addition to correctly configuring the OBServer and JDBC settings, the SQL statements must meet the following requirements:
In Multi Queries, each SQL statement must be highly consistent (homogeneous) to effectively utilize batch optimization and ensure high performance. Additionally, the types of constant parameters must remain consistent to avoid type conversion, which helps in generating the optimal execution plan and improving query execution efficiency.
Explicit transactions must be used, which can be achieved by setting
autocommit=0or by using theBEGINstatement to start a transaction. After batch optimization, all statements are executed as a single unit. If any one statement fails, the entire transaction is rolled back. This behavior is different from regular multi queries.Some complex SQL statements cannot be rewritten for batch execution. For example, statements that use hash joins or contain
case...whenclauses cannot be converted. These cases are not exhaustive. Therefore, even if all other optimization conditions are met, if batch execution is not enabled on the OBServer, it is normal.
To check whether batch optimization is enabled on the OBServer, you can use the following methods:
If you understand the rewriting principles of batch statements, you can directly view the execution plan. Typically, the execution plan for batch statements uses the NLJ operator to connect a virtual view composed of parameter groups and the original join order.
Use the
trace_idto find the correspondingsql_idin thegv$ob_sql_auditview, and then use thissql_idto query the batch optimization information:-- Use the trace_id to find the corresponding sql_id SELECT sql_id FROM gv$ob_sql_audit WHERE trace_id = 'your_trace_id'; -- Use the found sql_id to query the batch optimization information SELECT is_batched_multi_stmt FROM gv$OB_PLAN_CACHE_PLAN_STAT WHERE SQL_ID='xxxx';If the output is 1, batch optimization is enabled.
You can also directly search for the execution statement in the
gv$ob_plan_cache_statview, for example:SELECT is_batched_multi_stmt FROM gv$OB_PLAN_CACHE_PLAN_STAT WHERE statement LIKE 'update t1 set c1=%';