V4.2.5_CE
Version information
- Release date: October 21, 2024
- Current version: V4.2.5_CE
- RPM version: oceanbase-ce-4.2.5.1-101000092024120918
Overview
OceanBase Database V4.2.5_CE is a new Long-Term Support (LTS) version designed for transaction processing (TP) services. Building on the foundation of V4.2.4_CE, this version introduces several enhancements. Notably, it supports baseline-prioritized SQL plan management (SPM) evolution and partitioned table-based late materialization. Additionally, improvements to the cardinality estimation system, common table expression (CTE) extraction, and inlining cost verification significantly enhance the optimizer's capabilities. To further boost system stability and reliability, the new version includes features such as consumption logs, forcible stopping of archiving jobs, active transaction transfer, and adaptive arbitration upgrades or downgrades based on I/O load.
In terms of compatibility, V4.2.5_CE introduces features like lock functions, invalid dates, and eXtended Architecture (XA) transactions, making it easier to migrate business operations from MySQL databases. The new version also optimizes table-level restore performance and upgrade processes, significantly reducing the time required for these operations. Furthermore, it enhances resource isolation by supporting isolation for statistics collection, clog commit, and DDL operations. Memory management has also been improved, with the ability to limit memory usage at the SQL statement level and optimize memory usage for stored procedures, ensuring greater stability across various scenarios.
Ease of use is another key focus of this release. The Active Session History (ASH) diagnostic module now provides row lock wait and retry wait events, response time histograms, and log transmission link views, greatly improving system observability. Additional enhancements include optimized parallel log replica migration, real-time application of unit configuration changes, and the introduction of the ./alert/alert.log file, which records logs of interest to database administrators (DBAs), further streamlining usability. The new version also expands the multi-model ecosystem of OceanBase Database. It introduces support for the OBKV-Redis model and optimizes expired data deletion in the OB-HBase model, addressing issues related to excessive data versions in hotkey scenarios. Moreover, it adds the ColumnPaginationFilter filter and reverse scan functionality to enhance compatibility with Apache HBase.
Key features
Kernel enhancements
Global indexes bound with the same table group as the primary table when the SHARDING attribute of the table group is set to NONE
In earlier versions, global indexes are not bound with table groups. The primary table within a table group and its global indexes may be located on different OBServer nodes. Since V4.2.5_CE, if the primary table is bound with a table group for which the SHARDING attribute is set to NONE, the database requires that the global indexes have the same data distribution as the primary table before you perform DDL operations related to the global indexes, such as CREATE TABLE, CREATE TABLE LIKE, CREATE INDEX, ALTER TABLE ADD INDEX, TRUNCATE TABLE, and TRUNCATE PARTITION. If you execute the ALTER TABLE statement to bind the primary table with a table group for which the SHARDING attribute is set to NONE, the global indexes are bound with the primary table for table group alignment.
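The following minimal sketch illustrates the behavior; the table group, table, and index names are hypothetical:

```sql
-- Create a table group whose SHARDING attribute is NONE.
CREATE TABLEGROUP tg1 SHARDING = 'NONE';

-- Bind a partitioned primary table to the table group.
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT) TABLEGROUP = tg1
  PARTITION BY HASH(c1) PARTITIONS 4;

-- Since V4.2.5_CE, this global index is bound with tg1 as well, so its
-- data distribution stays aligned with the primary table.
CREATE INDEX g_idx_c2 ON t1 (c2) GLOBAL;
```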
Adaptive join
In earlier versions, the optimizer selects the join method for a query during the plan generation phase based on statistics. If the statistics are inaccurate or outdated, the selected join method may not be optimal. For example, if the optimizer estimates from statistics that a table contains only a few rows, it may select nested-loop join (NLJ). However, if the table actually contains many more rows than estimated, hash join may be the better choice for higher performance. To resolve this issue, the new version introduces the adaptive join feature, which allows the database to select the most appropriate join method dynamically rather than merely depending on statistics.
Improvement on the cardinality estimation system of the optimizer
Cost estimation by the optimizer relies on accurate cardinality estimation for each operator. Accurate cardinality estimation relies on an appropriate estimation strategy and accurate statistics. OceanBase Database V4.2.2 has restructured the base table-based and selectivity-based cardinality estimation methods to improve the cost estimation accuracy of the optimizer. OceanBase Database V4.2.4_CE makes improvements for some complex scenarios. OceanBase Database V4.2.5_CE further enhances the cardinality estimation system in the following aspects to improve its accuracy in some scenarios:
- A lower bound is specified for the selectivity of complex joins that contain multiple types of join conditions and loop joins that involve multiple tables. This requires that when an inner or outer join contains complex join conditions, each row of the small table be joined with at least one row of the large table. That is, the selectivity is not lower than 1/N, where N is the number of rows in the large table. This resolves the issue that the number of rows estimated for a loop join is too small.
- The number of rows joined after a filter is corrected to improve the accuracy in estimating the number of rows to be joined.
- For an anti-join that contains predicates, if the predicates on one table are highly selective, cardinality estimation uses base containment assumptions instead of simple containment assumptions for improved accuracy.
Enhanced DBLink capabilities: domain name resolution and remote binary data writing
OceanBase Database allows you to use a DBLink to access a remote database. In earlier versions, you need to specify the IP address of the remote database when you create the DBLink. In V4.2.5_CE, you can configure a domain name and write binary data to the remote database by using the DBLink.
Support for using a hint to specify to use hash join for a CONNECT BY hierarchical query
You can select NLJ or hash join for a CONNECT BY hierarchical query. In earlier versions, a join method is selected based on rules, and hash join is selected only when no index is available. However, in some scenarios, hash join is a better choice even if an index is available. In V4.2.5_CE, you can use the USE_HASH hint to specify the hash join method for a hierarchical query. You can also bind an outline for the hierarchical query.
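A hedged sketch of the hint, assuming the START WITH/CONNECT BY syntax described above and a hypothetical employees table with an id/parent_id hierarchy:

```sql
-- Force hash join for the hierarchical query instead of NLJ.
SELECT /*+ USE_HASH(e) */ e.id, e.parent_id, e.name
FROM employees e
START WITH e.parent_id IS NULL
CONNECT BY PRIOR e.id = e.parent_id;
```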
Serverless-oriented optimization
OceanBase Database V4.2.5_CE is optimized in the following aspects to better adapt to the serverless architecture:
- The meta tenant and user tenant can share memory. This resolves the issue where the meta tenant of a large-sized user tenant has insufficient memory whereas the meta tenant of a small-sized user tenant has low memory utilization.
- Threads in the commonly used thread pool ObSimpleThreadPool can be dynamically adjusted. In earlier versions, the number of threads in a thread pool is statically specified when the thread pool is created. In V4.2.5_CE, the number of threads in a thread pool can be dynamically increased or decreased based on the request queue length. This dynamic resizing is supported by some background thread pools.
- The local memory for threads is reduced by removing unnecessary thread-local variables, decreasing the memory that local variables occupy, and minimizing the lifecycle of local variables.
Distributed late materialization
In a single-table query, the database usually uses the index table to identify matching rows and then queries row data from the primary table, preventing a full-table scan. Each access to the primary table is a random I/O query. If a large amount of data is involved in table access during the execution of an SQL statement, a significant amount of performance resources is consumed. Take the statement select * from t1 where c1 > 1 order by c2 limit 10 as an example. If a composite index on c1 and c2 exists, the generated execution plan typically filters data that meets the c1 > 1 condition in the index table, then accesses the t1 table to retrieve data, and finally selects the top 10 records. If a large amount of data is involved in table access, the performance may deteriorate. However, if the execution plan instead first filters data based on the index table, then sorts the data and selects the top 10 records, and finally queries the primary table only 10 times, the performance resources consumed by random I/Os are significantly reduced. This optimization strategy is called late materialization. Simply put, late materialization is a strategy where a single-table query sorts the filtered data and limits the number of records returned before retrieving data from the primary table, thus reducing the random I/O cost. OceanBase Database supports late materialization for non-partitioned tables in earlier versions, and supports late materialization for partitioned tables in V4.2.5_CE.
Index selection optimization
In V4.2.5_CE, when statistics expire, only rule-based optimization is retained and dynamic sampling is enabled to address the issue of index deviation. Global statistics are used to prevent index deviation caused by the absence of histograms after partition pruning when data skew occurs in indexed columns. New cost coefficients are used to calculate the projection overhead and filtering overhead of large object (LOB) columns to improve the index selection accuracy.
Improvements in cost-based rewrite deviation and time consumption
Queries can be rewritten based on rules or costs. Unlike rule-based rewrite, cost-based rewrite cannot ensure that the rewritten query achieves optimal performance in all scenarios. Generally, the performance of a query is subject to data distribution and whether an appropriate index is available. Therefore, cost-based rewrite requires cost verification. Cost verification remains incomplete and incurs two issues: rewrite deviation and high time consumption in hard parsing. OceanBase Database V4.2.5_CE reduces the time consumed by a single cost verification and limits the number of retries allowed in cost-based rewrite, lowering the overall time consumption of cost-based rewrite.
SQL query interface enhancements
OceanBase Database V4.2.5_CE implements access isolation for read-only replicas. This feature defines the routing rule for internal data requests: requests destined for a read-only replica must not be forwarded to a non-read-only replica, thereby implementing resource isolation between TP and analytical processing (AP) services and ensuring long-term stable operation in hybrid transaction/analytical processing (HTAP) scenarios. The rule does not apply to requests destined for non-read-only replicas. The metadata structure in the result set is also modified: OBServer nodes obtain server metadata based on the tenant mode, and the Java Database Connectivity (JDBC) driver derives from it the metadata used to serve metadata access requests of users.
Improvements in CTE extraction and inlining cost verification
CTE extraction and inlining are two choices for the common calculation module. CTE extraction materializes the calculation results of CTEs by using the WITH clause. The calculation results can then be directly referenced. CTE inlining does not materialize the calculation results of CTEs. The result of each CTE is calculated in the context where it is used. OceanBase Database V4.2.5_CE abstracts a common inlining and materialization cost verification module to enhance cost verification for CTE inlining, improving the accuracy of local cost verification.
Enhanced query range extraction capabilities
OceanBase Database V4.2.5_CE enhances the query range extraction capabilities. It supports range extraction for spatial columns, implicit conversion, and quick path extraction in NLJ rescan scenarios. In addition, the query range pruning feature is associated with the cost model. The database determines the query range and the end column of the query range based on costs.
Enhancement in lock contention management
The new version enhances lock contention management for local executions by removing reliance on the wakeup mechanism and eliminating unnecessary wait times. It also introduces improved lock contention management for remote executions. By combining proactive detection at the source with wakeup notifications at the destination, the new version avoids aimless retries during remote lock contention, thereby reducing CPU and network overhead. For hotspot rows with multiple transactions requesting locks, the database prioritizes wakeups based on the order of the requests. This ensures older requests are not blocked by newer ones, prevents CPU jitter caused by simultaneous retry attempts from multiple requests, and minimizes wakeup delays in exceptional cases.
Log stream replica migration optimization
In a disaster recovery scenario in earlier versions, unit migration from one OBServer node to another triggers replica migration. If the amount of data is large, the migration may take a long time. If you migrate the units back to the source OBServer node, the corresponding replica can be migrated back only after the ongoing migration to the destination OBServer node is completed. Such redundant migrations compromise replica migration efficiency. This scenario is optimized in V4.2.5_CE: if the database detects any unneeded replica migration tasks during log stream replica migration, it automatically cancels them to improve migration efficiency.
Forcible stopping of log archiving jobs
A log archiving job writes corresponding metadata in the archive directory of the archive media when each round begins and ends. When you stop an archiving job in earlier versions, the archiving job advances to the STOP state only after the archiving stop metadata is written. If the archive media cannot be accessed, metadata cannot be written to the archive directory. As a result, the archiving job cannot be stopped and remains in the STOPPING state. To address this issue, OceanBase Database V4.2.5_CE provides the forcible archiving job stopping feature: an archiving job can advance to the STOP state without writing the archiving stop metadata. Note that when archive logs are consumed, the system cannot determine whether a forcibly stopped archiving round has been completed, which affects archive log consumption by downstream systems.
Globally unique client session ID
Prior to OceanBase Database V4.2.5_CE and OceanBase Database Proxy (ODP) V4.2.3 BP1, when the client executes SHOW PROCESSLIST through ODP, the client session ID in ODP is returned. However, when the client queries the session ID by using an expression such as CONNECTION_ID or from a system view, the session ID on the server is returned. A client session ID corresponds to multiple server session IDs. This causes confusion in session information queries and makes user session management difficult. In addition, the uniqueness of the client session ID cannot be ensured across ODPs, so killing a session across ODPs may fail or the wrong session may be killed. In the new version, the client session ID generation and maintenance process is reconstructed. If the version of OceanBase Database is not earlier than V4.2.5_CE and the version of ODP is not earlier than V4.2.3 BP1, the session IDs returned by various channels, such as the SHOW PROCESSLIST command, the information_schema.PROCESSLIST and GV$OB_PROCESSLIST views, and the connection_id, userenv('sid'), userenv('sessionid'), sys_context('userenv','sid'), and sys_context('userenv','sessionid') expressions, are all client session IDs. You can specify a client session ID in the SQL or PL command KILL to terminate the corresponding session. If the preceding version requirements for OceanBase Database and ODP are not met, the handling method in earlier versions is used.
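A brief sketch of the unified behavior when the version requirements are met (the session ID value is hypothetical):

```sql
-- These channels now return the same client session ID:
SELECT CONNECTION_ID();
SHOW PROCESSLIST;
-- Terminate a session by its client session ID:
KILL 3221225472;
```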
Enhanced CDC
OceanBase Database V4.2.5_CE enhances the Change Data Capture (CDC) feature in the following aspects:
- The before-image output of LOB data stored in OUTROW mode is improved to resolve the issue where the before-images of LOB columns in updated rows are incomplete in the following scenarios: a DELETE statement is executed, an UPDATE statement is executed on the primary key, an UPDATE statement is executed to change the storage mode of a LOB column from OUTROW mode to INROW mode, and an UPDATE statement is executed on a non-LOB column.
- The storage structure of transaction logs in CDC is optimized and the log pullback framework is restructured. This improves the efficiency of OBServer nodes in providing logs to CDC and allows for faster retrieval of missing transaction logs when CDC is started during a transaction, thereby increasing CDC startup efficiency.
- Newly generated data can be synchronized to a table created through a table-level restore.
- Synchronization of data in virtual generated columns is supported to meet the data consumption requirements of downstream systems.
Support for data synchronization from the standby tenant in CDC
The standby database is an important part of the high-availability system of OceanBase Database. OceanBase Database has supported primary/standby tenants since V4.1. A standby tenant serves as a backup of a production tenant, namely, the primary tenant. When the primary tenant fails to provide services, the standby tenant can switch to the PRIMARY role to provide services. This shortens the unavailability time and improves the fault tolerance of the database system. CDC formats and delivers incremental transaction data in OceanBase Database in units of transactions to downstream components for consumption. Synchronization of data from the standby tenant is supported for the following purposes:
- Consume data in the local database: If an application that needs to consume CDC data is located in a different region from the primary tenant, it preferentially consumes incremental transaction data from the local standby tenant.
- Consume data in the standby tenant after a switchover: after a switchover, CDC can switch to the consumption link of the standby tenant to avoid interrupting the data link.
- Reduce the impact on the primary tenant and ensure business stability in the primary tenant.
Baseline-prioritized SPM evolution
In earlier versions, SPM evolution proceeds online. When a new plan is generated, it immediately evolves based on the baseline plans. If the SPM module verifies that the new plan is superior to the baseline plans based on the average CPU time required for plan execution, the new plan is used and marked as a baseline plan. Online evolution can address most execution plan deviation issues. However, bad cases also occur. For example, if the data amount is small when a new plan is generated, the SPM module misjudges that the new plan outperforms existing plans; when the data amount surges, the new plan is no longer suitable. OceanBase Database V4.2.5_CE provides an SPM evolution mode that prioritizes baseline plans. In this mode, the optimizer always uses a baseline plan unless no reproducible baseline plan is available; in effect, all baseline plans are treated as being in the FIXED state.
Compatibility with MySQL
Lock functions
OceanBase Database V4.2.5_CE supports the following MySQL lock functions: GET_LOCK, IS_FREE_LOCK, IS_USED_LOCK, RELEASE_ALL_LOCKS, and RELEASE_LOCK. Lock functions are built-in functions that allow you to define and use locks. You can use these functions in different parts of SQL statements, such as in the ORDER BY and HAVING clauses of a SELECT statement, in the WHERE condition of a SELECT, DELETE, or UPDATE statement, or in a SET statement. The function expression can be a literal, column value, NULL, variable, built-in function or operator, loadable function, or stored function.
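A minimal sketch following MySQL semantics; the lock name is hypothetical:

```sql
SELECT GET_LOCK('my_lock', 10);   -- acquire 'my_lock'; wait up to 10 seconds
SELECT IS_FREE_LOCK('my_lock');   -- 0 while the lock is held, 1 when free
SELECT IS_USED_LOCK('my_lock');   -- connection ID of the holder, or NULL
SELECT RELEASE_LOCK('my_lock');   -- release the lock; returns 1 on success
SELECT RELEASE_ALL_LOCKS();       -- release all locks held by this session
```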
Table locking
OceanBase Database V4.2.5_CE supports the LOCK TABLE and UNLOCK TABLE syntaxes. After a table is locked, other locking, writing, and DDL operations on the table are blocked. To lock a table in a tenant, you must have the LOCK TABLE privilege. The tenant-level parameter enable_lock_priority is provided to enable table lock priorities. After table lock priorities are enabled, the RENAME TABLE statement has the highest priority in obtaining a table lock: when a RENAME TABLE statement is waiting for a table lock together with another DDL operation, a locking operation, or a DML operation, the RENAME TABLE statement is the first to obtain the lock. At present, the support for the table locking feature of MySQL is limited.
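A hedged sketch; the table name is hypothetical and MySQL-compatible lock types are assumed:

```sql
-- Requires the LOCK TABLE privilege in the tenant.
LOCK TABLE t1 WRITE;   -- other locking, write, and DDL operations on t1 are blocked
-- ... perform work on t1 ...
UNLOCK TABLE;          -- release the table lock

-- Optionally give RENAME TABLE the highest lock priority (tenant-level parameter):
ALTER SYSTEM SET enable_lock_priority = True;
```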
Restructured ENUM and SET data types
The ENUM and SET data types are generally used to store strings of a limited number of values, to save storage space and improve data processing efficiency. ENUM and SET data is stored as numeric values and interpreted against character arrays. The new version optimizes the metadata storage structure of the ENUM and SET data types: it saves the complex character-array information in a subschema context, reducing the effort developers must spend on ENUM and SET metadata transfer. In addition, the general type conversion logic makes type derivation for ENUM and SET simpler and more efficient.
Compatibility with invalid date types of MySQL
OceanBase Database V4.2.5_CE supports invalid dates. The sql_mode system variable specifies whether the storage of invalid dates is allowed. For example, you can add the ALLOW_INVALID_DATES keyword to store a date that does not exist in the calendar, such as 2024-2-30, or remove the NO_ZERO_IN_DATE keyword to store a date with a zero part, such as 2022-01-00. The _enable_mysql_compatible_dates parameter is provided to specify whether to support invalid dates of MySQL.
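A minimal sketch of the MySQL-compatible behavior; the table and data are hypothetical:

```sql
-- Allow calendar-invalid dates such as February 30.
SET sql_mode = CONCAT(@@sql_mode, ',ALLOW_INVALID_DATES');
CREATE TABLE d1 (c1 DATE);
INSERT INTO d1 VALUES ('2024-02-30');  -- accepted under ALLOW_INVALID_DATES
```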
Quick insertion of columns in the middle of a table
The DDL operation for inserting columns in the middle of a table is changed from an offline operation in earlier versions to an online operation in V4.2.5_CE to improve performance. At present, you can use ADD COLUMN in combination with MODIFY COLUMN, or use ADD COLUMN alone, to insert columns in the middle of a table online.
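For example, with a hypothetical table t1 and the MySQL-compatible AFTER clause:

```sql
-- Insert a column between c1 and c2; now executed as online DDL.
ALTER TABLE t1 ADD COLUMN c1_5 INT AFTER c1;
```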
XA transaction support
OceanBase Database V4.2.5_CE supports the XA transaction syntaxes XA START, XA END, XA PREPARE, XA COMMIT, XA ROLLBACK, and XA RECOVER, each operating on a transaction identifier (XID). Note that XA transactions cannot be implicitly committed: if you execute a DDL statement in an XA transaction, an error is returned.
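A minimal XA flow, following MySQL's XA statement syntax (the xid value and table are hypothetical):

```sql
XA START 'xid1';
INSERT INTO t1 VALUES (1);
XA END 'xid1';
XA PREPARE 'xid1';
XA COMMIT 'xid1';   -- or XA ROLLBACK 'xid1'
-- XA RECOVER lists transactions in the PREPARED state.
-- A DDL statement between XA START and XA END returns an error.
```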
Character set improvement
To meet the requirements for specific character sets in different regions outside the Chinese mainland, OceanBase Database V4.2.5_CE supports the hkscs and hkscs31 character sets of Oracle. It also supports the Japanese character set SJIS along with UTF16LE, DEC8, and BIG5 in MySQL, as well as their associated collations. Moreover, it supports Unicode Collation Algorithm (UCA) collations for UTF8 and UTF16, and collations related to UCA 9.0.
OBKV enhancements
Expired data deletion optimized for OBKV-HBase
To improve query performance in scenarios with hotspot keys, where a large amount of data of expired versions accumulates within a short period of time, OceanBase Database V4.2.5_CE optimizes the expired data deletion feature. During HBase queries, it identifies HBase rowkeys whose time-to-live (TTL) has expired or whose version count exceeds MaxVersion, and quickly generates a TTL-based deletion task to delete the expired rowkeys. This reduces the amount of data to be scanned during queries and thereby improves query performance.
ColumnPaginationFilter supported for OBKV-HBase
The ColumnPaginationFilter filter is provided to limit the number of columns to return. You can use this filter to specify to return data of the latest version in the specified number of columns starting from a specific column. This filter is usually used in column pagination. To use this filter, you need to specify the offset and number of columns to return.
Reverse scan supported for OBKV-HBase
The new version supports the reverse scan feature, which scans table data in reverse rowkey order (from largest to smallest). HBase data is sorted and stored by rowkey, so reverse scan applies to scenarios where data needs to be accessed in descending rowkey order.
Client information views
OBKV supports the client/server access mode. In OBKV, data is accessed through RPC, so there is no concept of a session and client parameters cannot be shown in sessions. OceanBase Database V4.2.5_CE provides the [G]V$OB_KV_CLIENT_INFO views to display client information and the __all_virtual_kv_client_info table to record client parameters, such as the connection pool size, response timeout period, and number of retries upon failure. You can query the views for the configurations of the business client to improve troubleshooting efficiency.
OBKV-Redis model
OBKV-Redis is a persistent database compatible with Redis APIs, designed to address the limitations of traditional Redis combined with relational database services (RDS). These limitations include the inability to ensure data consistency, complex architecture management, poor scalability, lack of transactional capabilities, and high O&M costs. OBKV-Redis integrates caching and database functionalities, storing hot data in memory and cold data on disk. It also provides transactional capabilities to ensure data consistency, meeting the RT and throughput requirements for 80% of user scenarios. OBKV-Redis is primarily compatible with native Redis 3.2, 4.0, and 5.0, supporting five basic data types: string, hash, list, set, and zset, as well as some common commands such as TTL, EXPIRE, and DEL.
OBKV performance optimization
OceanBase Database V4.2.5_CE optimizes OBKV performance in the following aspects:
- The multi_set API of the storage layer is supported for multi-PUT operations, and batch data reads and conflict checks are supported to improve the batch import performance of OBKV.
- OceanBase Database V4.2.3 introduced group commit, which executes PUT or GET operations on the server in batches and improves the performance of GET and PUT operations by 20% to 30%. OceanBase Database V4.2.5_CE extends group commit beyond GET and PUT to INSERT, DELETE, REPLACE, UPDATE, INSERTUP, and INCREMENT/APPEND operations. Local indexes and global indexes are supported in group commit for all of these operations except PUT.
- Operations are grouped by log stream instead of tablet, and the request queue is transformed into a lock-free queue. This addresses performance issues caused by frequent group locking in high-concurrency scenarios.
- The batch size for group commit can be smoothly adjusted. The tenant-level parameters kv_group_commit_batch_size and kv_group_commit_rw_mode are added to respectively specify the batch size for group commit in OBKV and the operation types supported for group commit when the feature is enabled; the enable_kv_group_commit parameter is deprecated (see the sketch below).
- The (G)V$OB_KV_GROUP_COMMIT_STATUS views are provided to monitor the status of each log stream group, such as the request queue size, batch size, number of log stream groups, and queue size of a failed group.
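A configuration sketch using the new tenant-level parameters; the batch size value is illustrative:

```sql
-- Enable group commit by setting a batch size greater than 1.
ALTER SYSTEM SET kv_group_commit_batch_size = 10;
-- Keep all operation types eligible for group commit (the default).
ALTER SYSTEM SET kv_group_commit_rw_mode = 'ALL';
```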
Performance improvements
Table-level restore performance optimization
The overall table-level restore process comprises three steps: (1) Restore an auxiliary tenant to the specified point in time based on the backup data; (2) Import the specified table from the auxiliary tenant to the destination tenant; (3) Clear the auxiliary tenant.
In earlier versions, the task scheduling strategy in the table import task scheduler enforces serial execution of multiple table import tasks. The degree of parallelism (DOP) is low in the data completion and index creation phases of the primary table. To improve the table-level restore performance, the new version optimizes the parallelism in the following aspects:
- The tenant-level parameter recover_table_concurrency is provided to control the maximum number of table import tasks that can be executed simultaneously, enhancing the parallelism of multi-table imports (see the sketch after this list).
- The parallel control parameters provided by DDL are leveraged for table-level restore to increase the concurrency of index creation.
- The tenant-level parameter recover_table_dop is provided to control the maximum DOP for restoring a single table.
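A sketch of raising restore parallelism with the new tenant-level parameters; the values are illustrative:

```sql
-- Allow up to 4 table import tasks to run in parallel.
ALTER SYSTEM SET recover_table_concurrency = 4;
-- Allow a DOP of up to 8 when restoring a single table.
ALTER SYSTEM SET recover_table_dop = 8;
```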
Upgrade process optimization
OceanBase Database V4.2.5_CE optimizes the upgrade process to improve the post-upgrade performance. Batch upgrade is supported for system variables, system tables, and virtual tables (views), and unnecessary schema refresh operations are reduced. After these optimizations, the average upgrade time per tenant is approximately one-tenth of what it was before.
DML performance optimization
The new version supports batch processing of the INSERT ... VALUES ... ON DUPLICATE KEY UPDATE ... and REPLACE ... VALUES ... statements in multi-query scenarios to improve DML execution performance. When you execute the MERGE INTO statement on a table with a unique key, parallel DML (PDML) is supported to improve the execution efficiency in scenarios with large amounts of data.
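For reference, the statement forms that benefit look as follows (hypothetical table t1 with primary key c1):

```sql
INSERT INTO t1 (c1, c2) VALUES (1, 10), (2, 20)
  ON DUPLICATE KEY UPDATE c2 = VALUES(c2);
REPLACE INTO t1 (c1, c2) VALUES (3, 30), (4, 40);
```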
Resource optimization
DDL resource isolation
DDL requests occupy worker threads and share thread resources with DML requests and query requests. In this case, resource contention may occur. If a large number of DDL requests are initiated at a point in time, normal business requests may be affected. After parallel DDL is introduced, the tenant may face higher DDL pressure. OceanBase Database V4.2.5_CE isolates worker threads used by DDL operations from other requests, and provisions additional worker threads when DDL synchronization is pending. This ensures that sufficient worker threads are available for other tasks during heavy DDL workloads, thereby improving system stability.
Resource isolation for statistics collection and clog commit
In a high-performance computing environment, reasonable resource allocation and isolation are decisive in ensuring system stability and improving efficiency. An effective resource isolation strategy can prevent resource contention and interference between tasks, thereby improving resource utilization and overall service quality. In earlier versions, you can configure different unit configs for tenants to implement resource isolation between tenants, and use the resource manager to configure resource isolation within a tenant. Resource isolation is supported for CPU and I/O resources. To reduce the impact of background tasks on user requests, resource isolation is required for background tasks and has been supported for most of them. The new version supports resource isolation for statistics collection and clog commit to further enhance the resource isolation mechanism. In terms of I/O resource isolation, three parameters, MIN_IOPS, MAX_IOPS, and WEIGHT_IOPS, are provided for specifying the minimum IOPS, maximum IOPS, and IOPS weight, respectively. According to the I/O resource scheduling strategy, the database guarantees resources to meet the minimum IOPS, and schedules resources based on the IOPS weight while ensuring that the maximum IOPS is not exceeded. The resource isolation strategy is described as follows:
- Set the MAX_IOPS parameter to 10 for clog commit, which means that at most 10% of IOPS resources can be used for committing clogs.
- Set the MAX_IOPS parameter to 20 for statistics collection, which means that at most 20% of IOPS resources can be used for collecting statistics.
- Limit the IOPS weight for foreground tasks and statistics collection. Specifically, the default WEIGHT_IOPS value is 100 for foreground tasks and 20 for statistics collection. According to the weight-based scheduling strategy, the bandwidth resources for foreground tasks and those for statistics collection conform to the ratio 5:1.
OceanBase Database provides a default isolation template to simplify resource isolation configuration. Background resource isolation is supported for CPU and I/O resources. The control group (cgroup) feature is required for CPU resource isolation, and is optional for I/O resource isolation.
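As an illustration only, the following sketch shows how such a directive might be expressed through the resource manager. The plan and group names are hypothetical, and the exact DBMS_RESOURCE_MANAGER subprogram signature should be checked against the documentation:

```sql
-- Hypothetical: cap statistics collection at 20% of IOPS with weight 20.
CALL DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  PLAN             => 'background_plan',
  GROUP_OR_SUBPLAN => 'OPT_STATS',
  MIN_IOPS         => 0,
  MAX_IOPS         => 20,
  WEIGHT_IOPS      => 20);
```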
Memory limits for SQL queries
OceanBase Database can track the memory usage of individual SQL queries and take actions, such as returning an error to the client or logging a warning, when the memory usage exceeds the specified threshold. This ensures that a single SQL query does not occupy excessive memory resources and protects overall system stability in scenarios with large queries or nonoptimal plans. The tenant-level parameter query_memory_limit_percentage is provided to specify the maximum percentage of tenant memory available to a single SQL query. When the memory usage reaches the specified threshold, the SQL query is interrupted with an error returned.
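A sketch of tightening the per-query memory cap (tenant-level; the default is 50):

```sql
-- Limit any single SQL query to at most 30% of the tenant's memory.
ALTER SYSTEM SET query_memory_limit_percentage = 30;
```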
Memory optimization for stored procedures
OceanBase Database V4.2.5_CE replaces the general memory allocator used for stored procedures with a custom memory allocator, which implements proactive memory release and shortens the memory lifecycle of stored procedures. The new version also optimizes the design of the symbol table, the memory usage of basic data types, and memory management for complex data types such as RECORD and ARRAY and for expressions. This addresses memory accumulation in various scenarios, and improves the memory usage stability of stored procedures and the calculation performance of some expressions.
Reliability improvements
Migration of active transactions during tablet transfer
In the design of standalone log streams, data is managed in units of tablets, while logs are organized in units of log streams. Multiple tablets are aggregated into one log stream, so that transactions within a single log stream avoid the high cost of two-phase commit. To balance data and traffic among different log streams, tablets can be flexibly transferred between log streams. However, during the tablet transfer process, active transactions may still be handling the data, and even a simple operation may damage the atomicity, consistency, isolation, and durability (ACID) of the transactions. For example, if active transaction data on the transfer source cannot be completely migrated to the transfer destination during concurrent transaction execution, the atomicity of the transactions cannot be guaranteed. In earlier versions, active transactions were killed during the transfer to prevent transaction problems. This mechanism affects the normal execution of transactions to some extent. To resolve this problem, the new version supports the migration of active transactions during tablet transfer, which enables concurrent execution of active transactions and ensures that no abnormal rollbacks or consistency issues occur in concurrent transactions due to the transfer.
Adaptive arbitration upgrade or downgrade based on the I/O load
The arbitration service provides an algorithm for detecting faults on log disks based on the I/O load, which can effectively detect cloud disk faults and perform arbitration downgrade to prevent cloud disk faults from continuously affecting business requests. Specifically, the fault detection algorithm continuously monitors the write performance of a log disk. If the write performance drops significantly below the baseline, to a threshold specified by a parameter, the algorithm marks the log disk as faulty and performs arbitration downgrade. The algorithm continues monitoring the log disk after marking it as faulty, and marks it as normal when the performance recovers to the normal range. This way, business requests submitted after the arbitration upgrade are not affected by log disk faults. The cluster-level parameter log_storage_warning_trigger_percentage is provided for specifying the write performance threshold, in percentage, for triggering a log disk fault.
Backup support for parameters and tenant resource configurations
V4.2.5_CE introduces support for backing up both cluster-level and tenant-level parameters. Cluster-level parameters can be backed up separately by specifying the path information in the ALTER SYSTEM command. Tenant-level parameters are included in the backup data set and are backed up automatically during data backup. When performing a tenant-level physical restore, we recommend that the resource configuration of the target tenant match that of the source tenant to ensure a higher success rate and maintain the stability of subsequent production operations. To assist with configuring resources for the target tenant, V4.2.5_CE also supports backing up the resource configuration and replica distribution information of the tenant.
Usability improvements
Parallel log stream replica migration optimization
In earlier versions, before RootService generates a load balancing task that needs to modify the paxos_replica_num parameter, such as data synchronization to replicas or replica deletion, RootService must wait for the previous member change task in the log stream to be completed and obtain the accurate paxos_replica_num value. In other words, multiple replica tasks, such as replica migrations, in the same log stream can only be executed serially. This serial execution limitation results in low efficiency of some load balancing tasks and O&M operations. For example, if you migrate two units in two zones at the same time, the two replicas of the same log stream in the two units must be migrated in order rather than in parallel. Since V4.2.5_CE, replicas of the same log stream in different zones can be migrated in parallel. This accelerates scaling, improves load balancing efficiency, and reduces the probability of replica rebuilding for the standby tenant. The primary tenant requires parallel log stream replica migration to accelerate scaling in proof of concept (POC) scenarios, but may not enable parallel migration in regular scenarios, because a large amount of data needs to be replicated during parallel replica migration, which may compromise the I/O performance of the leader. The standby tenant requires parallel replica migration to effectively reduce the probability of replica rebuilding. The tenant-level parameter replica_parallel_migration_mode is provided for controlling the parallel migration mode for log stream replicas.
Row lock wait and retry wait events provided in the ASH diagnostic module
When row lock contention occurs, multiple sessions may be blocked, compromising system performance. In earlier versions, you can query the V$OB_LOCKS and V$OB_TRANSACTION_PARTICIPANTS views for troubleshooting when row lock contention occurs. The ASH diagnostic module now provides row lock wait time statistics and implements a wait loop based on session blocking to support performance diagnostics for the row lock link. The ASH diagnostic module also provides retry wait event statistics and categorizes wait events into the LOCATION, SCHEMA, SQL, STORAGE, TX, and Other classes.
Response time histograms
Earlier versions of OceanBase Database provide statistics on the average and maximum response time of different SQL categories, which lack fine-grained statistics reflecting SQL execution performance at a specific percentile. OceanBase Database V4.2.5_CE provides response time histograms: you can query and monitor response time statistics of a specific SQL type at a specific percentile, such as P90 or P95, based on the [G]V$OB_QUERY_RESPONSE_TIME_HISTOGRAM view.
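A minimal query sketch, assuming the view is exposed under the oceanbase schema in MySQL mode:

```sql
-- Inspect the per-bucket response-time distribution for the tenant.
SELECT * FROM oceanbase.V$OB_QUERY_RESPONSE_TIME_HISTOGRAM;
```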
Statistics monitoring optimization
OceanBase Database V4.2.5_CE restructures the statistics monitoring framework. The new framework records all diagnostic data generated during database execution and stores the data in an appropriate form. You can query the data from different dimensions, such as tenants, groups, and sessions. The memory and CPU overheads of the framework are controlled at an appropriate level to avoid blocking or affecting regular database tasks. In the new framework, after a session is disconnected, the corresponding wait events and statistical items can no longer be queried from diagnostic views. Diagnostic data that only reflects the internal running situation is no longer recorded to avoid confusing customers. Apart from tenant-level and session-level statistical views, group-level statistical views are provided to help you better monitor diagnostic data in resource isolation scenarios.
Log synchronization link optimization
OceanBase Database V4.2.5_CE provides the GV$OB_LOG_TRANSPORT_DEST_STAT and V$OB_LOG_TRANSPORT_DEST_STAT views related to log transmission links to facilitate observation and monitoring. You can query the views in the following scenarios:
- Observe the number of log synchronization links and the traffic in the downstream systems of the primary tenant, and take actions when the read I/Os in the log synchronization links affect the write I/Os of the primary tenant.
- When a log synchronization link with slow progress is detected, query the views to check whether the RPC processing time is as expected and determine whether the link is normal based on the result.
System log optimization
The ./alert/alert.log file is added to the system log directory to record log information of interest to DBAs. This log file helps resolve the issue where the observer.log file has poor readability because it contains a large volume of logs. You can use the cluster-level parameter alert_log_level to set the log level to INFO, WARN, or ERROR.
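A sketch; the value form follows the levels listed above:

```sql
-- Record only WARN and ERROR entries in ./alert/alert.log (cluster-level).
ALTER SYSTEM SET alert_log_level = 'WARN';
```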
Dynamic modification of memory specifications for OBServer nodes
In earlier versions, memory specification modifications for OBServer nodes take effect only upon a restart; in the new version, they take effect in real time, which improves the elastic scaling capability.
Improvement of the SHOW PROCESSLIST command
The SHOW PROCESSLIST command shows the status information about the nodes of a tenant. The virtual tables and views related to the SHOW PROCESSLIST command are reconstructed. The total_time column indicates the total request time, from when a request is received to when a response is returned to the client; if no request is received, the total_time column indicates the duration of the current status. The total_cpu_time column is provided to describe the duration for which a request remains in the active state in a session. The remote execution status and information are also improved. With these improvements, the semantics of time-related columns are clearer and more scenarios are covered.
SPM evolution information stored on the disk and import or export of baselines
To better monitor SPM execution, OceanBase Database V4.2.5_CE provides the DBA_OB_SPM_EVO_RESULT view for recording the detailed results of SPM evolution tasks. You can call stored procedures in the DBMS_SPM package to import and export SPM baselines based on intermediate tables. Specifically, you can call DBMS_SPM.CREATE_STGTAB_BASELINES to create a baseline cache table, DBMS_SPM.PACK_STGTAB_BASELINES to export baselines to the cache table, and DBMS_SPM.UNPACK_STGTAB_BASELINES to import baselines from the cache table. To facilitate migration across tenants, the new version also allows you to import and export baselines by using CSV files. Specifically, you can call DBMS_SPM.PACK_CSV_BASELINES to export baselines to a CSV file and DBMS_SPM.UNPACK_CSV_BASELINES to import baselines from a CSV file.
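A hedged sketch of the staging-table flow; the table name and argument forms are hypothetical, as the source does not give the subprogram signatures:

```sql
CALL DBMS_SPM.CREATE_STGTAB_BASELINES('spm_stage');   -- create a baseline cache table
CALL DBMS_SPM.PACK_STGTAB_BASELINES('spm_stage');     -- export baselines into it
CALL DBMS_SPM.UNPACK_STGTAB_BASELINES('spm_stage');   -- import baselines from it, e.g., in another tenant
```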
Compatibility changes
Product behavioral changes
| Feature | Change description |
|---|---|
| Group commit | The tenant-level parameters kv_group_commit_batch_size and kv_group_commit_rw_mode are added to respectively specify the batch size for group commit in OBKV and the operation types supported for group commit when the feature is enabled. The enable_kv_group_commit parameter is deprecated. You can set the value of the kv_group_commit_batch_size parameter to a value greater than 1 to enable group commit. |
| Return null values for closed cursors to the driver | In earlier versions, the OBServer node returns the ID of a closed cursor as a handle to the driver. The driver cannot perceive the information that the cursor has been closed. In V4.2.5_CE, for a closed cursor, the OBServer node returns a null value to the driver to inform that the cursor has been closed. |
View changes
| View | Change type | Description |
|---|---|---|
| [G]V$OB_SQL_AUDIT | Modified | The SEQ_NUM column is added to show the sequence numbers of statements in a transaction. |
| CDB_WR_ACTIVE_SESSION_HISTORY | Modified | The proxy_sid column is added to show the proxy session ID. |
| DBA_WR_ACTIVE_SESSION_HISTORY | Modified | The proxy_sid column is added to show the proxy session ID. |
| [G]V$OB_ACTIVE_SESSION_HISTORY | Modified | The proxy_sid column is added to show the proxy session ID. |
| DBA_OB_SPM_EVO_RESULT | New | Displays the following information in the current tenant: when the first baseline plan is generated for an SQL statement, the execution result of each SPM evolution task, and the situation where no reproducible baseline plan is available for an SQL statement. |
| CDB_OB_SPM_EVO_RESULT | New | Displays the following information in the sys tenant: when the first baseline plan is generated for an SQL statement, the execution result of each SPM evolution task, and the situation where no reproducible baseline plan is available for an SQL statement. |
| DBA_OB_KV_REDIS_TABLE | New | Displays the commands specific to various data structures of Redis, and the corresponding table names, in the current tenant. |
| CDB_OB_KV_REDIS_TABLE | New | Displays the commands specific to various data structures of Redis, and the corresponding table names, in the sys tenant. |
| DBA_OB_KV_TTL_TASKS | Modified | The task_type column is added to show the type of the TTL-based deletion task. |
| DBA_OB_KV_TTL_TASK_HISTORY | Modified | The task_type column is added to show the type of the TTL-based deletion task. |
| CDB_OB_KV_TTL_TASKS | Modified | The task_type column is added to show the type of the TTL-based deletion task. |
| CDB_OB_KV_TTL_TASK_HISTORY | Modified | The task_type column is added to show the type of the TTL-based deletion task. |
| [G]V$OB_KV_CLIENT_INFO | New | Displays the connection parameters of the client in OBKV tenants. |
| [G]V$ob_res_mgr_sysstat | New | Displays system statistics at the granularity of resource groups. |
| CDB_WR_SQL_PLAN | New | Displays the SQL plans corresponding to collection-related columns. |
| DBA_WR_SQL_PLAN | New | Displays the SQL plans corresponding to collection-related columns. |
| [G]V$OB_LOG_TRANSPORT_DEST_STAT | New | Displays information about downstream log synchronization links. |
| [G]V$OB_PL_CACHE_OBJECT | Modified | The DB_ID column is added to show the database ID. |
| GV$LATCH | Deprecated | To obtain lock diagnostic information, query GV$SYSTEM_EVENT and GV$ACTIVE_SESSION_HISTORY. |
Parameter changes
| Parameter | Change type | Description |
|---|---|---|
| recover_table_concurrency | New | The maximum number of tables for parallel restore in the cross-tenant table import phase of a table-level restore. It is a tenant-level parameter. The default value is 0, which specifies to use the default value 1 of the kernel. |
| recover_table_dop | New | The maximum DOP for restoring a single table in the cross-tenant table import phase of a table-level restore. It is a tenant-level parameter. |
| kv_group_commit_batch_size | New | The batch size for group commit in OBKV. It is a tenant-level parameter. A value greater than 1 specifies to enable group commit and use the specified value as the batch size for group commit. The default value is 1, which specifies to disable group commit. |
| kv_group_commit_rw_mode | New | The operation types supported for group commit when the group commit feature is enabled. It is a tenant-level parameter. The default value is ALL, which means that all operation types are supported for group commit. |
| enable_kv_group_commit | Deprecated | Specifies whether to enable group commit. |
| query_memory_limit_percentage | New | The maximum percentage of memory in the tenant available for a single SQL query. It is a tenant-level parameter. The default value is 50. |
| unit_gc_wait_time | New | The maximum time to wait for triggering unit GC. When the specified wait time elapses, GC is forcibly performed. It is a cluster-level parameter. The default value is 1 minute. |
| enable_lock_priority | New | Specifies whether to enable lock priorities. When lock priorities are enabled, specific DDL operations have the highest lock priority. It is a tenant-level parameter. For example, when the RENAME TABLE statement and a DML statement are waiting for a table lock at the same time, the RENAME TABLE statement is the first to obtain a table lock. The default value is False. |
| sts_credential | New | The credential used to access the Security Token Service (STS) provided by OceanBase Cloud Platform (OCP). It is a tenant-level parameter. The default value is an empty string. |
| replica_parallel_migration_mode | New | The parallel migration mode for log stream replicas. It is a tenant-level parameter. The default value is auto, which specifies to enable parallel migration for the standby tenant and disable parallel migration for the primary tenant. |
| ob_storage_s3_url_encode_type | New | Specifies whether the URL encoding method used when a request is sent over the S3 protocol complies with the RFC 3986 standard. It is a cluster-level parameter. The default value is default, which specifies to use the default behavior of the S3 SDK and not to encode some special characters such as the at sign (@). If you set the value to compliantRfc3986Encoding, the URL encoding method used when a request is sent over the S3 protocol complies with the RFC 3986 standard. |
System variable changes
| System variable | Change type | Description |
|---|---|---|
| range_index_dive_limit | New | The number of ranges for cardinality estimation by the optimizer by using the storage layer. It takes effect both at the session and global levels. The default value is 10. |
| partition_index_dive_limit | New | The number of partitions for cardinality estimation by the optimizer by using the storage layer. It takes effect both at the session and global levels. The default value is 10. |
| ob_enable_parameter_anonymous_block | New | Specifies whether to enable parameterization for anonymous blocks. It takes effect both at the session and global levels. The default value is True. |
System package changes
| System package | Change type | Description |
|---|---|---|
| DBMS_SPM | Modified | Provides the DBMS_SPM.CREATE_STGTAB_BASELINES subprogram for creating a baseline cache table, the DBMS_SPM.PACK_STGTAB_BASELINES subprogram for exporting baselines to the cache table, and the DBMS_SPM.UNPACK_STGTAB_BASELINES subprogram for importing baselines from the cache table. To facilitate migration across tenants, the new version also allows you to import and export baselines by using CSV files. Specifically, you can call DBMS_SPM.PACK_CSV_BASELINES to export baselines to a CSV file and call DBMS_SPM.UNPACK_CSV_BASELINES to import baselines from a CSV file. |
| DBMS_BALANCE | Modified | Provides the SET_BALANCE_WEIGHT and CLEAR_BALANCE_WEIGHT subprograms for setting and clearing partition balance weights. |
Syntax changes
| Syntax | Change description |
|---|---|
| The syntax for importing specified features is added. | The syntax is alter system load module data module=[redis/timezone/srs] tenant=[tenant_name] [infile=file_path]. You can run this command only in the sys tenant. You can import the time zone metadata of OBKV-Redis and the spatial reference system (SRS) IDs of spatial data. |
| The command for triggering procedure or function recompilation is added. | The syntax is alter {procedure \| function} name compile [reuse settings]. With the reuse settings clause, the procedure or function is compiled using the same settings as the original PL object. Without it, the procedure or function is compiled using the settings of the current session. |
| The CACHE INDEX statement is provided. | Mocked from MySQL syntax; does not provide actual functionality. |
| The LOAD INDEX INTO CACHE statement is provided. | Mocked from MySQL syntax; does not provide actual functionality. |
| A hint is added for controlling the storage behavior of statement-level ENUM/SET metadata. | This hint has higher priority than the corresponding parameter. Enable: select /*+ opt_param('enable_enum_set_subschema', 'true') */ * from t1; Disable: select /*+ opt_param('enable_enum_set_subschema', 'false') */ * from t1; |
Recommended versions of tools
The following table lists the recommended versions of tools for OceanBase Database V4.2.5_CE.
| Component | Version | Remarks |
|---|---|---|
| ODP | V4.3.2.0 | |
| OCP | V4.3.2 BP1 | |
| OceanBase Developer Center (ODC) | V4.3.2_BP1 | |
| OBCDC | V4.2.5_CE | |
| OceanBase Migration Service (OMS) | V4.2.3 | If the destination is OceanBase Database V4.2.5_CE, use OMS V4.2.3. If the source is OceanBase Database V4.2.5_CE, contact OceanBase Technical Support. |
| OceanBase Client (OBClient) | V2.2.6 | |
| LibobClient | V2.2.6 |
Upgrade notes
- If your cluster contains a large number of tablets, restarting OBServer nodes during the upgrade takes a long time: about 20 minutes if the cluster contains more than 3 million tablets. Reserve sufficient time for the upgrade in this case.
- Only OceanBase Database V4.2.1_CE BP2 and earlier can be upgraded to V4.2.5_CE. OceanBase Database V4.2.1_CE BP3 and later cannot be upgraded to V4.2.5_CE. Support for upgrading from all V4.2.1_CE BP versions is expected in V4.2.5_CE BP1. Pay special attention to the upgrade path.
- V4.2.2.1_CE is a barrier version for V4.2.2.0_CE. To upgrade OceanBase Database from V4.2.2.0_CE to a later version, you must first upgrade to V4.2.2.1_CE.
- You can directly upgrade OceanBase Database from a version of the V4.2.3_CE or V4.2.4_CE series to V4.2.5_CE.
- To upgrade ODP and OBServer nodes, we recommend that you first upgrade OBServer nodes to OceanBase Database V4.2.5_CE and then upgrade ODP to V4.3.2.
- Major compactions and DDL operations are prohibited during the upgrade.