V4.6.0
Version information
- Release date: April 10, 2026
- Version: V4.6.0
- RPM version: oceanbase-4.6.0.0-100000132026040922
Version overview
OceanBase Database V4.6.0 is a major version upgrade designed for modern data processing.
Comprehensive upgrade of AI hybrid search capabilities
This release introduces a native SQL hybrid search interface that supports multi-modal integrated queries across vector, full-text, and scalar data. It adds match phrase search and search index, and refactors the hybrid search execution framework. Additionally, this release significantly optimizes both the querying and building performance of full-text indexes, delivering a high-performance, high-precision hybrid search experience for scenarios such as AI knowledge bases.
Enhanced vector database features
This release optimizes the performance of HNSW indexes in incremental scenarios, extends IVF indexes to support tables without primary keys, and improves large-scale clustering results using HGraph. It also introduces recall rate evaluation for vector indexes and sparse vector optimization. Combined with Document AI processing capabilities, this release builds an end-to-end "RAG all-in-one" knowledge base solution.
Improved HTAP mixed workload capabilities
This release implements intelligent routing and strong consistency reads for columnstore replicas, and introduces automatic partition splitting for columnstore tables. It optimizes the PX parallel plan queuing and adaptive task partitioning mechanisms, and supports incremental data encoding as well as skip index, providing more balanced and efficient analytical processing capabilities for mixed workloads.
PL engine performance optimization
This version improves system package execution efficiency and supports reusing the Plan Cache for dynamic SQL inside PL. It enhances FORALL batch execution, UDF index lookups, asynchronous transaction commit, and streaming cursors, significantly improving the performance of stored procedures and complex business logic and providing stronger support for high-concurrency OLTP workloads.
Improved compatibility and functionality integrity
This version supports MySQL Session temporary tables and optimizes the performance of global temporary tables in Oracle. It enhances SQL capabilities with features such as recursive CTE, partition exchange, and materialized views. Additionally, it supports Iceberg V3 format and V2 write operations, provides the REST catalog protocol, and supports complex data types. This offers a more comprehensive solution for enterprise data integration and analysis.
Optimized system performance
Performance on large-capacity domestic machines is enhanced. The performance of PDML and replicated table queries is optimized, achieving a 14x performance improvement in high-concurrency scenarios. A lock-free memory allocator is introduced, delivering a 168% performance improvement in single-threaded memory allocation and significantly enhancing overall system performance.
Enhanced OBKV data model capabilities
This release adds weak read routing to the nearest IDC, TTL task index scanning, and rate-limiting optimizations, and improves HBase monitoring metrics. It also supports low-level put operations for the time-series model and optimizes hot key queries, providing more efficient data access and O&M monitoring capabilities for time-series and wide-table scenarios.
Enhanced OBCDC features
This release supports incremental data synchronization after table-level recovery to ensure the stable operation of OBCDC. Additionally, it supports the real-time calculation of virtual generated columns to ensure downstream MySQL Binlog compatibility, improving the reliability and consistency of data synchronization.
Comprehensive improvements in security and reliability
This release introduces non-intrusive column encryption and the caching_sha2_password authentication mechanism, and optimizes the standby database read feature to reduce message interactions. It also enhances the integrity verification of backup data and improves resource limits for large queries to ensure service quality when large and small queries coexist, providing more comprehensive data protection and stability guarantees for core enterprise businesses.
Enhanced O&M and management capabilities
This release refactors the SQLSTAT and ASH architectures to ensure complete and accurate monitoring data. It refines wait events, optimizes Plan Cache memory management and diagnostics, and improves LOB consistency verification capabilities as well as DROP/PURGE DATABASE performance. Furthermore, this release adds support for heterogeneous zones and introduces the window compaction feature to optimize compaction performance, providing smarter and more efficient O&M guarantees for large-scale enterprise production environments.
Key features
Enhanced hybrid search capabilities
Hybrid search SQL syntax
Starting from V4.4.1, you can use the `DBMS_HYBRID_SEARCH` system package to perform hybrid search on vector, full-text, and scalar indexes. V4.6.0 introduces a more user-friendly SQL hybrid search interface, supporting native SQL syntax for multimodal fusion search. The new `SELECT ... FROM HYBRID_SEARCH(TABLE table_name, DSL_STRING)` syntax allows you to access the DAS hybrid search framework through pure SQL, providing better query performance. In heap tables, you can perform hybrid combinations of vector search (KNN), full-text search, scalar filtering (term/range/terms), and JSON/array expression queries. It supports various fusion algorithms, including weighted fusion, RRF (Reciprocal Rank Fusion), and WRRF (Weighted RRF). It also supports multi-vector queries, complex Boolean combinations (must/should/filter/must_not), and `min_score` score filtering. Additionally, it supports score normalization for multi-query results.
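The sketch below illustrates the shape of such a query. It assumes a heap table `docs` with a vector column `embedding` and a full-text index on `content`; the DSL field names and structure are illustrative, following the ES-style semantics described above rather than a documented schema.

```sql
-- Hypothetical sketch: fuse a KNN vector query with a full-text match and a
-- scalar range filter via RRF. Table, columns, and DSL keys are illustrative.
SELECT *
FROM HYBRID_SEARCH(
  TABLE docs,
  '{
     "query": {
       "bool": {
         "must":   [ {"match": {"content": "vector database"}} ],
         "filter": [ {"range": {"publish_year": {"gte": 2024}}} ]
       },
       "knn": {"field": "embedding", "query_vector": [0.12, 0.58, 0.77], "k": 10}
     },
     "rank": {"rrf": {}},
     "min_score": 0.2
   }'
);
```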
Match Phrase feature in full-text indexes
To meet short-phrase search requirements in knowledge base scenarios, the new version supports the Match Phrase feature. This feature maintains the position arrays of tokens within documents in the inverted index auxiliary table, enabling precise phrase matching and fuzzy matching with a tolerance distance (slop) in the full-text index. It supports `match_phrase` syntax similar to Elasticsearch, including the `slop` parameter for controlling the tolerated distance between tokens, and it remains compatible with the old `MATCH AGAINST PHRASE MODE` syntax while improving its performance. The new `FTS_INDEX_TYPE=PHRASE_MATCH` index type option is available; position information storage is enabled by default but can be disabled to save storage space. During queries, a dedicated Phrase Query iterator and `slop` calculation algorithm perform pre-filtering optimization. This suits document search scenarios in RAG workflows where phrases carry higher weights, enhancing the accuracy and efficiency of hybrid search.
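As a hedged illustration, a phrase query in the hybrid search DSL might look like the following; the field name and option spellings are illustrative.

```sql
-- Hypothetical sketch: phrase matching with a tolerance of one intervening
-- token (slop = 1); the DSL field names are illustrative.
SELECT *
FROM HYBRID_SEARCH(
  TABLE docs,
  '{"query": {"match_phrase": {"content": {"query": "distributed database", "slop": 1}}}}'
);
```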
Search Index feature
To meet the efficient search requirements for complex types like JSON/ARRAY/MAP in hybrid search scenarios, the new version supports the Search Index feature. Compared to traditional function indexes and multi-value indexes, Search Index supports equality, inclusion, and range queries on all paths of JSON fields through a single index, addressing performance issues in ad hoc queries and schema-less scenarios. Search Index is created using the `CREATE SEARCH INDEX` syntax and allows path filtering with `include_paths`/`exclude_paths` and type filtering with `include_types`/`exclude_types`. The new `PICK JSON_TYPE` syntax (e.g., `pick json_number`/`pick json_string`) enables precise filtering of JSON scalar types, optimizing query accuracy. The index uses an inverted table structure and supports Index Merge for combined queries. Initially, it is only supported for heap tables, providing high-performance indexing capabilities for flexible JSON queries in knowledge base platforms and enhancing query efficiency in hybrid search scenarios.
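A minimal sketch of index creation follows; the option names come from the notes above, while the clause layout and path values are assumptions.

```sql
-- Hypothetical sketch: a search index over a JSON column in a heap table,
-- restricted to selected paths and scalar types. Clause layout is illustrative.
CREATE SEARCH INDEX sidx ON docs (meta)
  WITH (
    include_paths = '$.tags, $.price',
    include_types = 'json_string, json_number'
  );
```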
Optimized hybrid search execution framework
The new version provides a unified dynamic pruning execution framework for advanced hybrid search scenarios like RAG knowledge bases. This framework supports unified docid pruning and block max score pruning across full-text, scalar, and multi-value indexes. By restructuring into composable search operators, it offers excellent extensibility and supports multi-layer Boolean query semantics described with ES DSL. Adaptive execution strategies are introduced, including cost-driven primary iterator selection, data prefetch width optimization, sort/lookup strategy selection, and switching between sparse pruning and vectorized execution strategies, enhancing hybrid search performance. It also provides hybrid search diagnostics based on query profiles, supporting partition-level profile display of operator execution trees and key statistics to help identify performance bottlenecks.
Optimized full-text index query performance
The new version has been deeply optimized for full-text search TopK and incremental data scenarios. By implementing a lightweight `advance_scan` mechanism at the storage layer, optimizing BM25 expression calculation parameters, and improving incremental data query algorithms, the efficiency of full-text index search has been significantly enhanced. In practical tests, this optimization can improve TopK query performance by 80% in scenarios with pure incremental data and achieve significant performance improvements in mixed data scenarios.
Optimized full-text index construction performance
The new version optimizes the full-text index construction process to address the issue of excessively long construction times for large-scale data. Previously, the main bottlenecks in full-text index construction in OceanBase Database were sorting, writing, and tokenization. The new version improves performance through several core optimizations, streamlining the construction process to ensure that inverted tables and forward tables are tokenized only once. By scanning the main table, data is ordered by docid, eliminating the need for global sorting. Writing efficiency is improved by using memory-based CONST or RLE encoding formats to handle duplicate rows, reducing the impact of storage row expansion. Tokenization performance is optimized by using a swiss hash map to reduce cache misses, with the tokenization hash map created only once and the tokenizer implementation optimized.
For example, with 32 parallel threads:
- msmarco dataset, space tokenizer: index construction time decreased from 313.37 seconds to 46.15 seconds.
- mldr_en dataset, space tokenizer: decreased from 277.14 seconds to 28.22 seconds.
- mldr_zh dataset, IK smart tokenizer: decreased from 5152.72 seconds to 62.75 seconds.
- mldr_zh dataset, IK max_word tokenizer: decreased from 5463.48 seconds to 70.63 seconds.
Comprehensive enhancement of vector capabilities
HNSW vector index incremental optimization
To address the pain points of memory bloat, degraded query performance, and slow restart loading caused by excessive incremental data in HNSW vector indexes, this release introduces a segmented architecture and an automatic memory management mechanism:
- It divides incremental data into four segment layers (Active, Frozen, Incr, and Base), supporting freezing, persistence, and background compaction.
- It controls the upper memory limit of the Active Segment and the compaction trigger threshold through the `ob_vector_index_active_segment_max_size` and `ob_vector_index_merge_trigger_percentage` parameters, balancing performance and resource consumption.
- It provides the `dbms_vector.flush_index` and `dbms_vector.compact_index` interfaces to manually trigger the flush and compaction of vector indexes (see the sketch after this list).
- It adds the `GV$OB_HNSW_INDEX_SEGMENT_INFO` view to enhance observability.

After optimization, the memory footprint of vector indexes in frequent write scenarios is controllable, query performance is stable, and fault recovery is faster. The original user interfaces remain compatible, so users gain these performance improvements without modifying their business logic.
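A hedged sketch of the manual maintenance path follows; the two subprogram names and the view come from the notes above, but the argument lists are assumptions.

```sql
-- Hypothetical sketch: manually flush and compact an HNSW vector index,
-- then inspect segment state. The argument forms are illustrative.
CALL dbms_vector.flush_index('my_table', 'my_hnsw_idx');
CALL dbms_vector.compact_index('my_table', 'my_hnsw_idx');

-- Observability: per-segment state of HNSW indexes.
SELECT * FROM oceanbase.GV$OB_HNSW_INDEX_SEGMENT_INFO;
```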
IVF index support for tables without primary keys and heap tables
Traditional IVF indexes require a table to have a primary key, which limits their application on data without primary keys. This release extends the capabilities of the IVF index, allowing it to be applied to tables without primary keys and heap tables. This enables enterprises to build efficient vector indexes on data without primary keys, such as historical logs and user behavior, providing more possibilities for data analysis and search. This release adapts the index table structures of `IVF_flat` and `IVF_pq` indexes for tables without primary keys, ensuring they can be used together with other index types, such as HNSW and full-text indexes, in hybrid search scenarios. It supports multi-partition scenarios as well as correct DDL, DML, and query operations, with query performance also improved in certain scenarios.
IVF index support for HGraph
To address large-scale vector index scenarios where the number of IVF cluster centers (`nlist`) exceeds 5,000, traditional IVF using heap sort search incurs excessive computational costs. This release supports building a lightweight graph structure (HGraph) on the IVF cluster center table, significantly accelerating cluster center search by replacing brute-force search with graph traversal algorithms. During the building process, each iteration accelerates the graph construction for cluster centers and reduces the memory consumption of the `center_distance` matrix, while supporting the rapid location of candidate clusters during the query phase. It maintains a high recall rate with controllable memory overhead, making it particularly suitable for large-scale vector search scenarios where `nlist` ranges from 5,000 to 65,000. It is recommended for cosine distance calculations; it is not recommended for large `nlist` scenarios under L2 distance due to the massive memory consumption of pre-computed tables.
Vector index recall rate evaluation
This release provides user-queryable vector index quality evaluation capabilities to solve the challenges of vector index performance tuning. Through the `query_recall` and `index_recall` interfaces in the `DBMS_VECTOR` package, users can proactively calculate the approximate search recall rate for specific queries or specified indexes, evaluating the matching degree between approximate search results and exact brute-force search results. The background automatically samples user queries, periodically calculates the average recall rate in batches, and persists it to the `DBA_VECTOR_INDEX_RECALL_REPORT` system view. This feature supports configuring sampling rates, time windows, and resource limits, providing quantitative evaluation metrics for vector indexes to facilitate index tuning.
In-memory sparse vector index optimization
This release includes several key optimizations for in-memory sparse vectors:
- It optimizes sparse vector parsing by replacing regular expressions with character stream parsing, improving performance by approximately 8%.
- It migrates the sparse vector rearrangement mechanism from the vsag library to the observer side to reduce memory usage.
- It optimizes the `pre-filter` to skip unnecessary calculations in high-filtering scenarios.
- It supports `SINDI_SQ` quantized indexes to reduce storage costs.
- It adds the `[G]V$OB_SINDI_INDEX_INFO` view for sparse vector index information.
- It adds the `index_vector_memory_advisor` subprogram to the `DBMS_VECTOR` system package to estimate the memory usage of vector indexes for tables yet to be created, and the `index_vector_memory_estimate` subprogram to analyze the memory usage of vector indexes for existing tables.
These optimizations improve the query performance and memory efficiency of sparse vector processing, and enhance the ease of use for vector index maintenance and diagnostics.
Support for multiple semantic indexes on a single column
Currently, to test the performance of different types of semantic indexes, users can only create them on different columns or tables for comparison. To make it easier for users to conduct performance testing and comparisons of semantic indexes under different models or parameters, this release supports creating multiple semantic indexes on a single column. It introduces the `vector_index` hint syntax, allowing users to specify the semantic index and filtering method (`pre_filter` or `post_filter`) used by the query via hints. When multiple semantic indexes are created on the same column, queries must explicitly specify the semantic index to be queried using the `vector_index` hint; otherwise, an error is reported. This feature can only be used in testing environments and must be enabled via the `_enable_multiple_semantic_indexes_on_column` parameter. Do not enable it in production environments.
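A hedged sketch of the hint usage follows; the hint and parameter names come from the notes above, while the hint argument form, the distance function, and the approximate-search clause are assumptions.

```sql
-- Hypothetical sketch (test environments only): pick one of several semantic
-- indexes on the same column via the vector_index hint. Names are illustrative.
ALTER SYSTEM SET _enable_multiple_semantic_indexes_on_column = true;

SELECT /*+ vector_index(t1 idx_hnsw_m16 pre_filter) */ id
FROM t1
ORDER BY l2_distance(embedding, '[0.1, 0.2, 0.3]')
APPROXIMATE LIMIT 10;
```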
Document AI capability enhancement
V4.6.0 introduces Document AI features, enhancing document processing capabilities. It includes three core functions:
- The `LOAD_FILE` expression reads files from external storage, supporting local files, HDFS, OSS, S3, and more.
- `AI_PARSE_DOCUMENT` parses the content of complex document types.
- The `AI_SPLIT_DOCUMENT` table function splits document content and outputs segment information.

Combined with `AI_EMBED`, `AI_RERANK`, `AI_COMPLETE`, and hybrid search capabilities, this enables a "RAG all-in-one" knowledge base solution.
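A hedged sketch of chaining these functions follows; the function names come from the notes above, but every argument list, the model name, and the file URI are assumptions.

```sql
-- Hypothetical sketch: read a document from object storage, parse and split
-- it, then embed each chunk. Argument forms are illustrative.
SELECT AI_EMBED('my_embedding_model', seg.content) AS chunk_vector
FROM TABLE(
       AI_SPLIT_DOCUMENT(
         AI_PARSE_DOCUMENT(LOAD_FILE('oss://my-bucket/docs/manual.pdf'))
       )
     ) seg;
```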
Optimizations for mixed HTAP workloads
Unified entry for columnstore replicas and full-featured (F) replicas
To support mixed TP and AP workloads in HTAP scenarios where business cannot be separated, OceanBase Database V4.6.0 introduces automatic routing for columnstore replicas. This feature allows columnstore replicas and F replicas to use the same request entry. The optimizer automatically routes complex analytical queries to columnstore replicas based on features such as weak reads, parallel execution, or when the cost of row-based queries exceeds a threshold. This enables automatic traffic splitting between TP and AP workloads, improving query efficiency and resource utilization. The feature provides three control modes:
- `OFF`: disabled.
- `AUTO`: intelligent routing.
- `FORCE`: force columnar execution.

New system variables `ap_query_route_policy`, `ap_query_cost_threshold`, and `ap_query_replica_fallback`, together with hints, provide flexible query-level control. These settings are compatible with the existing `ob_route_policy` strategy; when `ob_route_policy` is set to `COLUMN_STORE_ONLY`, that setting takes precedence.
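A hedged sketch using the documented variables follows; the variable names and the three modes come from the notes above, while the threshold value shown is an assumption.

```sql
-- Hypothetical sketch: enable intelligent AP routing for the session and
-- set an illustrative cost threshold above which queries go columnar.
SET ap_query_route_policy = 'AUTO';
SET ap_query_cost_threshold = 100000;  -- illustrative value
```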
Strong consistency reads on columnstore replicas
Traditional analytical queries often tolerate weak consistency reads, which provide cost-effective performance scaling. However, in HTAP scenarios with strict data consistency requirements, strong consistency reads are necessary to ensure real-time, accurate results for analytical queries. OceanBase Database V4.6.0 introduces strong consistency reads on columnstore replicas. When an AP query is automatically identified and routed to a columnstore replica, if the `ob_read_consistency` parameter is set to `'strong'`, the system retrieves a globally consistent snapshot version using GTS and checks the replay status of the columnstore replica. If the replica has not replayed to the required snapshot, it waits for the replay to complete before executing the read operation. This ensures that the latest data, consistent with the primary replica, is read. Compared to weak reads, strong reads provide stronger consistency guarantees, supporting AP queries that require transaction-level consistency.
Automatic partition splitting for columnstore tables
Earlier versions of OceanBase Database supported automatic partition splitting for rowstore tables. V4.6.0 extends this feature to columnstore tables with a primary key. This feature determines the split point based on sampled memory and SSTable data within a partition and splits the data by row number into two target partitions. It shares the same parameters (`enable_auto_split` and `auto_split_tablet_size`) and syntax as automatic partition splitting for rowstore tables. However, by default, automatic partition splitting for columnar SSTables is triggered only when the number of rows exceeds 1 million. This feature helps distribute large table data across multiple partitions, improving AP query and write performance and achieving load balancing.
Optimized parallel execution queuing mechanism for PX
The previous approach of capping threads using the `parallel_servers_target` parameter was inflexible and could misallocate resources. OceanBase Database V4.6.0 introduces a new parallel execution queuing strategy based on real-time load. By statistically analyzing the real-time number of PX threads, this strategy enables queuing before execution and dynamic adjustment of parallelism during execution. When the real-time thread count exceeds the `px_target_low_watermark` low watermark, the parallelism is linearly reduced. When it exceeds the `px_target_high_watermark` high watermark, the parallelism is reduced to 1 to avoid CPU overload. This new strategy better addresses resource scheduling imbalances and resource waste in large-scale parallel scenarios.
Adaptive task partitioning for PX
The new version optimizes the phenomenon of long-tail tasks in SQL parallel execution scans. It dynamically adjusts the parallel task partitioning strategy based on data distribution and system load, improving the efficiency and resource utilization of parallel execution. In columnstore table scans, after using Skip Index for rapid filtering, data is concentrated in a few granules, while other granules are quickly consumed. When the remaining granules are fewer than the PX threads, the previous version would experience a drop in parallelism and long-tail scanning. The new version introduces PX task rebalancing capabilities. During runtime, if the remaining granules are fewer than the PX threads, the remaining granules are further split. This ensures a more balanced distribution of scanning tasks, preventing some threads from being idle while others bear excessive workload, thereby improving the parallel scanning efficiency of columnstore tables.
Optimized query performance for incremental data
Based on the LSM-Tree architecture, OceanBase Database provides a storage mode where baseline data is stored by column and incremental data is stored by row. This is suitable for scenarios with real-time, high-frequency data writes and complex analysis. However, when there is a large amount of incremental data, query performance may be affected. The new version optimizes incremental data read speed by supporting encoding and Skip Index for incremental data. It also pushes down expression aggregation statistics at the data block level. For the TPC-H 100G dataset, when the table mode is Delete-Insert and all data is incremental, hot query performance improves by approximately 260%, and cold query performance improves by approximately 150%. Incremental encoding can be enabled via the tenant-level parameter `default_delta_format` or the table-level attribute `delta_format`. Incremental Skip Index can be enabled via the tenant-level parameter `default_skip_index_level` or the table-level attribute `skip_index_level`.
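A hedged sketch follows; the parameter and attribute names come from the notes above, while the values and the exact clause forms are assumptions.

```sql
-- Hypothetical sketch: enable incremental encoding and skip index at the
-- tenant level. Values and clause forms are illustrative.
ALTER SYSTEM SET default_delta_format = 'encoding';
ALTER SYSTEM SET default_skip_index_level = 1;

-- Per-table equivalents (attribute spellings follow the notes; the ALTER
-- TABLE form shown is illustrative):
ALTER TABLE t1 SET delta_format = 'encoding';
ALTER TABLE t1 SET skip_index_level = 1;
```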
Optimizer and executor enhancements
Optimized hard parsing performance for SQL
To address excessive memory usage during hard parsing of complex SQL statements with many repeated elements (such as large `IN`, `OR`, and `INSERT` clauses), the new version optimizes memory usage by restructuring the array data structures used during SQL parsing. Key optimizations include:
- Introducing a new `ObSqlArray` to replace `ObSEArray`, reducing memory overhead.
- Removing the `auto_free` mechanism and using `allocator` for explicit construction to prevent memory bloat.
- Eliminating pre-allocated arrays to reduce initial memory consumption.
- Adjusting the expansion strategy to dynamically calculate `block_size` based on data types.
These changes effectively address static memory overhead for empty arrays and dynamic expansion memory bloat, significantly improving memory efficiency during hard parsing of complex SQL statements.
Enhanced predicate derivation capabilities
The new version optimizes the logic for general predicate derivation (`IN`, `LIKE`, `!=`) to allow cross-view derivation. It also uses rules to determine the derivation relationships between different types of predicates. Additionally, it supports the removal of redundant NLJ (Nested Loop Join) predicates, identifying and eliminating those that are redundantly calculated on the right side if they were already executed on the left side, especially when the right side is a subquery or view. This allows for predicate elimination at the base table level, reducing unnecessary computational overhead.
Enhanced plan pushdown capabilities
In V4.6.0, the generation of `Runtime Filter` has been moved from the first stage to the second stage of the optimizer, enhancing plan robustness. The version control mechanism removes the first-stage generation logic, and in the second stage, `Runtime Filter` is added as a supplement to the selected plan, reducing reliance on statistics. The effective `producer` judgment logic has been enhanced, identifying operators with filtering effects instead of just `filter` operators. `Runtime Filter` now supports cross-DFO (Data Flow Operator) transmission and broader pushdown capabilities, penetrating more operators such as `Window`, `Group By`, and `Set Operation`. It also improves column derivation for equality join conditions, enhancing data filtering efficiency.
Additionally, the new version enhances `Group By` pushdown capabilities, solving the issue of intelligent pushdown for aggregation operators. Using a post-order traversal of the logical plan tree, `Group By` is pushed down during plan generation, avoiding the high overhead of cost verification. It supports splitting `Group By` at JOIN nodes, performing `Local Shuffle` before aggregation to improve `sharding` matching, and making intelligent placement decisions for `Group By`. It identifies and pushes down `Single Group By`, optimizing query performance while maintaining useful `sharding`, which is particularly beneficial for large-scale scenarios like star schemas, reducing intermediate data transmission.
Optimized common temporary table extraction
In V4.6.0, the logic for extracting common subexpressions in SQL statements has been optimized. Within a composite SQL statement, the new version can more intelligently identify shared computational parts among multiple similar queries with `GROUP BY` clauses. For example, for multiple similar queries like `SELECT SUM(t1.c1), t2.c3 FROM t1, t2 WHERE t1.c1 = t2.c1 AND t2.c3 = some_value GROUP BY t2.c3`, the old version could only extract common table expressions without `GROUP BY`, while the new version can extract the complete computation including `GROUP BY` into a common temporary table. Each branch then filters the results of the common temporary table based on the condition `t2.c3 = some_value`. By sharing the same grouping computation results, it reduces redundant calculations and improves performance.
Optimized execution of vector assignment syntax
The execution layer introduces the `AliasRef` expression to replace the previous cross-layer direct reference method. This addresses the expression specification issues when using vector assignment syntax like `SET (col1, col2) = (SELECT ...)` in `UPDATE` statements, ensuring correct execution and proper specification description during the optimizer phase.
Multi-value index support for Index Merge
To address performance issues with JSON multi-value indexes in scenarios with multiple multi-value predicates, the new version allows multi-value indexes in the Index Merge path. It generates separate scan nodes for each multi-value predicate, enabling precise predicate extraction and reducing expression computation. It also supports rewriting `JSON_CONTAINS` into multiple `MEMBER OF` predicates, avoiding non-precise predicate calculations and improving query efficiency for multi-value index queries, especially on JSON arrays with multiple query conditions.
Optimized Join Order selection algorithm
When SQL queries need to join multiple tables (such as in an order system querying user, product, and logistics information), different join orders can drastically affect query speed. Previous versions used the Iterative Dynamic Programming (IDP) algorithm for Join Order enumeration to find the optimal plan, but this was time-consuming with large datasets. V4.6.0 introduces a heuristic permutation enumeration algorithm that enumerates a near-optimal plan quickly and with less memory, accelerating hard parsing. It also employs an adaptive dual-algorithm strategy: for small-scale queries, it uses precise dynamic programming to find the optimal solution; for complex queries, it switches to the efficient permutation enumeration algorithm to quickly find a good approximate solution. This optimization is particularly beneficial for report queries and data analysis, significantly reducing query plan generation time and avoiding computational explosion through intelligent pruning.
Optimized performance for distinct aggregation queries
Traditional three-phase parallel algorithms for queries with multiple `distinct` aggregations can suffer from data replication bloat, high memory usage, and disk spilling. The new version introduces several optimizations:
- Replaces redundant data replication with the `Expansion` operator to reduce `shuffle` data volume.
- The `Hash Distinct` operator supports independent hash table construction for `distinct` columns, reducing row length and computational overhead.
- Supports intelligent rewriting of `filter distinct` scenarios, converting `count(distinct if(...))` into a combination of `any_true` and `count` to avoid redundant replication of the same data column (see the sketch after this list).
- Optimizes transmission encoding by compressing invalid `NULL` columns.

These changes significantly reduce memory consumption and disk I/O, preventing performance degradation from data bloat, especially in large-scale scenarios with multiple `distinct` column aggregations.
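For reference, this is the kind of query shape the `filter distinct` rewrite targets; the schema is illustrative.

```sql
-- Illustrative query shape: multiple filtered DISTINCT aggregates over the
-- same table, which previously forced redundant data replication.
SELECT
  COUNT(DISTINCT IF(status = 'paid',     user_id, NULL)) AS paid_users,
  COUNT(DISTINCT IF(status = 'refunded', user_id, NULL)) AS refunded_users
FROM orders;
```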
Optimized Hash Group By Limit Pushdown
To address high memory and CPU consumption in `GROUP BY ... LIMIT K` queries, where the `Hash Group By` operator may need to compute all groups, the new version introduces `Limit` pushdown optimization. When the number of groups in the hash table reaches the `Limit` threshold, it stops creating new groups and only processes data that belongs to existing groups. By pushing the `LIMIT` value down to the `Hash Group By` operator, the algorithm complexity is reduced from `O(N)` to `O(K)`, significantly decreasing unnecessary calculations and memory usage. Testing shows a 7-8x performance improvement and a 10x reduction in `Hash Group By` operator execution time for 1M rows with `Limit 10`, substantially reducing resource consumption for large-scale group queries.
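For reference, this is the query shape the pushdown targets; the schema is illustrative.

```sql
-- Illustrative query shape: once 10 groups exist in the hash table, the
-- operator stops creating new groups and only updates existing ones.
SELECT user_id, COUNT(*) AS cnt
FROM events
GROUP BY user_id
LIMIT 10;
```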
Enhanced function computation performance
V4.6.0 optimizes the computation performance of several functions, including `approx_count_distinct`, `listagg ... rollup`, time extraction functions for `string` types, `cast string to datetime/date`, `sec_to_time`, `replace`, `hex`, `reverse`, `instr`, `locate`, `concat`, `lpad`/`rpad`, `left`, `right`, `sign`, `exp`, `sha`/`sha2`, `aesdecrypt`, `crc32`, `from_base64`, bitwise operation functions, and IP-related functions. This further accelerates data processing and meets the requirements for real-time computing. The new version also enhances the self-validation capabilities of the vectorized expression framework, improving the stability of expression computations.
Window operator support for streaming computation
To address the high memory and CPU overhead caused by materializing entire partitions during window function computation, V4.6.0 implements fully streaming execution. It supports `row_number`, `rank`, and `dense_rank` functions, aggregate functions without the `distinct` clause, and `first_value`, `last_value`, and `nth_value` under the window definition `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` and related rules. It also supports `first_value` and `nth_value` under the window definition `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` and related rules. By restructuring the window operator framework and optimizing modules such as partition encoding, extraction of common operations, and temporary memory lifecycle management, the system can compute and output results row by row without materializing the entire partition when conditions are met. When the cardinality of the partition key is low and concurrency is high, it combines `range distribution` optimization to enable concurrent execution.
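For reference, window shapes like the following can now stream; the schema is illustrative.

```sql
-- Illustrative window shapes that qualify for streaming execution: a ranking
-- function and a window value function over a cumulative frame.
SELECT
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary) AS rn,
  FIRST_VALUE(salary) OVER (
    PARTITION BY dept ORDER BY salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS lowest_so_far
FROM emp;
```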
System variable loading optimization
The new version optimizes the performance of loading system variables during `inner session` initialization from multiple dimensions. In the old version, each time a new `inner session` was created, over 800 system variables needed to be loaded with their default values, which introduced performance bottlenecks such as string comparisons, length calculations, and type conversions. The new version adopts a hybrid loading strategy, preloading 6 core variables (such as `character_set_connection` and `sql_mode`) and over 50 `INFLUENCE_PLAN` variables that affect query plans, while lazily loading other variables as needed. It also optimizes the underlying implementation by using `id` indexing to accelerate lookups and implementing a dedicated fast type converter to avoid the overhead of the general framework. In complex query scenarios, the share of `init_session_info` dropped from 34% to 3%, and the execution time decreased from 100 seconds to 55.889 seconds, demonstrating significant optimization.
Performance optimization for Hive/Iceberg external table queries
The new version accelerates predicate filtering on string columns in Hive/Iceberg tables that use dictionary encoding. By filtering at the dictionary code level, it avoids unnecessary string decoding overhead. It also supports a memory optimization mechanism that copies dictionary values to shared memory for multiple rows. It provides both white-box and black-box filtering modes. In test scenarios such as TPCH Q19, integer comparisons at the dictionary code level significantly outperform string comparisons, especially in high-selectivity filtering scenarios where the number of unique values in a column is much smaller than the total number of rows.
Support for Bloom Filter in external tables for Parquet/ORC files
The new version introduces support for Bloom Filter indexes in external table queries to accelerate equality predicate filtering on Parquet and ORC files. By leveraging the built-in Bloom Filter metadata of the file format, it can quickly determine whether a target value may exist in the current data block before reading data when the query condition is an equality predicate such as `col = 'xxx'`. The implementation uses a lazy loading strategy, loading the Bloom Filter only when the `min-max` range cannot determine the filtering. This optimization is particularly effective for equality queries on high-cardinality columns (such as UUIDs and usernames), significantly reducing unnecessary I/O reads.
Parallel parsing of CSV files in external tables
V4.6.0 introduces parallel parsing of single CSV files to improve the efficiency of large-scale data imports and external table queries. It implements intelligent file splitting by combining the "gambling" boundary method with full-scan boundary detection. When the number of files is less than or equal to twice the number of threads, the system automatically selects a parallel parsing strategy based on the complexity of the data format. New `FORMAT` options (`PARALLEL_PARSE_ON_SINGLE_FILE`, `PARALLEL_PARSE_FILE_SIZE_THRESHOLD`, and `MAX_ROW_LENGTH`) provide flexible control over parallel behavior. By leveraging the multi-state conversion mechanism of the `GI` operator and SQC coordination, it achieves efficient parallel processing while maintaining data integrity. The new version also supports the `SNAPPY` compression algorithm for CSV files in external tables.
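A hedged sketch of an external table using these options follows; the option names come from the notes above, while the overall clause layout and values are assumptions.

```sql
-- Hypothetical sketch: allow parallel parsing of a single large CSV file.
-- Clause layout and values are illustrative.
CREATE EXTERNAL TABLE ext_events (
  id      BIGINT,
  payload VARCHAR(1024)
)
LOCATION = 'oss://my-bucket/events/'
FORMAT = (
  TYPE = 'CSV',
  PARALLEL_PARSE_ON_SINGLE_FILE = TRUE,
  PARALLEL_PARSE_FILE_SIZE_THRESHOLD = '64MB',
  MAX_ROW_LENGTH = 65536
);
```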
Optimization for downloading large partitions from ODPS (MaxCompute) across multiple machines
To address the issues of high data volume in a single partition and bandwidth limitations during single-machine downloads in ODPS external table imports, V4.6.0 implements parallel downloading of the same partition across multiple machines. During the QC phase, it identifies large partitions by obtaining their physical sizes in parallel. In the SQC phase, it dynamically and finely splits tasks. By utilizing the transaction consistency feature of the ODPS Tunnel API, it deserializes sessions in SQC to enable parallel reading across multiple machines. The task scheduling algorithm is optimized to prioritize large partitions and ensure load balancing, reducing initialization overhead while maintaining compatibility with single-machine parallel processing. This optimization effectively addresses partition skew scenarios, fully utilizing cluster network bandwidth to enhance import performance.
Transaction and storage engine enhancements
Optimized small transaction dump status
For high-frequency small transaction scenarios, the transaction status refill mechanism is optimized to speed up data cleanup. This avoids long transactions blocking data recycling: even when some transactions are still uncommitted, the system can clean up data from completed transactions, improving read and write performance.
Migration and replication support for partition splitting and macroblock reuse
In scenarios such as migration, replication, and physical recovery, if the source side has multiple versions of major SSTables or macroblocks that can be reused after partition splitting, the destination side previously still needed to copy them repeatedly, leading to wasted space and bandwidth. V4.6.0 optimizes the macroblock reuse capability during migration and replication. By building a reuse mapping table, scheduling the splitting tablets in sequence, and modifying the macroblock copy RPC, the destination side can reuse the major macroblocks from the source side and the shared macroblocks between multiple versions of major SSTables. The new `TabletGroupGeneratorDag` dynamically schedules dependencies, ensuring that the source side is migrated before the destination side. This feature reduces the amount of macroblock copying, improves migration efficiency, and reduces storage space usage.
Append-only tables
`append-only` is a new table update model in OceanBase Database, following `partial-update` and `delete-insert`. `append-only` tables only allow `INSERT` operations and prohibit `UPDATE`, `DELETE`, and other operations that may involve updates, such as `REPLACE INTO`. Data written to an `append-only` table cannot be modified. This feature is suitable for scenarios requiring immutable data, such as financial transaction records, logs, and IoT data monitoring. It meets the requirements for audit compliance, efficient appending, and high-performance queries. To create an `append-only` table, use the `MERGE_ENGINE = append_only` syntax. The write order does not require rowkeys to be monotonically increasing, and you can combine the model with TTL (Time To Live) for expired data cleanup.
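A minimal sketch of creating such a table follows; the `MERGE_ENGINE` clause comes from the notes above, while the schema is illustrative.

```sql
-- Hypothetical sketch: an insert-only table for immutable audit records.
CREATE TABLE audit_log (
  id          BIGINT,
  occurred_at DATETIME,
  payload     JSON
) MERGE_ENGINE = append_only;

-- INSERT is allowed; UPDATE/DELETE/REPLACE INTO are rejected.
INSERT INTO audit_log VALUES (1, NOW(), '{"action": "login"}');
```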
TTL (Time To Live)
The TTL (Time To Live) feature is a new data expiration deletion feature based on the merge mechanism. It defines expiration strategies for SQL-mode relational tables. Unlike traditional KV-mode tables, which use background threads to write `DELETE` rows, TTL asynchronously deletes data during merges without occupying clogs or write bandwidth. Deletion information is only recognized internally by OceanBase Database and is not synchronized to downstream systems. Users can configure the `enable_ttl` switch, set the `ttl_duty_time` time point, or manually trigger a TTL task. When a TTL task is triggered, the system synchronizes the deletion rules to the MDS through a transaction. During merges, data is filtered and physically deleted based on the recycle version number. TTL supports the `DELETE_INSERT` and `APPEND_ONLY` table modes. Currently, it only supports expiration strategies based on the hidden column `ora_rowscn`.
PL engine enhancements
Performance optimization of the system package execution framework
The new version optimizes the system package execution framework to avoid the need to store and deep-copy PL symbol tables in `paramstore`. In a test scenario with 1 million data records, the execution time for `select count(utl_raw.CAST_TO_RAW(c1)) from t2` decreased from 6.05 seconds to 5.17 seconds (a 14.5% improvement), and the execution time for `select count(dbms_lob.getlength(c1)) from t2` decreased from 4.89 seconds to 3.89 seconds (a 20.4% improvement).
Performance optimization of the PL execution framework
The new version optimizes the `column convert` overhead in scenarios where PL input parameters are of basic types. It supports precomputing parameter default values and shallow-copying constant parameters. By eliminating memory contexts, avoiding schema guard refreshes, canceling column conversions, and preallocating parameter storage memory, `inner call` performance improves by 3%-11%. Optimizations such as avoiding unnecessary deep copies by storing results after checking returned rows and caching tenant configurations in the top-level PL context improve loop execution performance. Moving `sys_var_in_pl_cache_str_` to the session-level cache eliminates hotspots caused by repeated serialization during PL cache retrieval. Optimizing the execution frameworks for multiple layers of `package routine` and `subprogram routine` improves performance by 24%-39% through top-level PL context caching and avoiding unnecessary execution context creation. Caching execution environment information at the session level avoids repeated system variable loading, improving performance by 5%. Optimizing the execution frameworks for multiple layers of PL improves performance by 44%-75% through top-level PL context caching, disabling end-to-end tracing, optimizing expression cleanup logic, and sharing the `vslice allocator` across multiple layers of PL.
Performance optimization of expression evaluation within PL
The new version resolves the issue of long paths for reading and writing `package` variables, refines data type conversion logic, and eliminates unnecessary conversions in assignment statements, default value generation, and deep copies of complex types. It also removes the `clear expr flag` logic during expression evaluation. This results in a 12%-26% performance improvement in expression evaluation within PL.
Enhanced support for FORALL in PL/SQL
The new version enhances the functionality of the `FORALL` statement in PL/SQL, supporting advanced use cases such as dynamic SQL batch execution, `indices of collection`, and `values of index_collection`. It also supports complex types (such as PL extension types, JSON, and geometric types) as members of a `collection`. This optimization significantly improves the performance of batch DML operations in PL/SQL, allowing loop execution scenarios to leverage the `array binding` feature of the SQL engine for efficient batch processing.
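For reference, a `FORALL` shape now supported looks like the following; the schema is illustrative.

```sql
-- Illustrative (Oracle-mode PL/SQL): batch DML driven by a sparse
-- associative array via INDICES OF.
DECLARE
  TYPE id_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  ids id_tab;
BEGIN
  ids(1) := 101;
  ids(5) := 102;  -- sparse: indexes 2 through 4 are unused
  FORALL i IN INDICES OF ids
    DELETE FROM orders WHERE id = ids(i);
END;
```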
Performance optimization of the dbms_random system package
The new version optimizes the performance of the `dbms_random` system package through eight improvements, including eliminating unnecessary expression evaluations, simplifying package variable access paths, optimizing the execution framework, and eliminating unnecessary column conversions and UDF calls. In some scenarios, this results in a 42% performance improvement.
Support for query range extraction for non-deterministic UDFs with constant parameters
The new version supports extracting query ranges for non-deterministic UDFs with constant parameters, enabling index filtering for scenarios where `col = udf()`. This significantly improves SQL computation performance in these scenarios.
Asynchronous commit support for PL transactions
The new version introduces asynchronous commit logic at the protocol layer to manage PL transaction commit status, supporting parallel processing of transaction commits and PL execution in distributed transaction scenarios. This achieves asynchronous transaction commits within stored procedures, effectively addressing thread occupancy issues caused by frequent synchronous commits at the PL layer, thereby improving system throughput. In complex transaction scenarios, performance can improve by up to 39%, providing significant performance benefits for high-concurrency PL workloads.
Streaming cursor prefetch optimization
The new version reduces the overhead of SQL auditing and cursor environment switching by prefetching data in batches in memory. This optimization significantly improves the performance of streaming cursors, with performance improvements of 44%-54% when the cursor data volume is greater than or equal to the batch cache size (100 rows). This optimization requires enabling the `_enable_streaming_cursor_prefetch` parameter.
Support for reading uncommitted transactions in streaming cursors
In Oracle mode, reading uncommitted data from a streaming cursor and then committing the transaction previously caused the cursor to become invalid, which is incompatible with Oracle behavior. The new version fixes this by checking whether uncommitted data was read when the streaming cursor is opened and marking it accordingly. When the transaction is committed, affected streaming cursors are converted to non-streaming cursors, and data is loaded into local memory starting from the current cursor position, avoiding the "fetch out of sequence" error and keeping behavior compatible with Oracle.
Trigger performance optimization
The new version optimizes the trigger execution path by introducing a direct method for retrieving trigger package bodies and adding the `get_cached_package_body()` method to avoid unnecessary package specification lookups. This reduces schema operation overhead during trigger execution, thereby improving trigger performance.
Performance optimization of dynamic SQL execution
The new version optimizes dynamic SQL precompilation by prioritizing PS cache lookups for precompiled statements, avoiding repeated execution plan creation, and adding parameter and expression counters to improve cache matching accuracy. This enhances dynamic SQL execution performance.
Parameterization support for dynamic SQL within PL
In older versions, only `CALL` statements were parameterized in PL, and internal SQL statements were not. Due to differing parameter literals, internal SQL statements had different SQL IDs, preventing them from utilizing the Plan Cache. This led to frequent cached plan evictions and increased hard parsing and RT (response time) for PL requests. The new version uses Fast Parser technology to quickly identify constants and replace them with "?" placeholders, ensuring consistent SQL ID generation for the same logic. It also handles special syntax like `IS NULL` to prevent incorrect parameterization, ensuring syntax correctness. This optimization enables effective reuse of the Plan Cache for internal SQL statements in high-concurrency scenarios, reducing plan generation overhead and improving performance.
Rewriting select into from dual statements to PL assignment statements
The new version supports rewriting `select into from dual` statements in PL/SQL to assignment statements to avoid SQL execution overhead. This optimization is controlled by the system variable `PLSQL_CAN_TRANSFORM_TO_ASSIGN` and improves performance by approximately 84%, significantly reducing the execution cost of such statements.
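For reference, this is the statement shape the rewrite targets; with the switch enabled, the query below executes as a direct assignment instead of going through the SQL engine.

```sql
-- Illustrative (Oracle-mode PL/SQL): rewritten internally to v := 1.
DECLARE
  v NUMBER;
BEGIN
  SELECT 1 INTO v FROM dual;
END;
```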
Compatibility and feature enhancements
Session-level temporary tables in MySQL mode
V4.6.0 introduces session-level private temporary tables in MySQL mode, supporting the creation, modification, and deletion of temporary tables, as well as index creation on temporary tables, making migration from MySQL smoother.
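For reference, standard MySQL-style usage now works in MySQL mode; the schema is illustrative.

```sql
-- Illustrative: a session-private temporary table with an index.
CREATE TEMPORARY TABLE tmp_orders (
  id     BIGINT,
  amount DECIMAL(10, 2)
);
CREATE INDEX idx_amount ON tmp_orders (amount);
DROP TEMPORARY TABLE tmp_orders;
```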
Performance optimization for global temporary tables in Oracle mode
V4.6.0 restructures transaction-level and session-level global temporary tables in Oracle mode, introducing a `temporary tablet` for storing temporary table data. This data is released at the end of the session, optimizing performance.
MySQL Recursive Union Distinct
MySQL 8.0 supports Recursive Union All and Recursive Union Distinct in Common Table Expressions (CTEs). OceanBase Database has supported Recursive Union in MySQL mode since V3.2.3, but only Recursive Union All. V4.6.0 extends compatibility to MySQL Recursive Union Distinct, ensuring the uniqueness of output data. It also enhances Recursive Union All, supporting data persistence when memory is insufficient.
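For reference, a recursive CTE using `UNION DISTINCT` looks like the following; the schema is illustrative.

```sql
-- Illustrative: compute the set of nodes reachable from node 1; UNION
-- DISTINCT removes duplicates produced across iterations.
WITH RECURSIVE reach (node) AS (
  SELECT 1
  UNION DISTINCT
  SELECT e.dst
  FROM edges e
  JOIN reach r ON e.src = r.node
)
SELECT * FROM reach;
```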
Prepare-phase parameters under the combined PS protocol in Oracle mode
With the combined PS protocol, the new version supports parsing during the Prepare phase when parameters are present, addressing type inference issues in Oracle mode when parameter information was unknown. Upgrades keep the legacy behavior by default; enable `_prexec_prepare_with_params` when you need this feature.
Partition exchange between primary and subpartitioned tables
The new version supports partition exchange between a single-level partitioned table and a two-level partitioned table. The target table must be a two-level partitioned table whose first-level partitioning is `RANGE` or `LIST`. The source table must be a single-level partitioned table, and its partitioning must exactly match the corresponding subpartition definitions of the target table. Use the `ALTER TABLE target_partition_table_name EXCHANGE PARTITION first_level_partition_name WITH TABLE origin_partition_table_name WITHOUT VALIDATION;` syntax for flexible partition management, suitable for fast large-scale data migration. Note that statistics become invalid after the exchange and must be regathered.
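A concrete instance of the documented syntax follows; the table and partition names are illustrative.

```sql
-- Illustrative: swap a single-level partitioned staging table into
-- first-level partition p202604 of a composite-partitioned target table.
ALTER TABLE sales_composite
  EXCHANGE PARTITION p202604
  WITH TABLE sales_staging
  WITHOUT VALIDATION;

-- Statistics are invalid after the exchange; regather them afterward.
```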
JSON column conversion support
The new version supports converting JSON columns to `text`, `mediumtext`, or `longtext`, simplifying O&M tasks.
Enhanced DBMS_SCHEDULER in Oracle mode
The new version adds `BYMONTH`, `BYYEARDAY`, `BYMONTHDAY`, `BYDAY`, `BYHOUR`, `BYMINUTE`, and `BYSECOND` to the `repeat_interval` option in `DBMS_SCHEDULER`, offering flexible job scheduling. It also supports Chinese names for string parameters in the `GBK` character set.
Extended spatial functions in Oracle mode
Oracle mode introduces `SDO_CONTAINS`, `SDO_ANYINTERACT`, `SDO_AREA`, and `SDO_UTIL.GETVERTICES` for common GIS operations like spatial containment, intersection, area calculation, and vertex extraction. These functions are suitable for spatial data analysis, geospatial calculations, map rendering, and geometric quality checks.
Enhanced materialized views
The new version significantly enhances materialized view capabilities, including:
- Incremental refresh now supports outer joins, `UNION ALL`, and non-aggregated single-table mode.
- Materialized view logs support column addition and view renaming.
- Nested materialized view cascading refresh resolves multi-level dependency issues.
- Supports `AS OF PROCTIME()` to skip refreshing a dimension table during incremental refresh, improving efficiency. Incremental materialized views with multi-table joins can reference ordinary views that are declared with `AS OF PROCTIME()`.
- Aggregated incremental refresh now supports `min`/`max` functions and non-basic column parameters.
- Supports nested incremental refresh and `LEFT JOIN` aggregation for non-key-based materialized views.
- Automated MLOG management, automatically creating and cleaning redundant MLOG tables.
- Full refresh materialized views now support UDFs.
- New `Minimal` mode switch improves columnar wide table refresh performance.
- Added range predicate comparison logic (`<`, `<=`, `>`, `>=`, `=`, `BETWEEN AND`) for materialized view rewriting, determining whether the query range is fully contained.
These enhancements improve flexibility and performance for complex business scenarios.
Iceberg V3 format support
V4.6.0 extends support for Iceberg V3, enabling Time Travel queries to access historical data via branch names, tags, snapshot IDs, or timestamps, even after data deletion or modification. It also supports Deletion Vector, allowing retrieval and removal of marked-deleted rows during Iceberg table reads.
Iceberg Insert support
The new version supports writing data to Iceberg tables using the `INSERT INTO catalog.db.table SELECT ...` syntax, with `parquet` and `orc` file types (controlled by `write.format.default`, defaulting to `parquet`). It supports serial and parallel write modes via the `parallel` hint. Currently, it supports only V2 Iceberg tables, does not support partitioned tables, and does not collect `column size`, `min/max`, or `null counts` statistics. It maintains data type consistency with `select into parquet/orc` for data lake integration.
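A concrete instance of the documented syntax follows; the catalog, database, and table names are illustrative.

```sql
-- Illustrative: parallel write into a V2 Iceberg table through a catalog.
INSERT /*+ parallel(4) */ INTO iceberg_cat.sales_db.orders
SELECT * FROM local_orders;
```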
REST Catalog protocol support
The new version adds REST Catalog support, allowing standardized HTTP REST API access to Iceberg tables. OceanBase Database includes an HTTP client connection pool with configurable maximum connections, timeout, and keep-alive settings, reducing connection overhead. It provides the monitoring view `[G]V$OB_EXTERNAL_CATALOG_CLIENT_POOL_STAT` for real-time client pool status. Use `CREATE EXTERNAL CATALOG` to create `REST`-type catalogs, configurable with a server address and authentication methods.
Parquet/ORC file support for complex data types
The new version supports reading and writing complex data types (`Array`, `Map`, `JSON`) in Parquet/ORC files. Files containing arrays (`Array`), maps (`Map`), or nested `JSON` can be directly imported via external or temporary tables and exported in the same format.
System performance improvements
Performance optimization on large-scale domestic machines
This release optimizes CPU hotspots in typical scenarios on domestic CPUs such as Hanguang and Kunpeng, covering PDML and replicated table query performance. In a large 256-core environment, high-concurrency PDML performance improves by 25%, and memory consumption in non-PDML scenarios decreases. Business query performance when connecting to followers for replicated tables improves 14x, increasing from 20,000 QPS to 270,000+, a gap of less than 10% from the 300,000 QPS of direct leader queries. In a small 32-core environment, the same optimizations also yield performance improvements.
Memory allocation for library functions is optimized.
We have designed a lock-free memory allocator based on the `mimalloc` algorithm. It optimizes the performance of small memory allocations when OceanBase Database integrates various external libraries. This allocator uses a fragmented memory block design and a delayed-idle recovery strategy, and it reduces lock contention, breaking through the performance bottleneck of thread-local small-object allocation. The core architecture includes a thread-local allocator, size-class-based management, and fast/slow path allocation mechanisms. The fast path provides nearly O(1) performance, while the slow path handles memory released across threads in batches. In a single-threaded scenario, its performance improves by about 168% compared to glibc malloc. In a 16-thread local allocation scenario, the performance improves by about 180%. This supports more efficient memory management for large-scale and high-concurrency scenarios.
Enhancements to the OBKV data model
OBKV supports reading from read replicas and from the same region's data centers.
OBKV adds weak reads and closest-IDC reads: you can read from a non-leader replica to reduce leader load and prefer the replica in the nearest IDC to cut latency. This applies to OBKV-HBase 2.5.0+ and OBKV-Table 2.3.0+ clients. Use the `follower_first` or `follower_only` routing strategy, configured globally or per statement in the client configuration or code.
TTL-based tasks support index scans and rate limiting.
The TTL task now supports index scanning, allowing users to accelerate the TTL data cleanup process by specifying indexes. Traditional primary key scans require reading complete rows to determine whether data has expired, but local indexes that include the expiration column can significantly reduce I/O overhead and improve scan speed. This feature is enabled through the `TTLScanIndex` parameter in `KV_ATTRIBUTES`, supporting both the Table and HBase models. The index must be a local index containing TTL-related columns; global indexes are not supported.
At the same time, OBKV TTL tasks gain new throttling capabilities: you can set the `kv_ttl_delete_max_ops` parameter to control the maximum per-second throughput of delete operations for TTL tasks. When a tenant has limited resources or high request pressure, TTL tasks may impact system performance, and throttling helps manage CPU and clog write bandwidth. The default value `0` indicates no throttling, while a value greater than 0 allows precise control over the deletion rate. Combined with the `ttl_thread_score` parameter, this gives each partition task a throughput limit and ensures proper resource allocation.
Enhanced monitoring for OBKV-HBase
The new version optimizes the monitoring system for the OBKV-HBase server. The main improvements include differentiating monitoring metrics for HBase Get and Scan operations and distinguishing between single `Put`/`Delete` and batch operations. It also adds seven new monitoring items: `TABLE_API_HBASE_CHECK_AND_MUTATE`, `TABLE_API_HBASE_SCAN`, `TABLE_API_HBASE_GET`, `TABLE_API_HBASE_BATCH_PUT`, `TABLE_API_HBASE_BATCH_DELETE`, `TABLE_API_HBASE_BATCH_GET`, and `TABLE_API_FAILED_OP`. Clients can use the `MetricsExporter` or JMX interface to obtain detailed performance metrics such as QPS, RT, and P99 for more accurate performance monitoring and analysis. This feature requires OBKV-Table 2.3.0+ and OBKV-HBase 1.5.0+/2.5.0+ clients.
Hotspot key query optimization in OBKV-HBase
Hotspot key query optimization in OBKV-HBase
Previously, OBKV-HBase queries used a full-scan strategy: all data under all qualifiers is scanned and then filtered. This causes a large amount of redundant scanning and processing overhead, and significant delays, especially when querying a wide row or a multi-version row that involves only a few qualifiers. V4.6.0 introduces the HBASE_HTABLE_HOTKEY_GET_OPTIMIZE_ENABLE parameter to optimize hot-key Get operations. When this feature is enabled, the server applies a rescan technique to queries involving hot keys, improving query performance in hot-key scenarios.
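For illustration, a hedged sketch of turning the optimization on; this assumes the parameter is enabled with ALTER SYSTEM SET like other tenant-level parameters:

```sql
-- Assumed usage: enable server-side rescan optimization for hot-key Get operations.
ALTER SYSTEM SET HBASE_HTABLE_HOTKEY_GET_OPTIMIZE_ENABLE = True;
```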
Enhanced OBCDC
OBCDC supports incremental data synchronization after table-level recovery.
Table-level recovery is an offline DDL operation across tenants. In traditional solutions, OBCDC cannot obtain the source table information associated with the hidden table, which leads to synchronization failures. The new version introduces the following improvements: DDL operations during table-level recovery are filtered out to prevent abnormal termination of OBCDC, and at the end of table-level recovery, the observer side records the final table name as a TABLE_RECOVER_END-type DDL operation, which OBCDC uses for whitelist and blacklist matching. This feature applies to incremental data synchronization after table-level recovery completes, ensuring stable operation of OBCDC in table-level recovery scenarios.
OBCDC support for virtual generated columns
V4.6.0 supports computing virtual generated column values in real time on the OBCDC side in MySQL-compatible mode, so that downstream systems stay compatible with MySQL Binlog Service. The feature is controlled by the enable_output_virtual_generated_column parameter. When it is enabled, OBCDC uses the SQL module's calculation APIs together with virtual generated column dependency metadata from the data dictionary (including session-fixed variables and column default values) to compute and fill in virtual generated column values during formatting, preserving correctness and MySQL Binlog behavior.
Security and reliability enhancements
Non-intrusive column encryption
A sensitive column data protection feature is added, allowing database administrators to create column-level data protection rules on tables. When a user executes a SELECT operation whose projected columns include an encrypted column, the server checks whether the user has plaintext access privileges. If so, the encrypted column's plaintext results are returned normally; if not, the data is encrypted according to the data protection rules before being returned to the user.
Support for caching_sha2_password authentication
Support for the caching_sha2_password authentication method is added in both MySQL and Oracle modes. This method can be specified when creating users, providing a more secure password protection mechanism. It requires ODP 4.4.0 for compatibility.
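For illustration, a minimal example in the spirit of the CREATE USER syntax listed under syntax changes below; the user name and password are hypothetical:

```sql
-- Create a user authenticated with caching_sha2_password.
CREATE USER 'app_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'Str0ng_P@ss!';
```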
Optimized read from standby replicas
The new version optimizes the transaction state inference mechanism. Instead of each replica independently collecting participant states, the leader coordinator now uniformly collects and caches these states, significantly reducing message interactions. Additionally, a retry mechanism is introduced to automatically forward requests to other replicas when a replica falls behind, preventing blocking or incorrect transaction state inference.
Enhanced backup verification
V4.6.0 introduces physical backup verification to check the integrity and availability of backup and archive data. This feature allows users to verify backup sets (Backup Set) and archive logs (Archive Log) without performing actual recovery operations, checking for issues such as missing files, physical data corruption, and logical inconsistencies. It offers two verification levels, BASIC (file list integrity check) and PHYSICAL (physical data correctness and logical checks), catering to different verification needs. It supports verification of the entire backup path, specific backup sets, and specified archive pieces, with cross-cluster path verification capabilities.
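For illustration, a hedged sketch based on the ALTER SYSTEM VALIDATE syntax listed under syntax changes below; the backup set ID and exact argument form are assumptions:

```sql
-- Verify a specific backup set without performing an actual restore.
ALTER SYSTEM VALIDATE BACKUPSET 1;

-- Cancel a running backup validation task.
ALTER SYSTEM CANCEL VALIDATE BACKUP;
```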
Resource limits for large queries
To address the issue of long-running large queries consuming excessive CPU and affecting small-query response times, the new version further optimizes the resource limits for large queries. Earlier versions already supported, before execution, retrying a query in the large-query resource group based on historical execution durations from the plan cache. V4.6.0 introduces a new trigger path that monitors actual execution durations at runtime: if the threshold is exceeded, the thread automatically switches to the large-query resource group for throttling. The large-query resource group uses cgroup to control CPU usage, defaulting to 30%, to prevent large queries from starving new requests. When the number of threads in the large-query resource group exceeds half of the tenant's maximum thread count, new large queries are no longer accepted, ensuring that the default resource group has sufficient resources to handle small queries and that small-query performance remains prioritized. The _large_query_cpu_quota_adjustment_step parameter adjusts the CPU throttling step size: larger values throttle faster but may cause QPS oscillation, while smaller values provide stable QPS but slower throttling. Test carefully before use.
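For illustration, a hedged example of tuning the throttling step; the value shown is arbitrary and should be validated in a test environment first:

```sql
-- A larger step converges on the CPU cap faster but risks QPS oscillation;
-- a smaller step keeps QPS stable but throttles more slowly.
ALTER SYSTEM SET _large_query_cpu_quota_adjustment_step = 5;
```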
Operation and maintenance enhancements
Enhanced SQLSTAT capabilities
To address the issue of SQL statements without execution plans consuming excessive memory because they shared memory with the Plan Cache, we restructured the SQLSTAT storage architecture and implemented a three-tier storage scheme to better manage memory usage. This scheme includes:
- Cache the execution statistics of the last N runs on the physical plan.
- Manage the lifecycle of global SQLSTAT independently of the Plan Cache.
- Use batched updates to reduce hash table access overhead.
SQLSTAT now has an independent eviction mechanism that avoids interference with the Plan Cache: a background thread regularly monitors memory usage and evicts the least frequently used SQLSTAT entries when the memory limit is exceeded. The physical plan's pointer cache for SQLSTAT is retained, and execution statistics are accumulated into a global HashMap on completion. This addresses incomplete statistics and memory contention, ensuring the accuracy and continuity of SQL execution performance statistics.
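Relatedly, the SQLSTAT sampling interval can be tuned through the MODIFY_SNAPSHOT_SETTINGS subprogram noted under system package changes below; the named-parameter call form here is an assumption:

```sql
-- Set the SQLSTAT sampling interval to 30 minutes (assumed call form).
CALL dbms_workload_repository.MODIFY_SNAPSHOT_SETTINGS(sqlstat_interval => 30);
```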
ASH data integrity enhancement
In the new version, we introduce a compressed ASH data integrity mechanism to address ASH buffer overflow and the loss of diagnostic information caused by a large backlog of requests in the network queue. The mechanism aggregates queued requests by tenant ID and PCode, compressing ASH writes so that each request does not produce multiple records. It dynamically detects the write rate using a sliding-window average algorithm: if the average write volume over the past N seconds exceeds the buffer capacity threshold, compression is automatically enabled; if it falls below the threshold, compression is automatically disabled. The ASH table now includes a weight column and an aggregation flag column, ensuring that aggregated data is completely written to the WR table and preventing sampling distortion. This solves the loss of diagnostic information in high-concurrency scenarios, providing more complete session history data for troubleshooting and enhancing system observability.
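For example, assuming the column added to the ASH views (see the view changes below) is the same WEIGHT column added to CDB/DBA_WR_ACTIVE_SESSION_HISTORY, weighted aggregation would look like this sketch:

```sql
-- With compression, one ASH row may represent several samples,
-- so SUM(weight) is used instead of COUNT(*).
SELECT event, SUM(weight) AS weighted_samples
FROM oceanbase.GV$ACTIVE_SESSION_HISTORY
GROUP BY event
ORDER BY weighted_samples DESC
LIMIT 10;
```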
Refined wait events
Wait events are essential for performance diagnostics. In earlier versions, many internal wait events of OceanBase Database were grouped under a single default wait event, making it difficult to pinpoint specific issues from wait event names. In this release, default wait events are split out and largely replaced with dedicated wait events for more precise diagnostics.
Plan Cache diagnostics enhancements
The new version provides systematic diagnostics for plan cache misses, simplifying the diagnostic process. It records detailed miss reasons during plan matching, covering 14 cases such as non-existent cache nodes, type mismatches, and schema mismatches. This information is stored in the new create_reason column of the [G]V$OB_PLAN_CACHE_PLAN_STAT view. Additionally, new columns such as cache_node_id, pcv_id, and plan_set_id are added to facilitate quick root-cause analysis of plan misses and reduce operational complexity.
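A hedged query sketch using the new diagnostic columns (MySQL-mode view name; the SQL ID is a placeholder):

```sql
-- Inspect why plans for a given SQL were (re)created in the Plan Cache.
SELECT sql_id, create_reason, cache_node_id, pcv_id, plan_set_id
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT
WHERE sql_id = 'YOUR_SQL_ID'
LIMIT 20;
```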
Plan Cache memory management optimization
This optimization addresses a memory accounting problem in the Plan Cache module by changing the memory judgment standard from mem_hold (page occupancy) to mem_use (actual usage). Older versions used 8 KB pages for memory allocation, which can lead to memory fragmentation. When diagnostic objects such as SQL_STAT consume a large amount of fragmented memory, mem_hold can reach its upper limit even though actual memory usage is low, which can prevent physical plans from being added to the cache and trigger LRU eviction. The optimization reflects the memory size of cache objects more accurately and avoids cache failures caused by memory fragmentation.
SQL Audit/Plan Monitor query efficiency improvement
Query performance of the [G]V$OB_SQL_AUDIT and [G]V$SQL_PLAN_MONITOR system tables is enhanced by introducing late materialization. In earlier versions, because filter pushdown was not supported, all columns had to be projected before filter evaluation. The new version generates pushdown filters during the code generation phase; during execution, it first projects and pre-filters only the columns involved in filtering, and then projects all output columns. This optimization significantly improves system table query performance in scenarios such as trace_id equality queries.
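For instance, a point lookup by trace ID, the pattern that benefits most directly from the pushdown filter (the trace ID is a placeholder):

```sql
-- Late materialization pre-filters on trace_id before projecting the remaining columns.
SELECT svr_ip, sql_id, elapsed_time, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE trace_id = 'YOUR_TRACE_ID';
```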
LOB consistency check and repair
When LOB data is stored separately in main and auxiliary tables, there is a risk of inconsistency after failures. This version provides end-to-end consistency support: scheduled or manual main/auxiliary checks for three anomaly types (auxiliary row missing on the main side, main row missing on the auxiliary side, and length mismatch); the dbms_lob_manager package for repair, job control, and progress tracking; and Resource Manager integration to cap the resources of check jobs so that background scans do not hurt foreground workloads. The new views DBA_OB_LOB_CHECK_TASKS and DBA_OB_LOB_CHECK_EXCEPTION_RESULT show job progress and exception details, addressing query failures and disk growth caused by inconsistent LOB data.
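A hedged sketch of running and monitoring a check; the dbms_lob_manager subprogram signatures are not spelled out in these notes, so the call form is an assumption:

```sql
-- Assumed call form: start a LOB consistency check job.
CALL dbms_lob_manager.CHECK_LOB();

-- Track job progress and inspect any detected anomalies.
SELECT * FROM oceanbase.DBA_OB_LOB_CHECK_TASKS;
SELECT * FROM oceanbase.DBA_OB_LOB_CHECK_EXCEPTION_RESULT;
```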
Performance optimization for DROP/PURGE DATABASE operations
Currently, the DROP DATABASE operation is executed atomically, during which table locks and writes to internal tables are executed serially. In scenarios with a large number of tables, this can lead to timeouts and failures. The new version supports asynchronous parallel deletion of tables to improve performance. To enable this feature, set the _enable_atomic_drop_database parameter to False. Once enabled, the database is dropped non-atomically; if deletion fails, tables that were already dropped are not rolled back, so use this with caution. For PURGE DATABASE with the recycle bin enabled, parallel deletion of tables in __all_recyclebin is also supported. Performance improves substantially: in internal tests, about 7x for 6,000 complex tables (56 seconds) and about 14.5x for 20,000 complex tables (101 seconds).
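A minimal sketch of the opt-in; the database name is hypothetical, and the no-rollback caveat above applies:

```sql
-- Opt in to non-atomic, parallel DROP DATABASE (partially dropped tables are not rolled back).
ALTER SYSTEM SET _enable_atomic_drop_database = False;

DROP DATABASE big_legacy_db;  -- hypothetical database with many tables
```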
Heterogeneous zones
A tenant can now use up to two different UNIT NUM values across zones, improving O&M and scale-out/scale-in flexibility. Previously, every zone of a tenant had to use the same UNIT NUM; when hardware failed and could not be replaced, you had to lower UNIT NUM in all zones, widening the maintenance impact. For capacity changes, you can use this capability to operate on Follower replicas first and then switch the Leader to the zone where scaling has finished, for smoother expansion or shrinkage.
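A hedged sketch of enabling the mode through the zone_deploy_mode parameter described under parameter changes below; note that a heterogeneous configuration currently cannot be switched back to homogeneous:

```sql
-- Allow up to two different UNIT NUM values across this tenant's zones.
ALTER SYSTEM SET zone_deploy_mode = 'HETERO';
```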
Window compaction
Traditional major compactions use a globally consistent snapshot to guarantee consistency, support primary and index table checks and primary/standby verification, and provide strong consistency guarantees. However, the mandatory global merge strategy becomes costly and hard to bound in time as incremental data grows, partition counts reach the millions, or workloads are columnstore-heavy. With large business datasets, merges can also spill into peak traffic windows.
To address this, V4.6.0 introduces window compaction. The built-in scheduled job DAILY_MAINTENANCE_WINDOW in DBMS_SCHEDULER lets users initiate compactions within a scheduled time window (2:00 to 8:00 by default). The feature initiates compactions only for tables or partitions with incremental data that meet specific criteria, prioritizing tasks by factors such as incremental row count, table popularity, and days since the last compaction. Compactions automatically stop at the end of the window. You can enable or disable the feature with the enable_window_compaction parameter and control the number of compaction threads with the SET_THREAD_COUNT() procedure. Optionally, a dedicated resource plan can precisely manage the resources consumed by compactions, helping minimize the impact on peak business performance.
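A hedged configuration sketch combining the parameter with the dbms_daily_maintenance package listed under system package changes below; the thread count is arbitrary and the call form is an assumption:

```sql
-- Run major compactions only inside the daily maintenance window.
ALTER SYSTEM SET enable_window_compaction = True;

-- Limit window compaction to 4 threads (assumed call form).
CALL dbms_daily_maintenance.SET_THREAD_COUNT(4);
```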
Optimized deadlock detection
OceanBase Database exposes deadlock history in CDB/DBA_OB_DEADLOCK_EVENT_HISTORY, but earlier releases did not record SQL for holder nodes on the deadlock cycle, which made diagnosis harder. The new release enriches deadlock diagnostics: for example, it adds the visitor session ID; the machine, log stream, and tablet where the wait occurs; the SQL currently running; the SQL holding the lock; and the lock request time after a deadlock, making deadlock analysis much easier.
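For example, the enriched history can be reviewed directly from the view; selecting all columns avoids assuming the exact names of the new fields:

```sql
-- Review recent deadlock events, including the new holder/visitor SQL
-- and the machine, log stream, and tablet where the wait occurred.
SELECT * FROM oceanbase.DBA_OB_DEADLOCK_EVENT_HISTORY LIMIT 5;
```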
Compatibility changes
Changes in product behavior
| Feature | Change description |
|---|---|
| Limitations added to the OBKV-HBase time series model | Starting from V4.4.1, the time series model is supported based on the OBKV-HBase model. However, many HBase interfaces are not effective or meaningful in the time series model, so new versions define the scope and behavior limitations of the time series model. The time series model is designed for write-heavy, read-light scenarios such as monitoring, IoT, and vehicle networking, and must be enabled together with the distributed switch. The model does not support multiple column families, TimeToLive, or Cell TTL parameters, and some HBase interfaces are disabled. At the data operation level, only core interfaces such as Put, Get, and Scan are supported; operations such as Delete, Append, and Increment are not. Get and Scan operations have parameter restrictions, but necessary features such as TimeRange, setLimit, setCaching, and setMaxResultSize remain available to meet the query needs of time series data. These limitations ensure the efficiency and stability of the time series model and provide dedicated data access interfaces for time series scenarios. |
Parameter changes
| Parameter | Change type | Change description |
|---|---|---|
| default_delta_format | New | Specifies the default storage format for incremental data when creating a table. In AP load scenarios, the default is encoding, and in other load scenarios, the default is flat. |
| default_skip_index_level | New | Specifies the default Skip Index creation level when creating a table. In AP load scenarios, the default is 1, and in other scenarios, the default is 0. |
| enable_ttl | New | Controls whether to enable the TTL (Time To Live) feature. When enabled, background threads periodically delete expired rows defined by TTL. The default is False. |
| ttl_duty_time | New | Controls the scheduled time for TTL (Time To Live) background tasks. The default is "01:00". |
| default_table_merge_engine | Value change | Adds the append_only option. The merge_engine attribute in create table statements also synchronizes this option. |
| ob_vector_index_active_segment_max_size | New | Controls the maximum memory usage of the vector index Active Segment for a single partition. If exceeded, a freeze persistence operation is forcibly triggered. The default is 0, indicating system-adaptive behavior. |
| ob_vector_index_merge_trigger_percentage | New | Controls the maximum proportion of memory that the incremental segment of the vector index can occupy relative to the corresponding partition's vector index memory. If exceeded, the system will automatically trigger a merge operation with the baseline segment. The default is 20. |
| enable_window_compaction | New | Controls whether to use the window compaction feature for tenants. If True, window compaction is executed daily; if False, tenant compaction is executed daily. The default is False. |
| sql_func_extension_mode | New | Controls whether to enable extension functions, compatible with mainstream databases such as ClickHouse, whose names differ from MySQL function names. The default is '', meaning off. |
| zone_deploy_mode | New | Controls whether the current tenant uses homogeneous or heterogeneous zone mode. The default value is HOMO, indicating homogeneous zones. Set it to HETERO to enable heterogeneous zone capabilities, which support up to two different UNIT NUM values. A heterogeneous zone configuration currently cannot be changed back to homogeneous. |
| enable_gts_standalone | New | Controls whether to enable the GTS service to use a dedicated group of units. Enabling this can improve the performance and stability of the GTS service. |
| enable_mlog_auto_maintenance | New | Tenant-level parameter that controls automatic creation and management of MLOG. The default value is True. |
| mlog_trim_interval | New | Tenant-level parameter that controls the scheduling interval of the MLOG background trim task. The default value is 1d. |
| enable_mv_binlog_minimal_mode | New | Tenant-level parameter that controls whether Minimal mode is enabled for materialized view DML. When enabled, CLOG payload is simplified to speed up materialized view refresh; downstream consumers that rely on full materialized view CLOG may break. Evaluate before enabling. The default is False. |
| px_target_workers_per_cpu | New | Controls the ratio of expected parallel execution threads to the tenant CPU count. The default is 8. In older versions, this was governed by the parallel_servers_target system variable, which specified an absolute number of PX threads and had to be re-adjusted after scaling, which was inflexible. The new version therefore switches to system-adaptive dynamic adjustment. |
| px_target_low_watermark | New | Sets the low watermark percentage threshold for parallel threads. When exceeded, it starts to downgrade the parallelism of newly scheduled DFOs to avoid CPU resource exhaustion. The default is 60. |
| px_target_high_watermark | New | Sets the high watermark percentage threshold for parallel threads. The closer the real-time load is to this watermark, the greater the parallelism downgrade. If exceeded, parallelism is reduced to 1. The default is 60. |
| kv_ttl_delete_max_ops | New | Controls the maximum number of operations per second for TTL tasks in a tenant. The default is 0, indicating no throttling for TTL task deletions. |
| system_protected_tenant_parameters | New | Cluster-level parameter that lists tenant-level parameters locked from modification by ordinary users. The default is ''. |
| document_ai_file_max_size | New | Limits the maximum file size that can be processed in one operation for Document AI-related expressions (LOAD_FILE and AI_PARSE_DOCUMENT). |
System variable changes
| System variable | Change type | Description |
|---|---|---|
| ap_query_route_policy | New | Controls replica adaptive routing: OFF disables it; AUTO routes analytical queries to columnstore replicas by query traits and cost; FORCE prefers columnstore replicas. Default is AUTO. Can also be set per query with the opt_param hint. |
| ap_query_cost_threshold | New | Cost threshold for adaptive columnstore replica selection. Default is 200000. Parallel SELECT plans follow rules to use a columnstore replica; for single-threaded rowstore SELECT, if cost exceeds this value, planning uses a columnstore replica instead. |
| ap_query_replica_fallback | New | When TP/AP automatic routing is on, controls whether to fall back to a non-columnstore replica if no columnstore replica is available. Default is True (fall back). |
| caching_sha2_password_digest_rounds | New | This variable specifies the number of SHA256 iterations performed before storing the password hash to disk when using the caching_sha2_password authentication plugin. The default value is 5000. |
View changes
| View | Change type | Description |
|---|---|---|
| CDB/DBA_OB_BACKUP_VALIDATE_JOBS | New | Displays information about backup validation jobs initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_JOB_HISTORY | New | Displays historical information about backup validation jobs initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_TASKS | New | Displays information about backup validation tasks initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_TASK_HISTORY | New | Displays historical information about backup validation tasks initiated by users. |
| CDB/DBA_TABLES | Added column | |
| [G]V$OB_EXTERNAL_CATALOG_CLIENT_POOL_STAT | New | Obtains the client pooling information of a catalog. |
| CDB/DBA_OB_TTL_TASKS | New | Displays the execution status of TTL tasks. |
| CDB/DBA_OB_TTL_TASK_HISTORY | New | Displays the execution history of TTL tasks. |
| [G]V$OB_HNSW_INDEX_SEGMENT_INFO | New | Provides observability at the Segment level for vector indexes. |
| DBA_OB_VECTOR_INDEX_TASKS | Added column | |
| DBA_OB_VECTOR_INDEX_TASK_HISTORY | Added column | |
| CDB/DBA_OB_LOB_CHECK_TASKS | New | Records the progress of LOB consistency checks and recovery tasks. |
| CDB/DBA_OB_LOB_CHECK_EXCEPTION_RESULT | New | Records the exception tables and tablets and the exception type of LOB consistency checks. |
| CDB/DBA_OB_MAJOR_COMPACTION | Added column | Added the MODE column to record the type of daily major compactions. |
| CDB/DBA_OB_DEADLOCK_EVENT_HISTORY | Value changed | Enriched deadlock diagnostics, such as the visitor session ID, the machine, log stream, and tablet where the wait occurs, the SQL currently running, the SQL holding the lock, and the lock request time. |
| CDB/DBA_OB_LS | Added column | Added the UNIT_LIST column to describe the UNIT distribution list of a log stream. |
| CDB/DBA_OB_BALANCE_JOBS | Added column | Added the ZONE_UNIT_NUM_LIST and PARAMETER_LIST columns to record each zone's UNIT NUM and balance-related parameters for the tenant. |
| CDB/DBA_OB_BALANCE_JOB_HISTORY | Added column | Added the ZONE_UNIT_NUM_LIST and PARAMETER_LIST columns to record each zone's UNIT NUM and balance-related parameters for the tenant. |
| CDB/DBA_WR_ACTIVE_SESSION_HISTORY | Added column | Added the WEIGHT column to record the weight of an ASH record. |
| [G]V$ACTIVE_SESSION_HISTORY | Added column | |
| [G]V$OB_ACTIVE_SESSION_HISTORY | Added column | |
| CDB/DBA_WR_SQLSTAT | Added column | Added the SAMPLE_TIME column to record the time when the SQLSTAT snapshot is written to disk. |
| [G]V$OB_SQLSTAT | Added column | Added the LATEST_ACTIVE_TIME column to record the last active time of a plan corresponding to a sql_id + plan_hash pair on a node. |
| [G]V$OB_SQL_AUDIT | Added column and semantics changed | |
| CDB/DBA_OB_SENSITIVE_RULES | New | Displays the definitions and attributes of sensitive rules. |
| CDB/DBA_OB_SENSITIVE_COLUMNS | New | Displays the column information protected by sensitive rules. |
| CDB/DBA_OB_SENSITIVE_RULE_PLAINACCESS_USERS | New | Displays the users or roles with the PLAINACCESS privilege on sensitive rules. |
| ALL/DBA/USER_OBJECTS | Object Added | Added the display of the SENSITIVE RULE object. |
| ALL/DBA/USER/ROLE_TAB_PRIVS | Object Added | Added the display of the PLAINACCESS privilege at the object level. |
| DBA/USER/ROLE_SYS_PRIVS | Object Added | Added the display of the PLAINACCESS and CREATE SENSITIVE RULE privileges at the user level. |
| [G]V$OB_SINDI_INDEX_INFO | New | Displays the information of the SINDI vector index. |
| [G]V$OB_HNSW_INDEX_INFO | Column name change | The SNAP_INDEX_CNT and INCR_INDEX_CNT columns are updated to SNAP_VECTOR_CNT and INCR_VECTOR_CNT. |
| [G]V$OB_PLAN_CACHE_PLAN_STAT | New column | The create_reason, cache_node_id, pcv_id, and plan_set_id columns are added to display the reason why the current plan is added to the Plan Cache. |
| CDB/DBA_OB_USERS | New column | The plugin column is added to record the name of the plugin used for password hash calculation. |
Syntax changes
| Syntax | Description |
|---|---|
| New ALTER SYSTEM VALIDATE BACKUP/BACKUPSET/ARCHIVELOG_PIECE syntax | Used to initiate backup validation. |
| New ALTER SYSTEM CANCEL VALIDATE BACKUP syntax | Used to cancel a backup validation task. |
| New delta_format table attribute in CREATE TABLE/ALTER TABLE syntax | Used to control the format of incremental data. In AP workloads, the default value is encoding, and in other workloads, the default value is flat. |
| New skip_index_level table attribute in CREATE TABLE/ALTER TABLE syntax | Used to control the data range for generating Skip Index aggregate statistics. The default value is 0. |
| New REST catalog-related attributes in CREATE EXTERNAL CATALOG syntax | Added the TYPE = 'REST' catalog type to access Iceberg tables through standardized HTTP REST APIs. |
| New SHOW DATABASES FROM catalog and SHOW TABLES FROM catalog.database syntax | Allows users to view the databases in a specified catalog and the tables in a specified catalog.database. |
| New TTL (Time To Live) syntax in CREATE TABLE | When creating a table, you can specify TTL ora_rowscn + INTERVAL 10 DAY BY COMPACTION to set the table to automatically delete data older than 10 days. |
| New Iceberg table time travel syntax | In Iceberg table queries, you can specify {VERSION \| TIMESTAMP} AS OF '<version_identifier>' to query data from a specific snapshot in the past. |
| New UNIT_LIST clause in ALTER SYSTEM CREATE/MODIFY LS syntax | Added a new log stream maintenance syntax for specifying the unit list for log stream distribution in heterogeneous zone mode. |
| New cascading refresh-related parameters in DBMS_MVIEW.REFRESH command | DBMS_MVIEW.REFRESH('mv_name', 'refresh_method', nested => true, nested_refresh_mode => 'consistent/inconsistent') |
| New individual/inconsistent/consistent refresh strategy attributes in CREATE/ALTER MATERIALIZED VIEW | CREATE MATERIALIZED VIEW mv1 REFRESH [individual/inconsistent/consistent] AS xxx specifies the refresh strategy when creating or modifying a nested materialized view. |
| New AS OF PROCTIME() attribute for materialized view base tables | For example, CREATE MATERIALIZED VIEW mv REFRESH FAST ON DEMAND AS SELECT xxx FROM t1 LEFT JOIN t2 AS OF PROCTIME() ON t1.c1 = t2.c1 LEFT JOIN t3 ON t1.c2 = t3.c2 WHERE xxx; allows you to specify AS OF PROCTIME() for the base table when creating a materialized view to skip refreshing this table during incremental refresh. Tables with AS OF PROCTIME() do not need an MLOG. |
| New SENSITIVE RULE-related syntax | |
| New vector_index hint in vector queries | /*+ vector_index (table_name index_name [ filter_type]) */ specifies the vector index and filter type to use for vector index queries. |
| New type=sindi_sq index type in vector indexes | vector index idx1(c2) with (distance=inner_product, type=sindi_sq, lib=vsag) |
| New ALTER SYSTEM FLUSH SQL AUDIT/PS CACHE command in Oracle mode | |
| Support for subqueries in WITH CHECK OPTION | Compatible with the behavior of WITH CHECK OPTION with subqueries in Oracle and MySQL. |
| CREATE USER statement with caching_sha2_password authentication | |
| Syntax for creating a search index | |
| Added FTS_INDEX_TYPE option for full-text indexes | The FTS_INDEX_TYPE [=] FILTER/MATCH/PHRASE_MATCH option is added to FULLTEXT INDEX in CREATE TABLE, ALTER TABLE ADD FULLTEXT INDEX, and CREATE FULLTEXT INDEX. |
| ALTER SYSTEM MAJOR FREEZE TENANT [=] tenant_name TABLE_ID = table_id syntax | Added syntax for specifying table-level major compactions. |
System package changes
| System package | Change type | Description |
|---|---|---|
| dbms_lob_manager | New | Used to execute LOB consistency detection tasks. It includes multiple subprograms such as CHECK_LOB, CANCEL_JOB, SUSPEND_JOB, RESUME_JOB, and RESCHEDULE_JOB. |
| dbms_daily_maintenance | New | Used to manage window compaction tasks. It includes subprograms such as TRIGGER_WINDOW_COMPACTION_PROC and SET_THREAD_COUNT. |
| dbms_scheduler | New internal task | The DAILY_MAINTENANCE_WINDOW internal task is added. You can use the DBMS_SCHEDULER.DISABLE, ENABLE, and SET_ATTRIBUTE subprograms to manage this task. |
| dbms_resource_manager | New subprogram | The COPY_PLAN subprogram is added. It is used to copy an existing resource management plan and its related resource plan directives. |
| sdo_util | New | The GETVERTICES subfunction is supported. It is used to extract the vertex coordinates of a geometry object (Geometry). |
| sdo_geom | New subfunction | The SDO_AREA subfunction is added. It is used to calculate the area of a two-dimensional geometry object. |
| dbms_workload_repository | New parameter | The SQLSTAT_INTERVAL parameter is added to the MODIFY_SNAPSHOT_SETTINGS subprogram. It represents the SQLSTAT sampling interval in minutes. |
| dbms_vector | New subfunction | The QUERY_RECALL subfunction is added. It is used to calculate the recall rate of an SQL statement. The INDEX_RECALL subfunction is added. It is used to calculate the recall rate of a vector index. |
Function changes
| Function | Change type | Description |
|---|---|---|
| SDO_CONTAINS | New | Determines whether one geometry (geometry A) strictly contains another geometry (geometry B). |
| SDO_ANYINTERACT | New | Determines whether any form of intersection or contact exists between two geometries. |
| MAX_PT | New | Retrieves the maximum partition value that contains data in the current table. Recommended for use in list partitioning scenarios. |
| DATE_TRUNC | New | Used for truncating dates. |
| MONTHS_ADD | New | Used for date addition calculations. |
| TO_DATE | New | Used for converting strings to date types in a specific format. |
| CAST AS BIGINT | New | Used for converting data to the bigint type. |
| ARBITRARY | New | Used for non-deterministically returning any non-NULL value from a set of rows. |
| ISNAN | New | A ClickHouse-compatible extension function for determining whether a numeric value is "not a number" (NaN). |
| ADDDATE | New | A ClickHouse-compatible extension function for date addition calculations. |
| FORMATDATETIME | New | A ClickHouse-compatible extension function for formatting dates or date times. |
| TOUNIXTIMESTAMP | New | A ClickHouse-compatible extension function for converting a date time expression to a Unix timestamp. |
| EDITDISTANCE | New | A ClickHouse-compatible extension function for calculating the edit distance between two strings. |
| STDDEVSAMP | New | A ClickHouse-compatible extension function for returning the sample standard deviation of an expression. |
| UNIQ | New | A ClickHouse-compatible extension function for approximately calculating the number of distinct values in a column. |
| VARSAMP | New | A ClickHouse-compatible extension function for returning the sample variance of an expression. |
| ANY | New | A ClickHouse-compatible extension function for selecting any value from a set of values. |
| GROUPCONCAT | New | A ClickHouse-compatible extension function for concatenating the values of multiple rows in a group into a single string. |
| LAGINFRAME | New | A ClickHouse-compatible extension function for returning the value of expr from offset (default is 1) rows before the current row within the current window frame. |
| LEADINFRAME | New | A ClickHouse-compatible extension function for returning the value of expr from offset (default is 1) rows after the current row within the current window frame. |
| MD5_CONCAT_WS | New | Concatenates multiple strings into one string and returns the MD5 hash of the concatenated string as the result. |
| LOAD_FILE | New | Reads external storage files into the database via a temporary external table interface, supporting local files, HDFS, OSS, and S3 files, and returning corresponding BLOB data. |
| AI_PARSE_DOCUMENT | New | Invokes a specified visual understanding/OCR model to identify and parse document data (e.g., PDF) into a specified text format. |
| AI_SPLIT_DOCUMENT | New | Splits the content of a document in a specified format and outputs a fixed-format table to record the split content. |
Upgrade notes
Non-shared storage mode
- You can smoothly upgrade from V4.4.1 and earlier versions of the V4.4.x series to V4.6.0.
- You can smoothly upgrade from V4.3.5 BP2 and earlier versions of the V4.3.x series to V4.6.0, but only in POC and test environments. This upgrade may carry some risks, and if the upgrade fails, the environment may need to be rebuilt. We recommend that you upgrade to V4.4.1 first and then to V4.6.0.
- You can smoothly upgrade from V4.5.0 to V4.6.0.
- You cannot upgrade from V4.2.x or earlier versions to V4.6.0.
- You cannot upgrade from V4.4.2 to V4.6.0.
Shared storage mode
- You cannot upgrade from a lower version of the shared storage mode to V4.6.0.
Recommended versions of components and tools
| Component | Version |
|---|---|
| ODP | ODP 4.3.6.2. Note: If you use the … |
| OCP | OCP 4.4.2 |
| ODC | ODC 4.4.2 |
| OBCDC | OBCDC 4.6.0 |
| OMS | OMS 4.3.2 BP1. Note: If you do not need incremental data from OceanBase Database V4.6.0, use OMS V4.3.2 BP1; if you do need that incremental data, apply a hot patch to Store. |
| Binlog | Binlog 4.3.6.3 |
| OCCI | OCCI 1.0.6 |
| OBCI | OBCI 2.1.1.2 |
| ECOB | ECOB 1.2.1 |
| OBClient | OBClient 2.2.13 |
| LibOBClient | LibOBClient 2.2.13 |
| JDBC | JDBC 2.4.17 |
| ODBC | ODBC 2.0.9.8 |
| .NET | .NET 1.0.0 |
| Import and export | ob-loader-dumper 4.3.5 |
