V4.6.0
Version information
- Release date: April 10, 2026
- Version: V4.6.0
- RPM version: oceanbase-4.6.0.0-100000132026040922
Version overview
OceanBase Database V4.6.0 is a major version upgrade designed for modern data processing.
Comprehensive upgrade of AI hybrid search capabilities
This release introduces a native SQL hybrid search interface that supports multi-modal integrated queries across vector, full-text, and scalar data. It adds match phrase search and search index, and refactors the hybrid search execution framework. Additionally, this release significantly optimizes both the querying and building performance of full-text indexes, delivering a high-performance, high-precision hybrid search experience for scenarios such as AI knowledge bases.
Enhanced vector database features
This release optimizes the performance of HNSW indexes in incremental scenarios, extends IVF indexes to support tables without primary keys, and improves large-scale clustering results using HGraph. It also introduces recall rate evaluation for vector indexes and sparse vector optimization. Combined with Document AI processing capabilities, this release builds an end-to-end "RAG all-in-one" knowledge base solution.
Improved HTAP mixed workload capabilities
This release implements intelligent routing and strong consistency reads for columnstore replicas, and introduces automatic partition splitting for columnstore tables. It optimizes the PX parallel plan queuing and adaptive task partitioning mechanisms, and supports incremental data encoding as well as skip index, providing more balanced and efficient analytical processing capabilities for mixed workloads.
PL engine performance optimization
This version improves system package execution efficiency and supports reusing the Plan Cache for dynamic SQL inside PL. It enhances FORALL batch execution, UDF index lookups, asynchronous transaction commit, and streaming cursors, significantly improving the performance of stored procedures and complex business logic and providing stronger support for high-concurrency OLTP workloads.
Improved compatibility and functionality integrity
This version supports MySQL Session temporary tables and optimizes the performance of global temporary tables in Oracle. It enhances SQL capabilities with features such as recursive CTE, partition exchange, and materialized views. Additionally, it supports Iceberg V3 format and V2 write operations, provides the REST catalog protocol, and supports complex data types. This offers a more comprehensive solution for enterprise data integration and analysis.
Optimized system performance
Performance on large-capacity domestic machines is enhanced. The performance of PDML and replicated table queries is optimized, achieving a 14x performance improvement in high-concurrency scenarios. A lock-free memory allocator is introduced, delivering a 168% performance improvement in single-threaded memory allocation and significantly enhancing overall system performance.
Enhanced OBKV data model capabilities
This release adds weak read routing to the nearest IDC, TTL task index scanning, and rate-limiting optimizations, and improves HBase monitoring metrics. It also supports low-level put operations for the time-series model and optimizes hot key queries, providing more efficient data access and O&M monitoring capabilities for time-series and wide-table scenarios.
Enhanced OBCDC features
This release supports incremental data synchronization after table-level recovery to ensure the stable operation of OBCDC. Additionally, it supports the real-time calculation of virtual generated columns to ensure downstream MySQL Binlog compatibility, improving the reliability and consistency of data synchronization.
Comprehensive improvements in security and reliability
This release introduces non-intrusive column encryption and the caching_sha2_password authentication mechanism, and optimizes the standby database read feature to reduce message interactions. It also enhances the integrity verification of backup data and improves resource limits for large queries to ensure service quality when large and small queries coexist, providing more comprehensive data protection and stability guarantees for core enterprise businesses.
Enhanced O&M and management capabilities
This release refactors the SQLSTAT and ASH architectures to ensure complete and accurate monitoring data. It refines wait events, optimizes Plan Cache memory management and diagnostics, and improves LOB consistency verification capabilities as well as DROP/PURGE DATABASE performance. Furthermore, this release adds support for heterogeneous zones and introduces the window compaction feature to optimize compaction performance, providing smarter and more efficient O&M guarantees for large-scale enterprise production environments.
Key features
Enhanced hybrid search capabilities
Hybrid search SQL syntax
Starting from V4.4.1, you can use the `DBMS_HYBRID_SEARCH` system package to perform hybrid search on vector, full-text, and scalar indexes. V4.6.0 introduces a more user-friendly SQL hybrid search interface, supporting native SQL syntax for multimodal fusion search. The new `SELECT ... FROM HYBRID_SEARCH(TABLE table_name, DSL_STRING)` syntax allows you to access the DAS hybrid search framework through pure SQL, providing better query performance. In heap tables, you can perform hybrid combinations of vector search (KNN), full-text search, scalar filtering (term/range/terms), and JSON/array expression queries. It supports various fusion algorithms, including weighted fusion, RRF (Reciprocal Rank Fusion), and WRRF (Weighted RRF). It also supports multi-vector queries, complex Boolean combinations (must/should/filter/must_not), and `min_score` score filtering. Additionally, it supports score normalization for multi-query results.
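The sketch below illustrates the shape of such a query. It assumes a heap table `docs` with a vector column `embedding` and a full-text index on `content`; the DSL field names and structure are illustrative, following the ES-style semantics described above rather than a documented schema.

```sql
-- Hypothetical sketch: fuse a KNN vector query with a full-text match and a
-- scalar range filter via RRF. Table, columns, and DSL keys are illustrative.
SELECT *
FROM HYBRID_SEARCH(
  TABLE docs,
  '{
     "query": {
       "bool": {
         "must":   [ {"match": {"content": "vector database"}} ],
         "filter": [ {"range": {"publish_year": {"gte": 2024}}} ]
       },
       "knn": {"field": "embedding", "query_vector": [0.12, 0.58, 0.77], "k": 10}
     },
     "rank": {"rrf": {}},
     "min_score": 0.2
   }'
);
```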
Match Phrase feature in full-text indexes
To meet short-phrase search requirements in knowledge base scenarios, the new version supports the Match Phrase feature. This feature maintains the position arrays of tokens within documents in the inverted index auxiliary table, enabling precise phrase matching and fuzzy matching with a tolerance distance (slop) in the full-text index. It supports `match_phrase` syntax similar to Elasticsearch, including the `slop` parameter for controlling the tolerated distance between tokens, and it remains compatible with the old `MATCH AGAINST PHRASE MODE` syntax while improving its performance. The new `FTS_INDEX_TYPE=PHRASE_MATCH` index type option is available; position information storage is enabled by default but can be disabled to save storage space. During queries, a dedicated Phrase Query iterator and `slop` calculation algorithm perform pre-filtering optimization. This suits document search scenarios in RAG workflows where phrases carry higher weights, enhancing the accuracy and efficiency of hybrid search.
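As a hedged illustration, a phrase query in the hybrid search DSL might look like the following; the field name and option spellings are illustrative.

```sql
-- Hypothetical sketch: phrase matching with a tolerance of one intervening
-- token (slop = 1); the DSL field names are illustrative.
SELECT *
FROM HYBRID_SEARCH(
  TABLE docs,
  '{"query": {"match_phrase": {"content": {"query": "distributed database", "slop": 1}}}}'
);
```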
Search Index feature
To meet the efficient search requirements for complex types like JSON/ARRAY/MAP in hybrid search scenarios, the new version supports the Search Index feature. Compared to traditional function indexes and multi-value indexes, Search Index supports equality, inclusion, and range queries on all paths of JSON fields through a single index, addressing performance issues in ad hoc queries and schema-less scenarios. Search Index is created using the `CREATE SEARCH INDEX` syntax and allows path filtering with `include_paths`/`exclude_paths` and type filtering with `include_types`/`exclude_types`. The new `PICK JSON_TYPE` syntax (e.g., `pick json_number`/`pick json_string`) enables precise filtering of JSON scalar types, optimizing query accuracy. The index uses an inverted table structure and supports Index Merge for combined queries. Initially, it is only supported for heap tables, providing high-performance indexing capabilities for flexible JSON queries in knowledge base platforms and enhancing query efficiency in hybrid search scenarios.
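A minimal sketch of index creation follows; the option names come from the notes above, while the clause layout and path values are assumptions.

```sql
-- Hypothetical sketch: a search index over a JSON column in a heap table,
-- restricted to selected paths and scalar types. Clause layout is illustrative.
CREATE SEARCH INDEX sidx ON docs (meta)
  WITH (
    include_paths = '$.tags, $.price',
    include_types = 'json_string, json_number'
  );
```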
Optimized hybrid search execution framework
The new version provides a unified dynamic pruning execution framework for advanced hybrid search scenarios like RAG knowledge bases. This framework supports unified docid pruning and block max score pruning across full-text, scalar, and multi-value indexes. By restructuring into composable search operators, it offers excellent extensibility and supports multi-layer Boolean query semantics described with ES DSL. Adaptive execution strategies are introduced, including cost-driven primary iterator selection, data prefetch width optimization, sort/lookup strategy selection, and switching between sparse pruning and vectorized execution strategies, enhancing hybrid search performance. It also provides hybrid search diagnostics based on query profiles, supporting partition-level profile display of operator execution trees and key statistics to help identify performance bottlenecks.
Optimized full-text index query performance
The new version has been deeply optimized for full-text search TopK and incremental data scenarios. By implementing a lightweight `advance_scan` mechanism at the storage layer, optimizing BM25 expression calculation parameters, and improving incremental data query algorithms, the efficiency of full-text index search has been significantly enhanced. In practical tests, this optimization can improve TopK query performance by 80% in scenarios with pure incremental data and achieve significant performance improvements in mixed data scenarios.
Optimized full-text index construction performance
The new version optimizes the full-text index construction process to address the issue of excessively long construction times for large-scale data. Previously, the main bottlenecks in full-text index construction in OceanBase Database were sorting, writing, and tokenization. The new version improves performance through several core optimizations, streamlining the construction process to ensure that inverted tables and forward tables are tokenized only once. By scanning the main table, data is ordered by docid, eliminating the need for global sorting. Writing efficiency is improved by using memory-based CONST or RLE encoding formats to handle duplicate rows, reducing the impact of storage row expansion. Tokenization performance is optimized by using a swiss hash map to reduce cache misses, with the tokenization hash map created only once and the tokenizer implementation optimized.
For example, with 32 parallel threads:
- msmarco dataset, space tokenizer: index construction time decreased from 313.37 seconds to 46.15 seconds.
- mldr_en dataset, space tokenizer: decreased from 277.14 seconds to 28.22 seconds.
- mldr_zh dataset, IK smart tokenizer: decreased from 5152.72 seconds to 62.75 seconds.
- mldr_zh dataset, IK max_word tokenizer: decreased from 5463.48 seconds to 70.63 seconds.
Comprehensive enhancement of vector capabilities
HNSW vector index incremental optimization
To address the pain points of memory bloat, degraded query performance, and slow restart loading caused by excessive incremental data in HNSW vector indexes, this release introduces a segmented architecture and an automatic memory management mechanism:
- It divides incremental data into four segment layers (Active, Frozen, Incr, and Base), supporting freezing, persistence, and background compaction.
- It controls the upper memory limit of the Active Segment and the compaction trigger threshold through the `ob_vector_index_active_segment_max_size` and `ob_vector_index_merge_trigger_percentage` parameters, balancing performance and resource consumption.
- It provides the `dbms_vector.flush_index` and `dbms_vector.compact_index` interfaces to manually trigger the flush and compaction of vector indexes (see the sketch after this list).
- It adds the `GV$OB_HNSW_INDEX_SEGMENT_INFO` view to enhance observability.

After optimization, the memory footprint of vector indexes in frequent write scenarios is controllable, query performance is stable, and fault recovery is faster. The original user interfaces remain compatible, so users gain these performance improvements without modifying their business logic.
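A hedged sketch of the manual maintenance path follows; the two subprogram names and the view come from the notes above, but the argument lists are assumptions.

```sql
-- Hypothetical sketch: manually flush and compact an HNSW vector index,
-- then inspect segment state. The argument forms are illustrative.
CALL dbms_vector.flush_index('my_table', 'my_hnsw_idx');
CALL dbms_vector.compact_index('my_table', 'my_hnsw_idx');

-- Observability: per-segment state of HNSW indexes.
SELECT * FROM oceanbase.GV$OB_HNSW_INDEX_SEGMENT_INFO;
```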
IVF index support for tables without primary keys and heap tables
Traditional IVF indexes require a table to have a primary key, which limits their application on data without primary keys. This release extends the capabilities of the IVF index, allowing it to be applied to tables without primary keys and heap tables. This enables enterprises to build efficient vector indexes on data without primary keys, such as historical logs and user behavior, providing more possibilities for data analysis and search. This release adapts the index table structures of `IVF_flat` and `IVF_pq` indexes for tables without primary keys, ensuring they can be used together with other index types, such as HNSW and full-text indexes, in hybrid search scenarios. It supports multi-partition scenarios as well as correct DDL, DML, and query operations, with query performance also improved in certain scenarios.
IVF index support for HGraph
To address large-scale vector index scenarios where the number of IVF cluster centers (`nlist`) exceeds 5,000, traditional IVF using heap sort search incurs excessive computational costs. This release supports building a lightweight graph structure (HGraph) on the IVF cluster center table, significantly accelerating cluster center search by replacing brute-force search with graph traversal algorithms. During the building process, each iteration accelerates the graph construction for cluster centers and reduces the memory consumption of the `center_distance` matrix, while supporting the rapid location of candidate clusters during the query phase. It maintains a high recall rate with controllable memory overhead, making it particularly suitable for large-scale vector search scenarios where `nlist` ranges from 5,000 to 65,000. It is recommended for cosine distance calculations; it is not recommended for large `nlist` scenarios under L2 distance due to the massive memory consumption of pre-computed tables.
Vector index recall rate evaluation
This release provides user-queryable vector index quality evaluation capabilities to solve the challenges of vector index performance tuning. Through the `query_recall` and `index_recall` interfaces in the `DBMS_VECTOR` package, users can proactively calculate the approximate search recall rate for specific queries or specified indexes, evaluating the matching degree between approximate search results and exact brute-force search results. The background automatically samples user queries, periodically calculates the average recall rate in batches, and persists it to the `DBA_VECTOR_INDEX_RECALL_REPORT` system view. This feature supports configuring sampling rates, time windows, and resource limits, providing quantitative evaluation metrics for vector indexes to facilitate index tuning.
In-memory sparse vector index optimization
This release includes several key optimizations for in-memory sparse vectors:
- It optimizes sparse vector parsing by replacing regular expressions with character stream parsing, improving performance by approximately 8%.
- It migrates the sparse vector rearrangement mechanism from the vsag library to the observer side to reduce memory usage.
- It optimizes the `pre-filter` to skip unnecessary calculations in high-filtering scenarios.
- It supports `SINDI_SQ` quantized indexes to reduce storage costs.
- It adds the `[G]V$OB_SINDI_INDEX_INFO` view for sparse vector index information.
- It adds the `index_vector_memory_advisor` subprogram to the `DBMS_VECTOR` system package to estimate the memory usage of vector indexes for tables yet to be created, and the `index_vector_memory_estimate` subprogram to analyze the memory usage of vector indexes for existing tables.
These optimizations improve the query performance and memory efficiency of sparse vector processing, and enhance the ease of use for vector index maintenance and diagnostics.
Support for multiple semantic indexes on a single column
Currently, to test the performance of different types of semantic indexes, users can only create them on different columns or tables for comparison. To make it easier for users to conduct performance testing and comparisons of semantic indexes under different models or parameters, this release supports creating multiple semantic indexes on a single column. It introduces the `vector_index` hint syntax, allowing users to specify the semantic index and filtering method (`pre_filter` or `post_filter`) used by the query via hints. When multiple semantic indexes are created on the same column, queries must explicitly specify the semantic index to be queried using the `vector_index` hint; otherwise, an error is reported. This feature can only be used in testing environments and must be enabled via the `_enable_multiple_semantic_indexes_on_column` parameter. Do not enable it in production environments.
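A hedged sketch of the hint usage follows; the hint and parameter names come from the notes above, while the hint argument form, the distance function, and the approximate-search clause are assumptions.

```sql
-- Hypothetical sketch (test environments only): pick one of several semantic
-- indexes on the same column via the vector_index hint. Names are illustrative.
ALTER SYSTEM SET _enable_multiple_semantic_indexes_on_column = true;

SELECT /*+ vector_index(t1 idx_hnsw_m16 pre_filter) */ id
FROM t1
ORDER BY l2_distance(embedding, '[0.1, 0.2, 0.3]')
APPROXIMATE LIMIT 10;
```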
Document AI capability enhancement
V4.6.0 introduces Document AI features, enhancing document processing capabilities. It includes three core functions:
- The `LOAD_FILE` expression reads files from external storage, supporting local files, HDFS, OSS, S3, and more.
- `AI_PARSE_DOCUMENT` parses the content of complex document types.
- The `AI_SPLIT_DOCUMENT` table function splits document content and outputs segment information.

Combined with `AI_EMBED`, `AI_RERANK`, `AI_COMPLETE`, and hybrid search capabilities, this enables a "RAG all-in-one" knowledge base solution.
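A hedged sketch of chaining these functions follows; the function names come from the notes above, but every argument list, the model name, and the file URI are assumptions.

```sql
-- Hypothetical sketch: read a document from object storage, parse and split
-- it, then embed each chunk. Argument forms are illustrative.
SELECT AI_EMBED('my_embedding_model', seg.content) AS chunk_vector
FROM TABLE(
       AI_SPLIT_DOCUMENT(
         AI_PARSE_DOCUMENT(LOAD_FILE('oss://my-bucket/docs/manual.pdf'))
       )
     ) seg;
```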
Optimizations for mixed HTAP workloads
Unified entry for columnstore replicas and full-featured (F) replicas
To support mixed TP and AP workloads in HTAP scenarios where business cannot be separated, OceanBase Database V4.6.0 introduces automatic routing for columnstore replicas. This feature allows columnstore replicas and F replicas to use the same request entry. The optimizer automatically routes complex analytical queries to columnstore replicas based on features such as weak reads, parallel execution, or when the cost of row-based queries exceeds a threshold. This enables automatic traffic splitting between TP and AP workloads, improving query efficiency and resource utilization. The feature provides three control modes:
- `OFF`: disabled.
- `AUTO`: intelligent routing.
- `FORCE`: force columnar execution.

New system variables `ap_query_route_policy`, `ap_query_cost_threshold`, and `ap_query_replica_fallback`, together with hints, provide flexible query-level control. These settings are compatible with the existing `ob_route_policy` strategy; when `ob_route_policy` is set to `COLUMN_STORE_ONLY`, that setting takes precedence.
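A hedged sketch using the documented variables follows; the variable names and the three modes come from the notes above, while the threshold value shown is an assumption.

```sql
-- Hypothetical sketch: enable intelligent AP routing for the session and
-- set an illustrative cost threshold above which queries go columnar.
SET ap_query_route_policy = 'AUTO';
SET ap_query_cost_threshold = 100000;  -- illustrative value
```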
Strong consistency reads on columnstore replicas
Traditional analytical queries often tolerate weak consistency reads, which provide cost-effective performance scaling. However, in HTAP scenarios with strict data consistency requirements, strong consistency reads are necessary to ensure real-time, accurate results for analytical queries. OceanBase Database V4.6.0 introduces strong consistency reads on columnstore replicas. When an AP query is automatically identified and routed to a columnstore replica, if the `ob_read_consistency` parameter is set to `'strong'`, the system retrieves a globally consistent snapshot version using GTS and checks the replay status of the columnstore replica. If the replica has not replayed to the required snapshot, it waits for the replay to complete before executing the read operation. This ensures that the latest data, consistent with the primary replica, is read. Compared to weak reads, strong reads provide stronger consistency guarantees, supporting AP queries that require transaction-level consistency.
Automatic partition splitting for columnstore tables
Earlier versions of OceanBase Database supported automatic partition splitting for rowstore tables. V4.6.0 extends this feature to columnstore tables with a primary key. This feature determines the split point based on sampled memory and SSTable data within a partition and splits the data by row number into two target partitions. It shares the same parameters (`enable_auto_split` and `auto_split_tablet_size`) and syntax as automatic partition splitting for rowstore tables. However, by default, automatic partition splitting for columnar SSTables is triggered only when the number of rows exceeds 1 million. This feature helps distribute large table data across multiple partitions, improving AP query and write performance and achieving load balancing.
Optimized parallel execution queuing mechanism for PX
The previous approach of capping threads using the `parallel_servers_target` parameter was inflexible and could misallocate resources. OceanBase Database V4.6.0 introduces a new parallel execution queuing strategy based on real-time load. By statistically analyzing the real-time number of PX threads, this strategy enables queuing before execution and dynamic adjustment of parallelism during execution. When the real-time thread count exceeds the `px_target_low_watermark` low watermark, the parallelism is linearly reduced. When it exceeds the `px_target_high_watermark` high watermark, the parallelism is reduced to 1 to avoid CPU overload. This new strategy better addresses resource scheduling imbalances and resource waste in large-scale parallel scenarios.
Adaptive task partitioning for PX
The new version optimizes the phenomenon of long-tail tasks in SQL parallel execution scans. It dynamically adjusts the parallel task partitioning strategy based on data distribution and system load, improving the efficiency and resource utilization of parallel execution. In columnstore table scans, after using Skip Index for rapid filtering, data is concentrated in a few granules, while other granules are quickly consumed. When the remaining granules are fewer than the PX threads, the previous version would experience a drop in parallelism and long-tail scanning. The new version introduces PX task rebalancing capabilities. During runtime, if the remaining granules are fewer than the PX threads, the remaining granules are further split. This ensures a more balanced distribution of scanning tasks, preventing some threads from being idle while others bear excessive workload, thereby improving the parallel scanning efficiency of columnstore tables.
Optimized query performance for incremental data
Based on the LSM-Tree architecture, OceanBase Database provides a storage mode where baseline data is stored by column and incremental data is stored by row. This is suitable for scenarios with real-time, high-frequency data writes and complex analysis. However, when there is a large amount of incremental data, query performance may be affected. The new version optimizes incremental data read speed by supporting encoding and Skip Index for incremental data. It also pushes down expression aggregation statistics at the data block level. For the TPC-H 100G dataset, when the table mode is Delete-Insert and all data is incremental, hot query performance improves by approximately 260%, and cold query performance improves by approximately 150%. Incremental encoding can be enabled via the tenant-level parameter `default_delta_format` or the table-level attribute `delta_format`. Incremental Skip Index can be enabled via the tenant-level parameter `default_skip_index_level` or the table-level attribute `skip_index_level`.
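A hedged sketch follows; the parameter and attribute names come from the notes above, while the values and the exact clause forms are assumptions.

```sql
-- Hypothetical sketch: enable incremental encoding and skip index at the
-- tenant level. Values and clause forms are illustrative.
ALTER SYSTEM SET default_delta_format = 'encoding';
ALTER SYSTEM SET default_skip_index_level = 1;

-- Per-table equivalents (attribute spellings follow the notes; the ALTER
-- TABLE form shown is illustrative):
ALTER TABLE t1 SET delta_format = 'encoding';
ALTER TABLE t1 SET skip_index_level = 1;
```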
Optimizer and executor enhancements
Optimized hard parsing performance for SQL
To address excessive memory usage during hard parsing of complex SQL statements with many repeated elements (such as large `IN`, `OR`, and `INSERT` clauses), the new version optimizes memory usage by restructuring the array data structures used during SQL parsing. Key optimizations include:
- Introducing a new `ObSqlArray` to replace `ObSEArray`, reducing memory overhead.
- Removing the `auto_free` mechanism and using `allocator` for explicit construction to prevent memory bloat.
- Eliminating pre-allocated arrays to reduce initial memory consumption.
- Adjusting the expansion strategy to dynamically calculate `block_size` based on data types.
These changes effectively address static memory overhead for empty arrays and dynamic expansion memory bloat, significantly improving memory efficiency during hard parsing of complex SQL statements.
Enhanced predicate derivation capabilities
The new version optimizes the logic for general predicate derivation (`IN`, `LIKE`, `!=`) to allow cross-view derivation. It also uses rules to determine the derivation relationships between different types of predicates. Additionally, it supports the removal of redundant NLJ (Nested Loop Join) predicates, identifying and eliminating those that are redundantly calculated on the right side if they were already executed on the left side, especially when the right side is a subquery or view. This allows for predicate elimination at the base table level, reducing unnecessary computational overhead.
Enhanced plan pushdown capabilities
In V4.6.0, the generation of `Runtime Filter` has been moved from the first stage to the second stage of the optimizer, enhancing plan robustness. The version control mechanism removes the first-stage generation logic, and in the second stage, `Runtime Filter` is added as a supplement to the selected plan, reducing reliance on statistics. The effective `producer` judgment logic has been enhanced, identifying operators with filtering effects instead of just `filter` operators. `Runtime Filter` now supports cross-DFO (Data Flow Operator) transmission and broader pushdown capabilities, penetrating more operators such as `Window`, `Group By`, and `Set Operation`. It also improves column derivation for equality join conditions, enhancing data filtering efficiency.
Additionally, the new version enhances `Group By` pushdown capabilities, solving the issue of intelligent pushdown for aggregation operators. Using a post-order traversal of the logical plan tree, `Group By` is pushed down during plan generation, avoiding the high overhead of cost verification. It supports splitting `Group By` at JOIN nodes, performing `Local Shuffle` before aggregation to improve `sharding` matching, and making intelligent placement decisions for `Group By`. It identifies and pushes down `Single Group By`, optimizing query performance while maintaining useful `sharding`, which is particularly beneficial for large-scale scenarios like star schemas, reducing intermediate data transmission.
Optimized common temporary table extraction
In V4.6.0, the logic for extracting common subexpressions in SQL statements has been optimized. Within a composite SQL statement, the new version can more intelligently identify shared computational parts among multiple similar queries with `GROUP BY` clauses. For example, for multiple similar queries like `SELECT SUM(t1.c1), t2.c3 FROM t1, t2 WHERE t1.c1 = t2.c1 AND t2.c3 = some_value GROUP BY t2.c3`, the old version could only extract common table expressions without `GROUP BY`, while the new version can extract the complete computation including `GROUP BY` into a common temporary table. Each branch then filters the results of the common temporary table based on the condition `t2.c3 = some_value`. By sharing the same grouping computation results, it reduces redundant calculations and improves performance.
Optimized execution of vector assignment syntax
The execution layer introduces the `AliasRef` expression to replace the previous cross-layer direct reference method. This addresses the expression specification issues when using vector assignment syntax like `SET (col1, col2) = (SELECT ...)` in `UPDATE` statements, ensuring correct execution and proper specification description during the optimizer phase.
Multi-value index support for Index Merge
To address performance issues with JSON multi-value indexes in scenarios with multiple multi-value predicates, the new version allows multi-value indexes in the Index Merge path. It generates separate scan nodes for each multi-value predicate, enabling precise predicate extraction and reducing expression computation. It also supports rewriting `JSON_CONTAINS` into multiple `MEMBER OF` predicates, avoiding non-precise predicate calculations and improving query efficiency for multi-value index queries, especially on JSON arrays with multiple query conditions.
Optimized Join Order selection algorithm
When SQL queries need to join multiple tables (such as in an order system querying user, product, and logistics information), different join orders can drastically affect query speed. Previous versions used the Iterative Dynamic Programming (IDP) algorithm for Join Order enumeration to find the optimal plan, but this was time-consuming with large datasets. V4.6.0 introduces a heuristic permutation enumeration algorithm that enumerates a near-optimal plan quickly and with less memory, accelerating hard parsing. It also employs an adaptive dual-algorithm strategy: for small-scale queries, it uses precise dynamic programming to find the optimal solution; for complex queries, it switches to the efficient permutation enumeration algorithm to quickly find a good approximate solution. This optimization is particularly beneficial for report queries and data analysis, significantly reducing query plan generation time and avoiding computational explosion through intelligent pruning.
Optimized performance for distinct aggregation queries
Traditional three-phase parallel algorithms for queries with multiple `distinct` aggregations can suffer from data replication bloat, high memory usage, and disk spilling. The new version introduces several optimizations:
- Replaces redundant data replication with the `Expansion` operator to reduce `shuffle` data volume.
- The `Hash Distinct` operator supports independent hash table construction for `distinct` columns, reducing row length and computational overhead.
- Supports intelligent rewriting of `filter distinct` scenarios, converting `count(distinct if(...))` into a combination of `any_true` and `count` to avoid redundant replication of the same data column (see the sketch after this list).
- Optimizes transmission encoding by compressing invalid `NULL` columns.

These changes significantly reduce memory consumption and disk I/O, preventing performance degradation from data bloat, especially in large-scale scenarios with multiple `distinct` column aggregations.
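For reference, this is the kind of query shape the `filter distinct` rewrite targets; the schema is illustrative.

```sql
-- Illustrative query shape: multiple filtered DISTINCT aggregates over the
-- same table, which previously forced redundant data replication.
SELECT
  COUNT(DISTINCT IF(status = 'paid',     user_id, NULL)) AS paid_users,
  COUNT(DISTINCT IF(status = 'refunded', user_id, NULL)) AS refunded_users
FROM orders;
```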
Optimized Hash Group By Limit Pushdown
To address high memory and CPU consumption in `GROUP BY ... LIMIT K` queries, where the `Hash Group By` operator may need to compute all groups, the new version introduces `Limit` pushdown optimization. When the number of groups in the hash table reaches the `Limit` threshold, it stops creating new groups and only processes data that belongs to existing groups. By pushing the `LIMIT` value down to the `Hash Group By` operator, the algorithm complexity is reduced from `O(N)` to `O(K)`, significantly decreasing unnecessary calculations and memory usage. Testing shows a 7-8x performance improvement and a 10x reduction in `Hash Group By` operator execution time for 1M rows with `Limit 10`, substantially reducing resource consumption for large-scale group queries.
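For reference, this is the query shape the pushdown targets; the schema is illustrative.

```sql
-- Illustrative query shape: once 10 groups exist in the hash table, the
-- operator stops creating new groups and only updates existing ones.
SELECT user_id, COUNT(*) AS cnt
FROM events
GROUP BY user_id
LIMIT 10;
```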
Enhanced function computation performance
V4.6.0 optimizes the computation performance of several functions, including `approx_count_distinct`, `listagg ... rollup`, time extraction functions for `string` types, `cast string to datetime/date`, `sec_to_time`, `replace`, `hex`, `reverse`, `instr`, `locate`, `concat`, `lpad`/`rpad`, `left`, `right`, `sign`, `exp`, `sha`/`sha2`, `aesdecrypt`, `crc32`, `from_base64`, bitwise operation functions, and IP-related functions. This further accelerates data processing and meets the requirements for real-time computing. The new version also enhances the self-validation capabilities of the vectorized expression framework, improving the stability of expression computations.
Window operator support for streaming computation
To address the high memory and CPU overhead caused by materializing entire partitions during window function computation, V4.6.0 implements fully streaming execution. It supports `row_number`, `rank`, and `dense_rank` functions, aggregate functions without the `distinct` clause, and `first_value`, `last_value`, and `nth_value` under the window definition `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` and related rules. It also supports `first_value` and `nth_value` under the window definition `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` and related rules. By restructuring the window operator framework and optimizing modules such as partition encoding, extraction of common operations, and temporary memory lifecycle management, the system can compute and output results row by row without materializing the entire partition when conditions are met. When the cardinality of the partition key is low and concurrency is high, it combines `range distribution` optimization to enable concurrent execution.
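For reference, window shapes like the following can now stream; the schema is illustrative.

```sql
-- Illustrative window shapes that qualify for streaming execution: a ranking
-- function and a window value function over a cumulative frame.
SELECT
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary) AS rn,
  FIRST_VALUE(salary) OVER (
    PARTITION BY dept ORDER BY salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS lowest_so_far
FROM emp;
```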
System variable loading optimization
The new version optimizes the performance of loading system variables during `inner session` initialization from multiple dimensions. In the old version, each time a new `inner session` was created, over 800 system variables needed to be loaded with their default values, which introduced performance bottlenecks such as string comparisons, length calculations, and type conversions. The new version adopts a hybrid loading strategy, preloading 6 core variables (such as `character_set_connection` and `sql_mode`) and over 50 `INFLUENCE_PLAN` variables that affect query plans, while lazily loading other variables as needed. It also optimizes the underlying implementation by using `id` indexing to accelerate lookups and implementing a dedicated fast type converter to avoid the overhead of the general framework. In complex query scenarios, the share of `init_session_info` dropped from 34% to 3%, and the execution time decreased from 100 seconds to 55.889 seconds, demonstrating significant optimization.
Performance optimization for Hive/Iceberg external table queries
The new version accelerates predicate filtering on string columns in Hive/Iceberg tables that use dictionary encoding. By filtering at the dictionary code level, it avoids unnecessary string decoding overhead. It also supports a memory optimization mechanism that copies dictionary values to shared memory for multiple rows. It provides both white-box and black-box filtering modes. In test scenarios such as TPCH Q19, integer comparisons at the dictionary code level significantly outperform string comparisons, especially in high-selectivity filtering scenarios where the number of unique values in a column is much smaller than the total number of rows.
Support for Bloom Filter in external tables for Parquet/ORC files
The new version introduces support for Bloom Filter indexes in external table queries to accelerate equality predicate filtering on Parquet and ORC files. By leveraging the built-in Bloom Filter metadata of the file format, it can quickly determine whether a target value may exist in the current data block before reading data when the query condition is an equality predicate such as `col = 'xxx'`. The implementation uses a lazy loading strategy, loading the Bloom Filter only when the `min-max` range cannot determine the filtering. This optimization is particularly effective for equality queries on high-cardinality columns (such as UUIDs and usernames), significantly reducing unnecessary I/O reads.
Parallel parsing of CSV files in external tables
V4.6.0 introduces parallel parsing of single CSV files to improve the efficiency of large-scale data imports and external table queries. It implements intelligent file splitting by combining the "gambling" boundary method with full-scan boundary detection. When the number of files is less than or equal to twice the number of threads, the system automatically selects a parallel parsing strategy based on the complexity of the data format. New `FORMAT` options (`PARALLEL_PARSE_ON_SINGLE_FILE`, `PARALLEL_PARSE_FILE_SIZE_THRESHOLD`, and `MAX_ROW_LENGTH`) provide flexible control over parallel behavior. By leveraging the multi-state conversion mechanism of the `GI` operator and SQC coordination, it achieves efficient parallel processing while maintaining data integrity. The new version also supports the `SNAPPY` compression algorithm for CSV files in external tables.
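A hedged sketch of an external table using these options follows; the option names come from the notes above, while the overall clause layout and values are assumptions.

```sql
-- Hypothetical sketch: allow parallel parsing of a single large CSV file.
-- Clause layout and values are illustrative.
CREATE EXTERNAL TABLE ext_events (
  id      BIGINT,
  payload VARCHAR(1024)
)
LOCATION = 'oss://my-bucket/events/'
FORMAT = (
  TYPE = 'CSV',
  PARALLEL_PARSE_ON_SINGLE_FILE = TRUE,
  PARALLEL_PARSE_FILE_SIZE_THRESHOLD = '64MB',
  MAX_ROW_LENGTH = 65536
);
```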
Optimization for downloading large partitions from ODPS (MaxCompute) across multiple machines
To address the issues of high data volume in a single partition and bandwidth limitations during single-machine downloads in ODPS external table imports, V4.6.0 implements parallel downloading of the same partition across multiple machines. During the QC phase, it identifies large partitions by obtaining their physical sizes in parallel. In the SQC phase, it dynamically and finely splits tasks. By utilizing the transaction consistency feature of the ODPS Tunnel API, it deserializes sessions in SQC to enable parallel reading across multiple machines. The task scheduling algorithm is optimized to prioritize large partitions and ensure load balancing, reducing initialization overhead while maintaining compatibility with single-machine parallel processing. This optimization effectively addresses partition skew scenarios, fully utilizing cluster network bandwidth to enhance import performance.
Transaction and storage engine enhancements
Optimized small transaction dump status
For high-frequency small transaction scenarios, the transaction status refill mechanism is optimized to speed up data cleanup. This avoids long transactions blocking data recycling: even when some transactions are still uncommitted, the system can clean up data from completed transactions, improving read and write performance.
Migration and replication support for partition splitting and macroblock reuse
In scenarios such as migration, replication, and physical recovery, if the source side has multiple versions of major SSTables or macroblocks that can be reused after partition splitting, the destination side previously still needed to copy them repeatedly, leading to wasted space and bandwidth. V4.6.0 optimizes the macroblock reuse capability during migration and replication. By building a reuse mapping table, scheduling the splitting tablets in sequence, and modifying the macroblock copy RPC, the destination side can reuse the major macroblocks from the source side and the shared macroblocks between multiple versions of major SSTables. The new `TabletGroupGeneratorDag` dynamically schedules dependencies, ensuring that the source side is migrated before the destination side. This feature reduces the amount of macroblock copying, improves migration efficiency, and reduces storage space usage.
Append-only tables
`append-only` is a new table update model in OceanBase Database, following `partial-update` and `delete-insert`. `append-only` tables only allow `INSERT` operations and prohibit `UPDATE`, `DELETE`, and other operations that may involve updates, such as `REPLACE INTO`. Data written to an `append-only` table cannot be modified. This feature is suitable for scenarios requiring immutable data, such as financial transaction records, logs, and IoT data monitoring. It meets the requirements for audit compliance, efficient appending, and high-performance queries. To create an `append-only` table, use the `MERGE_ENGINE = append_only` syntax. The write order does not require rowkeys to be monotonically increasing, and you can combine the model with TTL (Time To Live) for expired data cleanup.
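A minimal sketch of creating such a table follows; the `MERGE_ENGINE` clause comes from the notes above, while the schema is illustrative.

```sql
-- Hypothetical sketch: an insert-only table for immutable audit records.
CREATE TABLE audit_log (
  id          BIGINT,
  occurred_at DATETIME,
  payload     JSON
) MERGE_ENGINE = append_only;

-- INSERT is allowed; UPDATE/DELETE/REPLACE INTO are rejected.
INSERT INTO audit_log VALUES (1, NOW(), '{"action": "login"}');
```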
TTL (Time To Live)
The TTL (Time To Live) feature is a new data expiration deletion feature based on the merge mechanism. It defines expiration strategies for SQL-mode relational tables. Unlike traditional KV-mode tables, which use background threads to write `DELETE` rows, TTL asynchronously deletes data during merges without occupying clogs or write bandwidth. Deletion information is only recognized internally by OceanBase Database and is not synchronized to downstream systems. Users can configure the `enable_ttl` switch, set the `ttl_duty_time` time point, or manually trigger a TTL task. When a TTL task is triggered, the system synchronizes the deletion rules to the MDS through a transaction. During merges, data is filtered and physically deleted based on the recycle version number. TTL supports the `DELETE_INSERT` and `APPEND_ONLY` table modes. Currently, it only supports expiration strategies based on the hidden column `ora_rowscn`.
PL engine enhancements
Performance optimization of the system package execution framework
The new version optimizes the system package execution framework to avoid the need to store and deep-copy PL symbol tables in `paramstore`. In a test scenario with 1 million data records, the execution time for `select count(utl_raw.CAST_TO_RAW(c1)) from t2` decreased from 6.05 seconds to 5.17 seconds (a 14.5% improvement), and the execution time for `select count(dbms_lob.getlength(c1)) from t2` decreased from 4.89 seconds to 3.89 seconds (a 20.4% improvement).
Performance optimization of the PL execution framework
The new version optimizes the `column convert` overhead in scenarios where PL input parameters are of basic types. It supports precomputing parameter default values and shallow-copying constant parameters. By eliminating memory contexts, avoiding schema guard refreshes, canceling column conversions, and preallocating parameter storage memory, `inner call` performance improves by 3%-11%. Optimizations such as avoiding unnecessary deep copies by storing results after checking returned rows and caching tenant configurations in the top-level PL context improve loop execution performance. Moving `sys_var_in_pl_cache_str_` to the session-level cache eliminates hotspots caused by repeated serialization during PL cache retrieval. Optimizing the execution frameworks for multiple layers of `package routine` and `subprogram routine` improves performance by 24%-39% through top-level PL context caching and avoiding unnecessary execution context creation. Caching execution environment information at the session level avoids repeated system variable loading, improving performance by 5%. Optimizing the execution frameworks for multiple layers of PL improves performance by 44%-75% through top-level PL context caching, disabling end-to-end tracing, optimizing expression cleanup logic, and sharing the `vslice allocator` across multiple layers of PL.
Performance optimization of expression evaluation within PL
The new version resolves the issue of long paths for reading and writing `package` variables, refines data type conversion logic, and eliminates unnecessary conversions in assignment statements, default value generation, and deep copies of complex types. It also removes the `clear expr flag` logic during expression evaluation. This results in a 12%-26% performance improvement in expression evaluation within PL.
Enhanced support for FORALL in PL/SQL
The new version enhances the functionality of the `FORALL` statement in PL/SQL, supporting advanced use cases such as dynamic SQL batch execution, `indices of collection`, and `values of index_collection`. It also supports complex types (such as PL extension types, JSON, and geometric types) as members of a `collection`. This optimization significantly improves the performance of batch DML operations in PL/SQL, allowing loop execution scenarios to leverage the `array binding` feature of the SQL engine for efficient batch processing.
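For reference, a `FORALL` shape now supported looks like the following; the schema is illustrative.

```sql
-- Illustrative (Oracle-mode PL/SQL): batch DML driven by a sparse
-- associative array via INDICES OF.
DECLARE
  TYPE id_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  ids id_tab;
BEGIN
  ids(1) := 101;
  ids(5) := 102;  -- sparse: indexes 2 through 4 are unused
  FORALL i IN INDICES OF ids
    DELETE FROM orders WHERE id = ids(i);
END;
```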
Performance optimization of the dbms_random system package
The new version optimizes the performance of the `dbms_random` system package through eight improvements, including eliminating unnecessary expression evaluations, simplifying package variable access paths, optimizing the execution framework, and eliminating unnecessary column conversions and UDF calls. In some scenarios, this results in a 42% performance improvement.
Support for query range extraction for non-deterministic UDFs with constant parameters
The new version supports extracting query ranges for non-deterministic UDFs with constant parameters, enabling index filtering for scenarios where `col = udf()`. This significantly improves SQL computation performance in these scenarios.
Asynchronous commit support for PL transactions
The new version introduces asynchronous commit logic at the protocol layer to manage PL transaction commit status, supporting parallel processing of transaction commits and PL execution in distributed transaction scenarios. This achieves asynchronous transaction commits within stored procedures, effectively addressing thread occupancy issues caused by frequent synchronous commits at the PL layer, thereby improving system throughput. In complex transaction scenarios, performance can improve by up to 39%, providing significant performance benefits for high-concurrency PL workloads.
Streaming cursor prefetch optimization
The new version reduces the overhead of SQL auditing and cursor environment switching by prefetching data in batches in memory. This optimization significantly improves the performance of streaming cursors, with performance improvements of 44%-54% when the cursor data volume is greater than or equal to the batch cache size (100 rows). This optimization requires enabling the `_enable_streaming_cursor_prefetch` parameter.
Support for reading uncommitted transactions in streaming cursors
In Oracle mode, reading uncommitted data from a streaming cursor and then committing the transaction previously caused the cursor to become invalid, which is incompatible with Oracle behavior. The new version fixes this by checking whether uncommitted data was read when the streaming cursor is opened and marking it accordingly. When the transaction is committed, affected streaming cursors are converted to non-streaming cursors, and data is loaded into local memory starting from the current cursor position, avoiding the "fetch out of sequence" error and keeping behavior compatible with Oracle.
Trigger performance optimization
The new version optimizes the trigger execution path by introducing a direct method for retrieving trigger package bodies and adding the `get_cached_package_body()` method to avoid unnecessary package specification lookups. This reduces schema operation overhead during trigger execution, thereby improving trigger performance.
Performance optimization of dynamic SQL execution
The new version optimizes dynamic SQL precompilation by prioritizing PS cache lookups for precompiled statements, avoiding repeated execution plan creation, and adding parameter and expression counters to improve cache matching accuracy. This enhances dynamic SQL execution performance.
Parameterization support for dynamic SQL within PL
In older versions, only `CALL` statements were parameterized in PL, and internal SQL statements were not. Due to differing parameter literals, internal SQL statements had different SQL IDs, preventing them from utilizing the Plan Cache. This led to frequent cached plan evictions and increased hard parsing and RT (response time) for PL requests. The new version uses Fast Parser technology to quickly identify constants and replace them with "?" placeholders, ensuring consistent SQL ID generation for the same logic. It also handles special syntax like `IS NULL` to prevent incorrect parameterization, ensuring syntax correctness. This optimization enables effective reuse of the Plan Cache for internal SQL statements in high-concurrency scenarios, reducing plan generation overhead and improving performance.
Rewriting select into from dual statements to PL assignment statements
The new version supports rewriting `select into from dual` statements in PL/SQL to assignment statements to avoid SQL execution overhead. This optimization is controlled by the system variable `PLSQL_CAN_TRANSFORM_TO_ASSIGN` and improves performance by approximately 84%, significantly reducing the execution cost of such statements.
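For reference, this is the statement shape the rewrite targets; with the switch enabled, the query below executes as a direct assignment instead of going through the SQL engine.

```sql
-- Illustrative (Oracle-mode PL/SQL): rewritten internally to v := 1.
DECLARE
  v NUMBER;
BEGIN
  SELECT 1 INTO v FROM dual;
END;
```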
Compatibility and feature enhancements
Session-level temporary tables in MySQL mode
V4.6.0 introduces session-level private temporary tables in MySQL mode, supporting the creation, modification, and deletion of temporary tables, as well as index creation on temporary tables, making migration from MySQL smoother.
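For reference, standard MySQL-style usage now works in MySQL mode; the schema is illustrative.

```sql
-- Illustrative: a session-private temporary table with an index.
CREATE TEMPORARY TABLE tmp_orders (
  id     BIGINT,
  amount DECIMAL(10, 2)
);
CREATE INDEX idx_amount ON tmp_orders (amount);
DROP TEMPORARY TABLE tmp_orders;
```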
Performance optimization for global temporary tables in Oracle mode
V4.6.0 restructures transaction-level and session-level global temporary tables in Oracle mode, introducing a `temporary tablet` for storing temporary table data. This data is released at the end of the session, optimizing performance.
MySQL Recursive Union Distinct
MySQL 8.0 supports Recursive Union All and Recursive Union Distinct in Common Table Expressions (CTEs). OceanBase Database has supported Recursive Union in MySQL mode since V3.2.3, but only Recursive Union All. V4.6.0 extends compatibility to MySQL Recursive Union Distinct, ensuring the uniqueness of output data. It also enhances Recursive Union All, supporting data persistence when memory is insufficient.
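For reference, a recursive CTE using `UNION DISTINCT` looks like the following; the schema is illustrative.

```sql
-- Illustrative: compute the set of nodes reachable from node 1; UNION
-- DISTINCT removes duplicates produced across iterations.
WITH RECURSIVE reach (node) AS (
  SELECT 1
  UNION DISTINCT
  SELECT e.dst
  FROM edges e
  JOIN reach r ON e.src = r.node
)
SELECT * FROM reach;
```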
Prepare-phase parameters under the combined PS protocol in Oracle mode
With the combined PS protocol, the new version supports parsing during the Prepare phase when parameters are present, addressing type inference issues in Oracle mode when parameter information was unknown. Upgrades keep the legacy behavior by default; enable `_prexec_prepare_with_params` when you need this feature.
Partition exchange between primary and subpartitioned tables
The new version supports partition exchange between a single-level partitioned table and a two-level partitioned table. The target table must be a two-level partitioned table whose first-level partitioning is `RANGE` or `LIST`. The source table must be a single-level partitioned table, and its partitioning must exactly match the corresponding subpartition definitions of the target table. Use the `ALTER TABLE target_partition_table_name EXCHANGE PARTITION first_level_partition_name WITH TABLE origin_partition_table_name WITHOUT VALIDATION;` syntax for flexible partition management, suitable for fast large-scale data migration. Note that statistics become invalid after the exchange and must be regathered.
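A concrete instance of the documented syntax follows; the table and partition names are illustrative.

```sql
-- Illustrative: swap a single-level partitioned staging table into
-- first-level partition p202604 of a composite-partitioned target table.
ALTER TABLE sales_composite
  EXCHANGE PARTITION p202604
  WITH TABLE sales_staging
  WITHOUT VALIDATION;

-- Statistics are invalid after the exchange; regather them afterward.
```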
JSON column conversion support
The new version supports converting JSON columns to `text`, `mediumtext`, or `longtext`, simplifying O&M tasks.
Enhanced DBMS_SCHEDULER in Oracle mode
The new version adds `BYMONTH`, `BYYEARDAY`, `BYMONTHDAY`, `BYDAY`, `BYHOUR`, `BYMINUTE`, and `BYSECOND` to the `repeat_interval` option in `DBMS_SCHEDULER`, offering flexible job scheduling. It also supports Chinese names for string parameters in the `GBK` character set.
Extended spatial functions in Oracle mode
Oracle mode introduces `SDO_CONTAINS`, `SDO_ANYINTERACT`, `SDO_AREA`, and `SDO_UTIL.GETVERTICES` for common GIS operations like spatial containment, intersection, area calculation, and vertex extraction. These functions are suitable for spatial data analysis, geospatial calculations, map rendering, and geometric quality checks.
Enhanced materialized views
The new version significantly enhances materialized view capabilities, including:
- Incremental refresh now supports outer joins, `UNION ALL`, and non-aggregated single-table mode.
- Materialized view logs support column addition and view renaming.
- Nested materialized view cascading refresh resolves multi-level dependency issues.
- Supports `AS OF PROCTIME()` to skip refreshing a dimension table during incremental refresh, improving efficiency. Incremental materialized views with multi-table joins can reference ordinary views that are declared with `AS OF PROCTIME()`.
- Aggregated incremental refresh now supports `min`/`max` functions and non-basic column parameters.
- Supports nested incremental refresh and `LEFT JOIN` aggregation for non-key-based materialized views.
- Automated MLOG management, automatically creating and cleaning redundant MLOG tables.
- Full refresh materialized views now support UDFs.
- New `Minimal` mode switch improves columnar wide table refresh performance.
- Added range predicate comparison logic (`<`, `<=`, `>`, `>=`, `=`, `BETWEEN AND`) for materialized view rewriting, determining whether the query range is fully contained.
These enhancements improve flexibility and performance for complex business scenarios.
Iceberg V3 format support
V4.6.0 extends support for Iceberg V3, enabling Time Travel queries to access historical data via branch names, tags, snapshot IDs, or timestamps, even after data deletion or modification. It also supports Deletion Vector, allowing retrieval and removal of marked-deleted rows during Iceberg table reads.
Iceberg Insert support
The new version supports writing data to Iceberg tables using the `INSERT INTO catalog.db.table SELECT ...` syntax, with `parquet` and `orc` file types (controlled by `write.format.default`, defaulting to `parquet`). It supports serial and parallel write modes via the `parallel` hint. Currently, it supports only V2 Iceberg tables, does not support partitioned tables, and does not collect `column size`, `min/max`, or `null counts` statistics. It maintains data type consistency with `select into parquet/orc` for data lake integration.
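A concrete instance of the documented syntax follows; the catalog, database, and table names are illustrative.

```sql
-- Illustrative: parallel write into a V2 Iceberg table through a catalog.
INSERT /*+ parallel(4) */ INTO iceberg_cat.sales_db.orders
SELECT * FROM local_orders;
```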
REST Catalog protocol support
The new version adds REST Catalog support, allowing standardized HTTP REST API access to Iceberg tables. OceanBase Database includes an HTTP client connection pool with configurable maximum connections, timeout, and keep-alive settings, reducing connection overhead. It provides the monitoring view `[G]V$OB_EXTERNAL_CATALOG_CLIENT_POOL_STAT` for real-time client pool status. Use `CREATE EXTERNAL CATALOG` to create `REST`-type catalogs, configurable with a server address and authentication methods.
Parquet/ORC file support for complex data types
The new version supports reading and writing complex data types (`Array`, `Map`, `JSON`) in Parquet/ORC files. Files containing arrays (`Array`), maps (`Map`), or nested `JSON` can be directly imported via external or temporary tables and exported in the same format.
System performance improvements
Performance optimization on large-scale domestic machines
This release optimizes CPU hotspots in typical scenarios on domestic CPUs such as Hanguang and Kunpeng, covering PDML and replicated table query performance. In a large 256-core environment, high-concurrency PDML performance improves by 25%, and memory consumption in non-PDML scenarios decreases. Business query performance when connecting to followers for replicated tables improves 14x, increasing from 20,000 QPS to 270,000+, a gap of less than 10% from the 300,000 QPS of direct leader queries. In a small 32-core environment, the same optimizations also yield performance improvements.
Memory allocation for library functions is optimized.
We have designed a lock-free memory allocator based on the `mimalloc` algorithm. It optimizes the performance of small memory allocations when OceanBase Database integrates various external libraries. This allocator uses a fragmented memory block design and a delayed-idle recovery strategy, and it reduces lock contention, breaking through the performance bottleneck of thread-local small-object allocation. The core architecture includes a thread-local allocator, size-class-based management, and fast/slow path allocation mechanisms. The fast path provides nearly O(1) performance, while the slow path handles memory released across threads in batches. In a single-threaded scenario, its performance improves by about 168% compared to glibc malloc. In a 16-thread local allocation scenario, the performance improves by about 180%. This supports more efficient memory management for large-scale and high-concurrency scenarios.
Enhancements to the OBKV data model
OBKV supports reading from read replicas and from the same region's data centers.
OBKV adds weak reads and closest-IDC reads: you can read from a non-leader replica to reduce leader load and prefer the replica in the nearest IDC to cut latency. This applies to OBKV-HBase 2.5.0+ and OBKV-Table 2.3.0+ clients. Use the `follower_first` or `follower_only` routing strategy, configured globally or per statement in the client configuration or code.
TTL-based tasks support index scans and rate limiting.
The TTL task now supports index scanning, allowing users to accelerate the TTL data cleanup process by specifying indexes. Traditional primary key scans require reading complete rows to determine whether data has expired, but local indexes that include the expiration column can significantly reduce I/O overhead and improve scan speed. This feature is enabled through the `TTLScanIndex` parameter in `KV_ATTRIBUTES`, supporting both the Table and HBase models. The index must be a local index containing TTL-related columns; global indexes are not supported.
At the same time, OBKV TTL tasks gain new throttling capabilities: you can set the `kv_ttl_delete_max_ops` parameter to control the maximum per-second throughput of delete operations for TTL tasks. When a tenant has limited resources or high request pressure, TTL tasks may impact system performance, and throttling helps manage CPU and clog write bandwidth. The default value `0` indicates no throttling, while a value greater than 0 allows precise control over the deletion rate. Combined with the `ttl_thread_score` parameter, this gives each partition task a throughput limit and ensures proper resource allocation.
Enhanced monitoring for OBKV-HBase
The new version optimizes the monitoring system for the OBKV-HBase server. The main improvements include differentiating monitoring metrics for HBase Get and Scan operations and distinguishing between single `Put`/`Delete` and batch operations. It also adds seven new monitoring items: `TABLE_API_HBASE_CHECK_AND_MUTATE`, `TABLE_API_HBASE_SCAN`, `TABLE_API_HBASE_GET`, `TABLE_API_HBASE_BATCH_PUT`, `TABLE_API_HBASE_BATCH_DELETE`, `TABLE_API_HBASE_BATCH_GET`, and `TABLE_API_FAILED_OP`. Clients can use the `MetricsExporter` or JMX interface to obtain detailed performance metrics such as QPS, RT, and P99 for more accurate performance monitoring and analysis. This feature requires OBKV-Table 2.3.0+ and OBKV-HBase 1.5.0+/2.5.0+ clients.
Hotspot key query optimization in OBKV-HBase
Hotspot key query optimization in OBKV-HBase
Previously, OBKV-HBase queries used a full-scan strategy: all data under all qualifiers is scanned and then filtered. This causes a large amount of redundant scanning and processing overhead, and significant delays, especially when querying a wide row or a multi-version row that involves only a few qualifiers. V4.6.0 introduces the HBASE_HTABLE_HOTKEY_GET_OPTIMIZE_ENABLE parameter to optimize hot-key Get operations. When this feature is enabled, the server applies a rescan technique to queries involving hot keys, improving query performance in hot-key scenarios.
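For illustration, a hedged sketch of turning the optimization on; this assumes the parameter is enabled with ALTER SYSTEM SET like other tenant-level parameters:

```sql
-- Assumed usage: enable server-side rescan optimization for hot-key Get operations.
ALTER SYSTEM SET HBASE_HTABLE_HOTKEY_GET_OPTIMIZE_ENABLE = True;
```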
Enhanced OBCDC
OBCDC supports incremental data synchronization after table-level recovery.
Table-level recovery is an offline DDL operation across tenants. In traditional solutions, OBCDC cannot obtain the source table information associated with the hidden table, which leads to synchronization failures. The new version introduces the following improvements: DDL operations during table-level recovery are filtered out to prevent abnormal termination of OBCDC, and at the end of table-level recovery, the observer side records the final table name as a TABLE_RECOVER_END-type DDL operation, which OBCDC uses for whitelist and blacklist matching. This feature applies to incremental data synchronization after table-level recovery completes, ensuring stable operation of OBCDC in table-level recovery scenarios.
OBCDC support for virtual generated columns
V4.6.0 supports computing virtual generated column values in real time on the OBCDC side in MySQL-compatible mode, so that downstream systems stay compatible with MySQL Binlog Service. The feature is controlled by the enable_output_virtual_generated_column parameter. When it is enabled, OBCDC uses the SQL module's calculation APIs together with virtual generated column dependency metadata from the data dictionary (including session-fixed variables and column default values) to compute and fill in virtual generated column values during formatting, preserving correctness and MySQL Binlog behavior.
Security and reliability enhancements
Non-intrusive column encryption
A sensitive column data protection feature is added, allowing database administrators to create column-level data protection rules on tables. When a user executes a SELECT operation whose projected columns include an encrypted column, the server checks whether the user has plaintext access privileges. If so, the encrypted column's plaintext results are returned normally; if not, the data is encrypted according to the data protection rules before being returned to the user.
Support for caching_sha2_password authentication
Support for the caching_sha2_password authentication method is added in both MySQL and Oracle modes. This method can be specified when creating users, providing a more secure password protection mechanism. It requires ODP 4.4.0 for compatibility.
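For illustration, a minimal example in the spirit of the CREATE USER syntax listed under syntax changes below; the user name and password are hypothetical:

```sql
-- Create a user authenticated with caching_sha2_password.
CREATE USER 'app_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'Str0ng_P@ss!';
```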
Optimized read from standby replicas
The new version optimizes the transaction state inference mechanism. Instead of each replica independently collecting participant states, the leader coordinator now uniformly collects and caches these states, significantly reducing message interactions. Additionally, a retry mechanism is introduced to automatically forward requests to other replicas when a replica falls behind, preventing blocking or incorrect transaction state inference.
Enhanced backup verification
V4.6.0 introduces physical backup verification to check the integrity and availability of backup and archive data. This feature allows users to verify backup sets (Backup Set) and archive logs (Archive Log) without performing actual recovery operations, checking for issues such as missing files, physical data corruption, and logical inconsistencies. It offers two verification levels, BASIC (file list integrity check) and PHYSICAL (physical data correctness and logical checks), catering to different verification needs. It supports verification of the entire backup path, specific backup sets, and specified archive pieces, with cross-cluster path verification capabilities.
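For illustration, a hedged sketch based on the ALTER SYSTEM VALIDATE syntax listed under syntax changes below; the backup set ID and exact argument form are assumptions:

```sql
-- Verify a specific backup set without performing an actual restore.
ALTER SYSTEM VALIDATE BACKUPSET 1;

-- Cancel a running backup validation task.
ALTER SYSTEM CANCEL VALIDATE BACKUP;
```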
Resource limits for large queries
To address the issue of long-running large queries consuming excessive CPU and affecting small-query response times, the new version further optimizes the resource limits for large queries. Earlier versions already supported, before execution, retrying a query in the large-query resource group based on historical execution durations from the plan cache. V4.6.0 introduces a new trigger path that monitors actual execution durations at runtime: if the threshold is exceeded, the thread automatically switches to the large-query resource group for throttling. The large-query resource group uses cgroup to control CPU usage, defaulting to 30%, to prevent large queries from starving new requests. When the number of threads in the large-query resource group exceeds half of the tenant's maximum thread count, new large queries are no longer accepted, ensuring that the default resource group has sufficient resources to handle small queries and that small-query performance remains prioritized. The _large_query_cpu_quota_adjustment_step parameter adjusts the CPU throttling step size: larger values throttle faster but may cause QPS oscillation, while smaller values provide stable QPS but slower throttling. Test carefully before use.
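For illustration, a hedged example of tuning the throttling step; the value shown is arbitrary and should be validated in a test environment first:

```sql
-- A larger step converges on the CPU cap faster but risks QPS oscillation;
-- a smaller step keeps QPS stable but throttles more slowly.
ALTER SYSTEM SET _large_query_cpu_quota_adjustment_step = 5;
```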
Operation and maintenance enhancements
Enhanced SQLSTAT capabilities
To address the issue of SQL statements without execution plans consuming excessive memory because they shared memory with the Plan Cache, we restructured the SQLSTAT storage architecture and implemented a three-tier storage scheme to better manage memory usage. This scheme includes:
- Cache the execution statistics of the last N runs on the physical plan.
- Manage the lifecycle of global SQLSTAT independently of the Plan Cache.
- Use batched updates to reduce hash table access overhead.
SQLSTAT now has an independent eviction mechanism that avoids interference with the Plan Cache: a background thread regularly monitors memory usage and evicts the least frequently used SQLSTAT entries when the memory limit is exceeded. The physical plan's pointer cache for SQLSTAT is retained, and execution statistics are accumulated into a global HashMap on completion. This addresses incomplete statistics and memory contention, ensuring the accuracy and continuity of SQL execution performance statistics.
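Relatedly, the SQLSTAT sampling interval can be tuned through the MODIFY_SNAPSHOT_SETTINGS subprogram noted under system package changes below; the named-parameter call form here is an assumption:

```sql
-- Set the SQLSTAT sampling interval to 30 minutes (assumed call form).
CALL dbms_workload_repository.MODIFY_SNAPSHOT_SETTINGS(sqlstat_interval => 30);
```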
ASH data integrity enhancement
In the new version, we introduce a compressed ASH data integrity mechanism to address ASH buffer overflow and the loss of diagnostic information caused by a large backlog of requests in the network queue. The mechanism aggregates queued requests by tenant ID and PCode, compressing ASH writes so that each request does not produce multiple records. It dynamically detects the write rate using a sliding-window average algorithm: if the average write volume over the past N seconds exceeds the buffer capacity threshold, compression is automatically enabled; if it falls below the threshold, compression is automatically disabled. The ASH table now includes a weight column and an aggregation flag column, ensuring that aggregated data is completely written to the WR table and preventing sampling distortion. This solves the loss of diagnostic information in high-concurrency scenarios, providing more complete session history data for troubleshooting and enhancing system observability.
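For example, assuming the column added to the ASH views (see the view changes below) is the same WEIGHT column added to CDB/DBA_WR_ACTIVE_SESSION_HISTORY, weighted aggregation would look like this sketch:

```sql
-- With compression, one ASH row may represent several samples,
-- so SUM(weight) is used instead of COUNT(*).
SELECT event, SUM(weight) AS weighted_samples
FROM oceanbase.GV$ACTIVE_SESSION_HISTORY
GROUP BY event
ORDER BY weighted_samples DESC
LIMIT 10;
```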
Refined wait events
Wait events are essential for performance diagnostics. In earlier versions, many internal wait events of OceanBase Database were grouped under a single default wait event, making it difficult to pinpoint specific issues from wait event names. In this release, default wait events are split out and largely replaced with dedicated wait events for more precise diagnostics.
Plan Cache diagnostics enhancements
The new version provides systematic diagnostics for plan cache misses, simplifying the diagnostic process. It records detailed miss reasons during plan matching, covering 14 cases such as non-existent cache nodes, type mismatches, and schema mismatches. This information is stored in the new create_reason column of the [G]V$OB_PLAN_CACHE_PLAN_STAT view. Additionally, new columns such as cache_node_id, pcv_id, and plan_set_id are added to facilitate quick root-cause analysis of plan misses and reduce operational complexity.
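A hedged query sketch using the new diagnostic columns (MySQL-mode view name; the SQL ID is a placeholder):

```sql
-- Inspect why plans for a given SQL were (re)created in the Plan Cache.
SELECT sql_id, create_reason, cache_node_id, pcv_id, plan_set_id
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT
WHERE sql_id = 'YOUR_SQL_ID'
LIMIT 20;
```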
Plan Cache memory management optimization
This optimization addresses a memory accounting problem in the Plan Cache module by changing the memory judgment standard from mem_hold (page occupancy) to mem_use (actual usage). Older versions used 8 KB pages for memory allocation, which can lead to memory fragmentation. When diagnostic objects such as SQL_STAT consume a large amount of fragmented memory, mem_hold can reach its upper limit even though actual memory usage is low, which can prevent physical plans from being added to the cache and trigger LRU eviction. The optimization reflects the memory size of cache objects more accurately and avoids cache failures caused by memory fragmentation.
SQL Audit/Plan Monitor query efficiency improvement
Query performance of the [G]V$OB_SQL_AUDIT and [G]V$SQL_PLAN_MONITOR system tables is enhanced by introducing late materialization. In earlier versions, because filter pushdown was not supported, all columns had to be projected before filter evaluation. The new version generates pushdown filters during the code generation phase; during execution, it first projects and pre-filters only the columns involved in filtering, and then projects all output columns. This optimization significantly improves system table query performance in scenarios such as trace_id equality queries.
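For instance, a point lookup by trace ID, the pattern that benefits most directly from the pushdown filter (the trace ID is a placeholder):

```sql
-- Late materialization pre-filters on trace_id before projecting the remaining columns.
SELECT svr_ip, sql_id, elapsed_time, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE trace_id = 'YOUR_TRACE_ID';
```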
LOB consistency check and repair
When LOB data is stored separately in main and auxiliary tables, there is a risk of inconsistency after failures. This version provides end-to-end consistency support: scheduled or manual main/auxiliary checks for three anomaly types (auxiliary row missing on the main side, main row missing on the auxiliary side, and length mismatch); the dbms_lob_manager package for repair, job control, and progress tracking; and Resource Manager integration to cap the resources of check jobs so that background scans do not hurt foreground workloads. The new views DBA_OB_LOB_CHECK_TASKS and DBA_OB_LOB_CHECK_EXCEPTION_RESULT show job progress and exception details, addressing query failures and disk growth caused by inconsistent LOB data.
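A hedged sketch of running and monitoring a check; the dbms_lob_manager subprogram signatures are not spelled out in these notes, so the call form is an assumption:

```sql
-- Assumed call form: start a LOB consistency check job.
CALL dbms_lob_manager.CHECK_LOB();

-- Track job progress and inspect any detected anomalies.
SELECT * FROM oceanbase.DBA_OB_LOB_CHECK_TASKS;
SELECT * FROM oceanbase.DBA_OB_LOB_CHECK_EXCEPTION_RESULT;
```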
Performance optimization for DROP/PURGE DATABASE operations
Currently, the DROP DATABASE operation is executed atomically, during which table locks and writes to internal tables are executed serially. In scenarios with a large number of tables, this can lead to timeouts and failures. The new version supports asynchronous parallel deletion of tables to improve performance. To enable this feature, set the _enable_atomic_drop_database parameter to False. Once enabled, the database is dropped non-atomically; if deletion fails, tables that were already dropped are not rolled back, so use this with caution. For PURGE DATABASE with the recycle bin enabled, parallel deletion of tables in __all_recyclebin is also supported. Performance improves substantially: in internal tests, about 7x for 6,000 complex tables (56 seconds) and about 14.5x for 20,000 complex tables (101 seconds).
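A minimal sketch of the opt-in; the database name is hypothetical, and the no-rollback caveat above applies:

```sql
-- Opt in to non-atomic, parallel DROP DATABASE (partially dropped tables are not rolled back).
ALTER SYSTEM SET _enable_atomic_drop_database = False;

DROP DATABASE big_legacy_db;  -- hypothetical database with many tables
```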
Heterogeneous zones
A tenant can now use up to two different UNIT NUM values across zones, improving O&M and scale-out/scale-in flexibility. Previously, every zone of a tenant had to use the same UNIT NUM; when hardware failed and could not be replaced, you had to lower UNIT NUM in all zones, widening the maintenance impact. For capacity changes, you can use this capability to operate on Follower replicas first and then switch the Leader to the zone where scaling has finished, for smoother expansion or shrinkage.
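A hedged sketch of enabling the mode through the zone_deploy_mode parameter described under parameter changes below; note that a heterogeneous configuration currently cannot be switched back to homogeneous:

```sql
-- Allow up to two different UNIT NUM values across this tenant's zones.
ALTER SYSTEM SET zone_deploy_mode = 'HETERO';
```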
Window compaction
Traditional major compactions use a globally consistent snapshot to guarantee consistency, support primary and index table checks and primary/standby verification, and provide strong consistency guarantees. However, the mandatory global merge strategy becomes costly and hard to bound in time as incremental data grows, partition counts reach the millions, or workloads are columnstore-heavy. With large business datasets, merges can also spill into peak traffic windows.
To address this, V4.6.0 introduces window compaction. The built-in scheduled job DAILY_MAINTENANCE_WINDOW in DBMS_SCHEDULER lets users initiate compactions within a scheduled time window (2:00 to 8:00 by default). The feature initiates compactions only for tables or partitions with incremental data that meet specific criteria, prioritizing tasks by factors such as incremental row count, table popularity, and days since the last compaction. Compactions automatically stop at the end of the window. You can enable or disable the feature with the enable_window_compaction parameter and control the number of compaction threads with the SET_THREAD_COUNT() procedure. Optionally, a dedicated resource plan can precisely manage the resources consumed by compactions, helping minimize the impact on peak business performance.
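A hedged configuration sketch combining the parameter with the dbms_daily_maintenance package listed under system package changes below; the thread count is arbitrary and the call form is an assumption:

```sql
-- Run major compactions only inside the daily maintenance window.
ALTER SYSTEM SET enable_window_compaction = True;

-- Limit window compaction to 4 threads (assumed call form).
CALL dbms_daily_maintenance.SET_THREAD_COUNT(4);
```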
Optimized deadlock detection
OceanBase Database exposes deadlock history in CDB/DBA_OB_DEADLOCK_EVENT_HISTORY, but earlier releases did not record SQL for holder nodes on the deadlock cycle, which made diagnosis harder. The new release enriches deadlock diagnostics: for example, it adds the visitor session ID; the machine, log stream, and tablet where the wait occurs; the SQL currently running; the SQL holding the lock; and the lock request time after a deadlock, making deadlock analysis much easier.
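For example, the enriched history can be reviewed directly from the view; selecting all columns avoids assuming the exact names of the new fields:

```sql
-- Review recent deadlock events, including the new holder/visitor SQL
-- and the machine, log stream, and tablet where the wait occurred.
SELECT * FROM oceanbase.DBA_OB_DEADLOCK_EVENT_HISTORY LIMIT 5;
```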
Compatibility changes
Changes in product behavior
| Feature | Change description |
|---|---|
| Limitations added to the OBKV-HBase time series model | Starting from V4.4.1, the time series model is supported based on the OBKV-HBase model. However, many HBase interfaces are not effective or meaningful in the time series model, so new versions define the scope and behavior limitations of the time series model. The time series model is designed for write-heavy, read-light scenarios such as monitoring, IoT, and vehicle networking, and must be enabled together with the distributed switch. The model does not support multiple column families, TimeToLive, or Cell TTL parameters, and some HBase interfaces are disabled. At the data operation level, only core interfaces such as Put, Get, and Scan are supported; operations such as Delete, Append, and Increment are not. Get and Scan operations have parameter restrictions, but necessary features such as TimeRange, setLimit, setCaching, and setMaxResultSize remain available to meet the query needs of time series data. These limitations ensure the efficiency and stability of the time series model and provide dedicated data access interfaces for time series scenarios. |
Parameter changes
| Parameter | Change type | Change description |
|---|---|---|
| default_delta_format | New | Specifies the default storage format for incremental data when creating a table. In AP load scenarios, the default is encoding, and in other load scenarios, the default is flat. |
| default_skip_index_level | New | Specifies the default Skip Index creation level when creating a table. In AP load scenarios, the default is 1, and in other scenarios, the default is 0. |
| enable_ttl | New | Controls whether to enable the TTL (Time To Live) feature. When enabled, background threads periodically delete expired rows defined by TTL. The default is False. |
| ttl_duty_time | New | Controls the scheduled time for TTL (Time To Live) background tasks. The default is "01:00". |
| default_table_merge_engine | Value change | Adds the append_only option. The merge_engine attribute in create table statements also synchronizes this option. |
| ob_vector_index_active_segment_max_size | New | Controls the maximum memory usage of the vector index Active Segment for a single partition. If exceeded, a freeze persistence operation is forcibly triggered. The default is 0, indicating system-adaptive behavior. |
| ob_vector_index_merge_trigger_percentage | New | Controls the maximum proportion of memory that the incremental segment of the vector index can occupy relative to the corresponding partition's vector index memory. If exceeded, the system will automatically trigger a merge operation with the baseline segment. The default is 20. |
| enable_window_compaction | New | Controls whether to use the window compaction feature for tenants. If True, window compaction is executed daily; if False, tenant compaction is executed daily. The default is False. |
| sql_func_extension_mode | New | Controls whether to enable extension functions, compatible with mainstream databases such as ClickHouse, whose names differ from MySQL function names. The default is '', meaning off. |
| zone_deploy_mode | New | Controls whether the current tenant uses homogeneous or heterogeneous zone mode. The default value is HOMO, indicating homogeneous zones. Set it to HETERO to enable heterogeneous zone capabilities, which support up to two different UNIT NUM values. A heterogeneous zone configuration currently cannot be changed back to homogeneous. |
| enable_gts_standalone | New | Controls whether to enable the GTS service to use a dedicated group of units. Enabling this can improve the performance and stability of the GTS service. |
| enable_mlog_auto_maintenance | New | Tenant-level parameter that controls automatic creation and management of MLOG. The default value is True. |
| mlog_trim_interval | New | Tenant-level parameter that controls the scheduling interval of the MLOG background trim task. The default value is 1d. |
| enable_mv_binlog_minimal_mode | New | Tenant-level parameter that controls whether Minimal mode is enabled for materialized view DML. When enabled, CLOG payload is simplified to speed up materialized view refresh; downstream consumers that rely on full materialized view CLOG may break. Evaluate before enabling. The default is False. |
| px_target_workers_per_cpu | New | Controls the ratio of expected parallel execution threads to the tenant CPU count. The default is 8. In older versions, this was governed by the parallel_servers_target system variable, which specified an absolute number of PX threads and had to be re-adjusted after scaling, which was inflexible. The new version therefore switches to system-adaptive dynamic adjustment. |
| px_target_low_watermark | New | Sets the low watermark percentage threshold for parallel threads. When exceeded, it starts to downgrade the parallelism of newly scheduled DFOs to avoid CPU resource exhaustion. The default is 60. |
| px_target_high_watermark | New | Sets the high watermark percentage threshold for parallel threads. The closer the real-time load is to this watermark, the greater the parallelism downgrade. If exceeded, parallelism is reduced to 1. The default is 60. |
| kv_ttl_delete_max_ops | New | Controls the maximum number of operations per second for TTL tasks in a tenant. The default is 0, indicating no throttling for TTL task deletions. |
| system_protected_tenant_parameters | New | Cluster-level parameter that lists tenant-level parameters locked from modification by ordinary users. The default is ''. |
| document_ai_file_max_size | New | Limits the maximum file size that can be processed in one operation for Document AI-related expressions (LOAD_FILE and AI_PARSE_DOCUMENT). |
System variable changes
| System variable | Change type | Description |
|---|---|---|
| ap_query_route_policy | New | Controls replica adaptive routing: OFF disables it; AUTO routes analytical queries to columnstore replicas by query traits and cost; FORCE prefers columnstore replicas. Default is AUTO. Can also be set per query with the opt_param hint. |
| ap_query_cost_threshold | New | Cost threshold for adaptive columnstore replica selection. Default is 200000. Parallel SELECT plans follow rules to use a columnstore replica; for single-threaded rowstore SELECT, if cost exceeds this value, planning uses a columnstore replica instead. |
| ap_query_replica_fallback | New | When TP/AP automatic routing is on, controls whether to fall back to a non-columnstore replica if no columnstore replica is available. Default is True (fall back). |
| caching_sha2_password_digest_rounds | New | This variable specifies the number of SHA256 iterations performed before storing the password hash to disk when using the caching_sha2_password authentication plugin. The default value is 5000. |
View changes
| View | Change type | Description |
|---|---|---|
| CDB/DBA_OB_BACKUP_VALIDATE_JOBS | New | Displays information about backup validation jobs initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_JOB_HISTORY | New | Displays historical information about backup validation jobs initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_TASKS | New | Displays information about backup validation tasks initiated by users. |
| CDB/DBA_OB_BACKUP_VALIDATE_TASK_HISTORY | New | Displays historical information about backup validation tasks initiated by users. |
| CDB/DBA_TABLES | Added column | |
| [G]V$OB_EXTERNAL_CATALOG_CLIENT_POOL_STAT | New | Obtains the client pooling information of a catalog. |
| CDB/DBA_OB_TTL_TASKS | New | Displays the execution status of TTL tasks. |
| CDB/DBA_OB_TTL_TASK_HISTORY | New | Displays the execution history of TTL tasks. |
| [G]V$OB_HNSW_INDEX_SEGMENT_INFO | New | Provides observability at the Segment level for vector indexes. |
| DBA_OB_VECTOR_INDEX_TASKS | Added column | |
| DBA_OB_VECTOR_INDEX_TASK_HISTORY | Added column | |
| CDB/DBA_OB_LOB_CHECK_TASKS | New | Records the progress of LOB consistency checks and recovery tasks. |
| CDB/DBA_OB_LOB_CHECK_EXCEPTION_RESULT | New | Records the exception tables and tablets and the exception type of LOB consistency checks. |
| CDB/DBA_OB_MAJOR_COMPACTION | Added column | Added the MODE column to record the type of daily major compactions. |
| CDB/DBA_OB_DEADLOCK_EVENT_HISTORY | Value changed | Enriched deadlock diagnostics, such as the visitor session ID, the machine, log stream, and tablet where the wait occurs, the SQL currently running, the SQL holding the lock, and the lock request time. |
| CDB/DBA_OB_LS | Added column | Added the UNIT_LIST column to describe the UNIT distribution list of a log stream. |
| CDB/DBA_OB_BALANCE_JOBS | Added column | Added the ZONE_UNIT_NUM_LIST and PARAMETER_LIST columns to record each zone's UNIT NUM and balance-related parameters for the tenant. |
| CDB/DBA_OB_BALANCE_JOB_HISTORY | Added column | Added the ZONE_UNIT_NUM_LIST and PARAMETER_LIST columns to record each zone's UNIT NUM and balance-related parameters for the tenant. |
| CDB/DBA_WR_ACTIVE_SESSION_HISTORY | Added column | Added the WEIGHT column to record the weight of an ASH record. |
| [G]V$ACTIVE_SESSION_HISTORY | Added column | |
| [G]V$OB_ACTIVE_SESSION_HISTORY | Added column | |
| CDB/DBA_WR_SQLSTAT | Added column | Added the SAMPLE_TIME column to record the time when the SQLSTAT snapshot is written to disk. |
| [G]V$OB_SQLSTAT | Added column | Added the LATEST_ACTIVE_TIME column to record the last active time of a plan corresponding to a sql_id + plan_hash pair on a node. |
| [G]V$OB_SQL_AUDIT | Added column and semantics changed | |
| CDB/DBA_OB_SENSITIVE_RULES | New | Displays the definitions and attributes of sensitive rules. |
| CDB/DBA_OB_SENSITIVE_COLUMNS | New | Displays the column information protected by sensitive rules. |
| CDB/DBA_OB_SENSITIVE_RULE_PLAINACCESS_USERS | New | Displays the users or roles with the PLAINACCESS privilege on sensitive rules. |
| ALL/DBA/USER_OBJECTS | Object Added | Added the display of the SENSITIVE RULE object. |
| ALL/DBA/USER/ROLE_TAB_PRIVS | Object Added | Added the display of the PLAINACCESS privilege at the object level. |
| DBA/USER/ROLE_SYS_PRIVS | Object Added | Added the display of the PLAINACCESS and CREATE SENSITIVE RULE privileges at the user level. |
| [G]V$OB_SINDI_INDEX_INFO | New | Displays the information of the SINDI vector index. |
| [G]V$OB_HNSW_INDEX_INFO | Column name change | The SNAP_INDEX_CNT and INCR_INDEX_CNT columns are updated to SNAP_VECTOR_CNT and INCR_VECTOR_CNT. |
| [G]V$OB_PLAN_CACHE_PLAN_STAT | New column | The create_reason, cache_node_id, pcv_id, and plan_set_id columns are added to display the reason why the current plan is added to the Plan Cache. |
| CDB/DBA_OB_USERS | New column | The plugin column is added to record the name of the plugin used for password hash calculation. |
Syntax changes
| Syntax | Description |
|---|---|
| New ALTER SYSTEM VALIDATE BACKUP/BACKUPSET/ARCHIVELOG_PIECE syntax | Used to initiate backup validation. |
| New ALTER SYSTEM CANCEL VALIDATE BACKUP syntax | Used to cancel a backup validation task. |
| New delta_format table attribute in CREATE TABLE/ALTER TABLE syntax | Used to control the format of incremental data. In AP workloads, the default value is encoding, and in other workloads, the default value is flat. |
| New skip_index_level table attribute in CREATE TABLE/ALTER TABLE syntax | Used to control the data range for generating Skip Index aggregate statistics. The default value is 0. |
| New REST catalog-related attributes in CREATE EXTERNAL CATALOG syntax | Added the TYPE = 'REST' catalog type to access Iceberg tables through standardized HTTP REST APIs. |
| New SHOW DATABASES FROM catalog and SHOW TABLES FROM catalog.database syntax | Allows users to view the databases in a specified catalog and the tables in a specified catalog.database. |
| New TTL (Time To Live) syntax in CREATE TABLE | When creating a table, you can specify TTL ora_rowscn + INTERVAL 10 DAY BY COMPACTION to set the table to automatically delete data older than 10 days. |
| New Iceberg table time travel syntax | In Iceberg table queries, you can specify {VERSION \| TIMESTAMP} AS OF '<version_identifier>' to query data from a specific snapshot in the past. |
| New UNIT_LIST clause in ALTER SYSTEM CREATE/MODIFY LS syntax | Added a new log stream maintenance syntax for specifying the unit list for log stream distribution in heterogeneous zone mode. |
| New cascading refresh-related parameters in DBMS_MVIEW.REFRESH command | DBMS_MVIEW.REFRESH('mv_name', 'refresh_method', nested => true, nested_refresh_mode => 'consistent/inconsistent') |
| New individual/inconsistent/consistent refresh strategy attributes in CREATE/ALTER MATERIALIZED VIEW | CREATE MATERIALIZED VIEW mv1 REFRESH [individual/inconsistent/consistent] AS xxx specifies the refresh strategy when creating or modifying a nested materialized view. |
| New AS OF PROCTIME() attribute for materialized view base tables | For example, CREATE MATERIALIZED VIEW mv REFRESH FAST ON DEMAND AS SELECT xxx FROM t1 LEFT JOIN t2 AS OF PROCTIME() ON t1.c1 = t2.c1 LEFT JOIN t3 ON t1.c2 = t3.c2 WHERE xxx; allows you to specify AS OF PROCTIME() for the base table when creating a materialized view to skip refreshing this table during incremental refresh. Tables with AS OF PROCTIME() do not need an MLOG. |
| New SENSITIVE RULE-related syntax | |
| New vector_index hint in vector queries | /*+ vector_index (table_name index_name [ filter_type]) */ specifies the vector index and filter type to use for vector index queries. |
| New type=sindi_sq index type in vector indexes | vector index idx1(c2) with (distance=inner_product, type=sindi_sq, lib=vsag) |
| New ALTER SYSTEM FLUSH SQL AUDIT/PS CACHE command in Oracle mode | |
| Support for subqueries in WITH CHECK OPTION | Compatible with the behavior of WITH CHECK OPTION with subqueries in Oracle and MySQL. |
| CREATE USER statement with caching_sha2_password authentication | |
| Syntax for creating a search index | |
| Added FTS_INDEX_TYPE option for full-text indexes | The FTS_INDEX_TYPE [=] FILTER/MATCH/PHRASE_MATCH option is added to FULLTEXT INDEX in CREATE TABLE, ALTER TABLE ADD FULLTEXT INDEX, and CREATE FULLTEXT INDEX. |
| ALTER SYSTEM MAJOR FREEZE TENANT [=] tenant_name TABLE_ID = table_id syntax | Added syntax for specifying table-level major compactions. |
System package changes
| System package | Change type | Description |
|---|---|---|
| dbms_lob_manager | New | Used to execute LOB consistency detection tasks. It includes multiple subprograms such as CHECK_LOB, CANCEL_JOB, SUSPEND_JOB, RESUME_JOB, and RESCHEDULE_JOB. |
| dbms_daily_maintenance | New | Used to manage window compaction tasks. It includes subprograms such as TRIGGER_WINDOW_COMPACTION_PROC and SET_THREAD_COUNT. |
| dbms_scheduler | New internal task | The DAILY_MAINTENANCE_WINDOW internal task is added. You can use the DBMS_SCHEDULER.DISABLE, ENABLE, and SET_ATTRIBUTE subprograms to manage this task. |
| dbms_resource_manager | New subprogram | The COPY_PLAN subprogram is added. It is used to copy an existing resource management plan and its related resource plan directives. |
| sdo_util | New | The GETVERTICES subfunction is supported. It is used to extract the vertex coordinates of a geometry object (Geometry). |
| sdo_geom | New subfunction | The SDO_AREA subfunction is added. It is used to calculate the area of a two-dimensional geometry object. |
| dbms_workload_repository | New parameter | The SQLSTAT_INTERVAL parameter is added to the MODIFY_SNAPSHOT_SETTINGS subprogram. It represents the SQLSTAT sampling interval in minutes. |
| dbms_vector | New subfunction | The QUERY_RECALL subfunction is added. It is used to calculate the recall rate of an SQL statement. The INDEX_RECALL subfunction is added. It is used to calculate the recall rate of a vector index. |
Function changes
| Function | Change type | Description |
|---|---|---|
| SDO_CONTAINS | New | Determines whether one geometry (geometry A) strictly contains another geometry (geometry B). |
| SDO_ANYINTERACT | New | Determines whether any form of intersection or contact exists between two geometries. |
| MAX_PT | New | Retrieves the maximum partition value that contains data in the current table. Recommended for use in list partitioning scenarios. |
| DATE_TRUNC | New | Used for truncating dates. |
| MONTHS_ADD | New | Used for date addition calculations. |
| TO_DATE | New | Used for converting strings to date types in a specific format. |
| CAST AS BIGINT | New | Used for converting data to the bigint type. |
| ARBITRARY | New | Used for non-deterministically returning any non-NULL value from a set of rows. |
| ISNAN | New | A ClickHouse-compatible extension function for determining whether a numeric value is "not a number" (NaN). |
| ADDDATE | New | A ClickHouse-compatible extension function for date addition calculations. |
| FORMATDATETIME | New | A ClickHouse-compatible extension function for formatting dates or date times. |
| TOUNIXTIMESTAMP | New | A ClickHouse-compatible extension function for converting a date time expression to a Unix timestamp. |
| EDITDISTANCE | New | A ClickHouse-compatible extension function for calculating the edit distance between two strings. |
| STDDEVSAMP | New | A ClickHouse-compatible extension function for returning the sample standard deviation of an expression. |
| UNIQ | New | A ClickHouse-compatible extension function for approximately calculating the number of distinct values in a column. |
| VARSAMP | New | A ClickHouse-compatible extension function for returning the sample variance of an expression. |
| ANY | New | A ClickHouse-compatible extension function for selecting any value from a set of values. |
| GROUPCONCAT | New | A ClickHouse-compatible extension function for concatenating the values of multiple rows in a group into a single string. |
| LAGINFRAME | New | A ClickHouse-compatible extension function for returning the value of expr from offset (default is 1) rows before the current row within the current window frame. |
| LEADINFRAME | New | A ClickHouse-compatible extension function for returning the value of expr from offset (default is 1) rows after the current row within the current window frame. |
| MD5_CONCAT_WS | New | Concatenates multiple strings into one string and returns the MD5 hash of the concatenated string as the result. |
| LOAD_FILE | New | Reads external storage files into the database via a temporary external table interface, supporting local files, HDFS, OSS, and S3 files, and returning corresponding BLOB data. |
| AI_PARSE_DOCUMENT | New | Invokes a specified visual understanding/OCR model to identify and parse document data (e.g., PDF) into a specified text format. |
| AI_SPLIT_DOCUMENT | New | Splits the content of a document in a specified format and outputs a fixed-format table to record the split content. |
Upgrade notes
Non-shared storage mode
- You can smoothly upgrade from V4.4.1 and earlier versions of the V4.4.x series to V4.6.0.
- You can smoothly upgrade from V4.3.5 BP2 and earlier versions of the V4.3.x series to V4.6.0, but only in POC and test environments. This upgrade may carry some risks, and if the upgrade fails, the environment may need to be rebuilt. We recommend that you upgrade to V4.4.1 first and then to V4.6.0.
- You can smoothly upgrade from V4.5.0 to V4.6.0.
- You cannot upgrade from V4.2.x or earlier versions to V4.6.0.
- You cannot upgrade from V4.4.2 to V4.6.0.
Shared storage mode
- You cannot upgrade from a lower version of the shared storage mode to V4.6.0.
Recommended versions of components and tools
| Component | Version |
|---|---|
| ODP | ODP 4.3.6.2. Note: If you use the … |
| OCP | OCP 4.4.2 |
| ODC | ODC 4.4.2 |
| OBCDC | OBCDC 4.6.0 |
| OMS | OMS 4.3.2 BP1. Note: If you do not need incremental data from OceanBase Database V4.6.0, use OMS V4.3.2 BP1; if you do need that incremental data, apply a hot patch to Store. |
| Binlog | Binlog 4.3.6.3 |
| OCCI | OCCI 1.0.6 |
| OBCI | OBCI 2.1.1.2 |
| ECOB | ECOB 1.2.1 |
| OBClient | OBClient 2.2.13 |
| LibOBClient | LibOBClient 2.2.13 |
| JDBC | JDBC 2.4.17 |
| ODBC | ODBC 2.0.9.8 |
| .NET | .NET 1.0.0 |
| Import and export | ob-loader-dumper 4.3.5 |
