This topic only lists the major updates to the core capabilities of real-time analysis. For more information about the full list of features and updates of OceanBase Database, see Release Note.
V4.4.2
Version information
- Release date: February 10, 2026
- Version number: V4.4.2
New features and enhancements
Enhanced materialized view capabilities
The V4.4.2 version continues to enhance the capabilities of materialized views. It supports the RENAME operation and column addition for materialized view logs at the DDL level, and automatically creates, replaces, and cleans up redundant data in MLOG tables in the background. The incremental refresh capability is significantly expanded to support complex query modes such as outer joins, UNION ALL, non-aggregated single-table queries, and aggregated queries with LEFT JOIN. It also strengthens the processing of MIN/MAX aggregate functions, supports repeated SELECT items and GROUP BY columns, and adapts to scenarios without a primary key in base tables. The AS OF PROCTIME() syntax is introduced to enable dynamic exemption from refreshes for dimension tables (supporting views that reference this syntax). The cascading refresh mechanism for nested materialized views is also improved. Additionally, support for UDT/UDF, minimal mode, and the md5_concat_ws function is enhanced. The display of view content and error messages during creation are optimized, significantly improving the flexibility, performance, and ease of use of materialized views in complex queries, real-time analysis, and O&M management.
V4.4.1
Version information
- Release date: September 26, 2025
- Version number: V4.4.1
New features and enhancements
Support for HMS Catalog
V4.4.1 supports Iceberg, an open table format for data lakes, and allows querying external tables in Iceberg and Hive formats through HMS Catalog. HMS Catalog is introduced to build a unified metadata abstraction layer compatible with the HMS protocol. It automatically synchronizes metadata across different table formats, such as Hudi Timeline and Iceberg Snapshots, and seamlessly integrates with mainstream computing engines. This addresses metadata ambiguity across systems, making it a key infrastructure for OceanBase to transition from "compute-storage separation" to "metadata-driven governance."
Support for JDBC plugins in external tables
OceanBase allows direct access to various external data sources through external tables, such as CSV files stored on OSS and ODPS tables. V4.4.1 introduces JDBC plugin support, enabling integration with JDBC-compatible data sources. Currently, MySQL data sources are supported.
V4.4.0 Beta
Version information
- Release date: July 8, 2025
- Version number: V4.4.0
New features and enhancements
Support for ODPS Storage API
The V4.4.0 version introduces support for the ODPS Storage API, allowing direct access to ODPS's underlying storage from external tables. This eliminates the need for establishing independent sessions for each partition scan in the Tunnel API, significantly reducing initialization latency and enhancing performance for small and frequent queries. Compatibility with the Tunnel API is retained to meet diverse access requirements.
SELECT INTO OUTFILE supports HDFS paths
The V4.4.0 version continues to enhance external table capabilities, supporting direct querying and importing of files from HDFS paths. The new version further integrates HDFS capabilities, allowing direct data export to HDFS and supporting Kerberos authentication for secure access.
V4.3.5 BP5
Version information
- Release date: November 17, 2025
- Version number: V4.3.5 BP5
New features and enhancements
Materialized View improvements
V4.3.5 BP5 supports incremental refresh of aggregated views with a LEFT JOIN, nested materialized views created based on keyless incremental materialized views, MIN/MAX aggregate functions with non-basic columns as parameters for incremental materialized views, incremental materialized views using dimension tables (AS OF PROCTIME()) in multi-table joins, full refresh of materialized views referencing UDFs, minimal refresh mode for incremental materialized views (DMLs during incremental refresh only read and write necessary columns in columnar tables), and refresh of materialized views using session variables.
Skip Index for Delete-Insert table incremental data queries is supported.
V4.3.5 BP5 supports incremental Skip Index for Delete-Insert tables. In incremental query processing, the Skip Index pre-trims data for improved query performance. You can use the default_skip_index_level parameter at the tenant level to control the scope of Skip Index capabilities.
Exchange partition support added
Before V4.3.5 BP5, the exchange of a partition in a range/range columns partitioned table with a non-partitioned table and the exchange of a partition in a subpartitioned range/range columns partitioned table with a non-partitioned table are supported. In addition, exchanging a primary range/range columns partition from a subpartitioned table with a primary partitioned table is supported. In V4.3.5 BP5, the exchange of a partition in a list/list columns partitioned table with a non-partitioned table is supported.
V4.3.5 BP4
Version information
- Release date: September 10, 2025
- Version number: V4.3.5 BP4
New features and enhancements
Enhanced materialized view capabilities
V4.3.5 BP4 continues to enhance materialized view capabilities. From the functional perspective, it supports incremental refresh of materialized views without refreshing dimension tables. When creating a materialized view, you can use the new AS OF PROCTIME() syntax to specify tables that do not need to be refreshed. During the incremental refresh of a materialized view, the incremental data of these tables will not be refreshed. Single-table aggregation incremental refresh materialized views support the MIN() and MAX() aggregate functions. From the usability perspective, it supports automated management of MLOG tables. When creating an incremental refresh materialized view, the system automatically creates or replaces the required MLOG tables for the base table and periodically cleans up redundant MLOG tables in the background. It also optimizes the view content and error messages when creating materialized views.
Disabled full bypass import by default for AP parameter templates.
V4.3.5 BP3
Version information
- Release date: July 21, 2025
- Version number: V4.3.5 BP3
New features and enhancements
Optimizations for external data sources and integrations
ODPS external table support for Storage API
In previous versions, the execution of an external table based on the Tunnel API required a separate session for each partition, which introduced a second delay. This delay could be significant for queries that are inherently short. To address this, the new version supports the ODPS Storage API, which allows direct access to the underlying storage of ODPS. This provides more optimization strategies and better read performance. The new version has been adapted to support the ODPS Storage API, further enhancing the performance of external table access.
Support for Azure Blob storage
The new version supports accessing Azure Blob storage using the Azure Blob protocol.
Enhanced materialized views
Enhancements to materialized views
The new version continues to improve the capabilities of materialized views. Key enhancements include support for renaming materialized views, incremental refreshes for outer joins, UNION ALL, and non-aggregated single-table operations, support for adding columns to materialized view logs, and cascading refreshes for nested materialized views. Additionally, multiple issues, such as materialized view drop hangs, have been fixed to improve the stability of materialized views.
Optimizations for storage and partition management
Support for partition exchange between primary and secondary partitioned tables
The new version supports partition exchange between primary partitions of secondary partitioned tables and primary partitions of primary partitioned tables.
Optimizer and query performance improvements
Enhanced statistics collection
In versions prior to V4.3.5 BP3, statistics collection required gathering aggregated data such as MIN, MAX, NULL COUNT, and NDV for each column at the partition and table levels, and estimating data distribution through sampling, which often resulted in significant errors. The new version leverages the Skip Index capability at the storage layer to collect more accurate column-level aggregated data at the block level for all columns in columnar storage. This helps the optimizer generate and optimize plans with greater accuracy. Additionally, the new version introduces a progressive statistics collection approach for partitions to address the issue of timeouts caused by long statistics collection times for large partitioned tables, thereby improving system stability and data timeliness.
Accelerated data import
Optimized performance for direct load
The new version optimizes the execution logic for direct load, including sorting algorithms, vectorized paths for direct load, CSV parsing, column_conv expressions, and CS encoding. In the scenario of columnar storage for non-partitioned heap tables, performance can improve by up to 10%.
V4.3.5 BP2
Version information
- Release date: May 15, 2025
- Version number: V4.3.5 BP2
Product forms
- Shared-Storage AP
Starting from V4.3.5 BP2, OceanBase Database can be deployed in the Shared-Storage AP form to handle AP workloads. This form supports key AP features in the Shared-Nothing deployment mode, such as columnstore tables, incremental direct load, materialized views, full-text indexes, and multi-value indexes. It also optimizes the I/O read method in the Shared-Storage deployment mode. The Shared-Storage AP form is recommended for AP workloads with data cold-warm characteristics that aim to reduce storage costs.
Feature enhancements
Enhanced data recovery capabilities
Support for restoring columnstore tables at the table level
Previously, the table-level restore feature only supported restoring to a rowstore table. In this version, the table-level restore feature is enhanced in the Shared-Nothing deployment mode to support restoring to columnstore tables and hybrid rowstore-columnstore tables.
Performance optimizations
Optimized performance of direct load for heap tables
The uniqueness check process during the creation of a unique index in the direct load of a heap table is optimized. The performance of a full direct load of a non-partitioned table is not worse than that of a primary key table in an index-organized table.
Optimized performance of hard parsing
In some AP scenarios or for some small accounts, disabling the plan cache and using hard parsing can reduce the use of suboptimal plans. However, this requires higher performance from hard parsing. In this version, the optimizer's resolve and rewrite stages are optimized by improving the hot functions in the resolver stage, replacing global cost verification with local cost verification where applicable, reducing memory usage for expressions, and eliminating unnecessary expression type inference operations. This enhances the performance of the optimizer's resolve and rewrite stages.
Enhanced materialized views
Support for building materialized views based on external tables in full refresh scenarios
Previously, OceanBase Database supported building materialized views based on user tables and materialized views. In this version, materialized views based on external tables are supported in full refresh scenarios, expanding the applicable scenarios for materialized views.
Enhanced diagnostic capabilities for materialized views
New system views
CDB/DBA_MVIEW_RUNNING_JOBSandDBA_MVIEW_DEPSare introduced to display ongoing materialized view tasks, such as refresh tasks and MLOG cleanup tasks, and the dependency objects of materialized views, respectively. TheCDB/DBA/ALL/USER_MVIEWSsystem views now record the refresh points and refresh delays of materialized views. TheCDB/DBA/ALL/USER_MVIEW_LOGSviews now record the parallelism and duration of MLOG cleanup tasks.Resource isolation for materialized view tasks
Refreshing and purging materialized views consume system resources, which may affect the performance of foreground tasks. In this version, resource isolation is provided for incremental refreshes and MLOG purges of materialized views based on the Resource Manager. Users can set resource limits for materialized view tasks.
Partition management
Dynamic partition management
Businesses often use date or time as the range partition key to divide data into different partitions based on different time periods. Previously, DBAs had to pre-create partitions for future time periods to meet data writing requirements. If a DBA forgets to pre-create partitions, data may fail to write due to the absence of partitions, affecting business operations. In this version, dynamic partition management is introduced. By specifying the dynamic partition management attribute for a table, the kernel automatically pre-creates partitions for future time periods, reducing the burden on DBAs. Dynamic partition management also supports automatically deleting obsolete partitions that are no longer needed, saving storage space.
Indexes and query optimization
NGRAM2 tokenizer for full-text indexes
The built-in
NGRAM2tokenizer is introduced for full-text indexes, supportingNGRAMtokenization within themin_ngram_sizetomax_ngram_sizerange. TheNGRAM2tokenizer is suitable for scenarios where performance and storage space are relatively insensitive, and different-lengthtokenretrieval is required. For fixed-lengthtoken, theNGRAMtokenizer can be used.
Data types and storage optimization
Map data type
The map data type is used to store unordered key-value pairs, such as {a:1, b:2, c:3}. Common use cases include saving configuration options, user attributes, and product information. In this version, OceanBase Database implements the map data type based on the array framework and supports map constructors and functions and operators such as map_keys, map_values, =, and !=.
MySQL mode JSON semi-structured storage
Currently, semi-structured data is stored as binary strings, making it difficult to encode and compress using encoding methods. Each data record redundantly stores some data metadata for self-explanation, resulting in a low compression rate. JSON is a typical semi-structured data type with strong structural characteristics. In theory, semi-structured data can be logically split into multiple base-type columns, with the non-decomposable parts stored in a binary column. Structured columns can leverage encoding to improve the compression rate, while non-decomposable parts, by extracting data metadata, reduce space usage compared to before. This is referred to as "structured storage of semi-structured columns." In this version, OceanBase Database supports semi-structured storage of JSON data and implements corresponding encoding methods to reduce storage space usage and optimize query and filter performance on JSON sub-columns. However, since JSON data is stored as multiple sub-columns, frequent reads of the original JSON data may introduce merge overhead, leading to performance degradation. Whether to enable this feature depends on the business requirements.
Query execution
Adaptive plan cache
Plan cache is a crucial capability in TP scenarios and is enabled by default. In some AP scenarios, the execution time of a plan can be much longer than the generation time. In such cases, generating a new plan may result in a better execution plan than reusing an existing one, leading to improved execution performance. In mixed HTAP workloads, it is necessary to provide a way to enable hard parsing for AP-skewed SQL statements while allowing TP-skewed SQL statements to reuse plans to avoid hard parsing overhead. In this version, OceanBase Database introduces adaptive plan cache, which can decide whether to use the plan cache for a SQL statement based on its execution time and distribution of execution times.
External data source integration
MySQL mode ODPS (MaxCompute) catalog
To facilitate querying data stored in various external data sources, this version introduces the External Catalog framework and supports the ODPS catalog feature. OceanBase Database's internal objects are managed under the Internal Catalog. Users can create External Catalogs to connect to external data sources and retrieve metadata. Direct querying of tables under an External Catalog is supported, such as
select col1 from catalog1.database1.table1;. Cross-catalog table joins are also supported when permissions are met. This eliminates the need for data imports or creating individual external tables, improving the usability of external data access.
Data import
Load Data import with URL CSV external table error tolerance mode
Previously, when data type incompatibility or precision mismatch issues occurred during data import using the Load Data method, errors were directly reported. In this version, the
LOG ERRORSinstruction is introduced for error-tolerant imports. In MySQL mode, failed rows are recorded, and users can view error data usingshow warningsfor error diagnosis.
V4.3.5 BP1
Version information
- Release date: March 18, 2025
- Version number: V4.3.5 BP1
New features and enhancements
Data import and export
Support for incremental direct load of heap tables with unique indexes
Direct load can be performed in full or incremental mode. When data already exists in the table, incremental direct load performs better than full direct load. However, incremental direct load previously did not support importing data into heap tables with unique indexes. This version now supports incremental direct load for heap tables with a local unique index.
Support for HASH partitioned and concurrent direct load
Starting from V4.3.5 BP1, direct load is supported for HASH-partitioned tables. Additionally, incremental direct load now supports concurrent loading across multiple partitions. However, if multiple import tasks have overlapping partitions, concurrent partition loading is not supported.
Performance optimization for direct load
Through vectorization optimization, streamlined temporary file processing, and improved computational performance in each stage, the import performance of the clickbench table has improved by 17% compared to V4.3.5.
External data source integration
Support for reading HDFS files from external tables
HDFS is the most common storage medium in data lake architectures and serves as a platform for multi-engine data sharing. Therefore, this version supports direct reading of HDFS files from external tables and also supports Kerberos authentication for HDFS clusters.
Support for reading and writing data from and to MaxCompute (ODPS)
This version supports reading data from MaxCompute using external tables or URL-based external tables, and also supports writing data from OceanBase to MaxCompute.
URL-based external tables
When accessing external data sources in OceanBase, you typically need to create an external table and then use SQL queries. This version introduces URL-based external tables, allowing direct querying of external CSV, Parquet, ORC, and MaxCompute data without creating an external table. Additionally, it extends the
LOAD DATAsyntax to directly reference URL-based external tables for data import, reducing the operational cost of data import.
Index optimization
Enhancements to full-text indexing
Starting from OceanBase V4.3.1, full-text indexing is supported in MySQL mode. Subsequent versions have added various features and enhancements. V4.3.5 BP1 introduces the built-in Chinese
IKtokenizer, supporting bothsmartandmax_wordmodes. It also supports adding new tokenizers via plugins. A newPARSER_PROPERTIESattribute is added for full-text indexes, allowing table-level configuration. In addition to the existing natural language mode, a boolean mode is now supported. You can use theIN BOOLEAN MODEkeyword inMATCH AGAINSTto enable boolean mode, supporting operators such as+,-,(), and no operator, as well as nested boolean operations. This version also supports full-text indexes in index merge operations, improving query performance through index union merge. It optimizes update and delete scenarios for full-text indexes on the main table, enhancing update and delete performance.
Materialized view enhancements
Enhancements to materialized views
The parallelism of materialized view refreshes is closely related to the efficiency of incremental and full refreshes. This version supports complete parallelism control mechanisms, allowing you to set the refresh parallelism using the
mview_refresh_dopsystem variable, theDBMS_MVIEW.refreshstored procedure, or thePARALLELclause when creating a materialized view. This version also supports theLOBdata type for incremental materialized views and provides the ability to directly modify materialized views or materialized view attributes usingALTER MATERIALIZED VIEWandALTER MATERIALIZED VIEW LOGcommands. You can modify the parallelism, background refresh task intervals, MLOG table parallelism, MLOG table background cleanup task intervals, and MLOG LOB inline storage length threshold. Additionally, this version supports adding columns to MLOG base tables, including both appending and inserting columns in the middle. It also improves MLOG cleanup performance and optimizes error messages for incremental refreshes.
Table structure and storage optimization
Heap table organization mode
OceanBase uses a clustered index table model, which performs well in OLTP environments. It stores primary keys and main table data in the same table, optimizing primary key query speed and uniqueness validation performance. However, this model has limitations in OLAP scenarios requiring efficient data import and complex data analysis. Import processes require sorting of full data, and queries must perform primary key row fusion, both of which impact performance. This version introduces the heap table organization mode, where primary keys are used for uniqueness constraints, and queries rely on the main table. When user data is sorted by time, skip indexes can be more effectively utilized to improve query efficiency. Decoupling primary keys from data also eliminates the need for sorting main table data during imports, enhancing import performance.
Data type and storage optimization
New String data type in MySQL mode
Some AP applications require columns with high tolerance for varying lengths, often using variable-length string types for data storage and requiring primary keys or index keys. In OceanBase,
CHARandVARCHARtypes can store strings and serve as primary keys or index keys, but require specified lengths.MEDIUMTEXTandTEXTtypes can store variable-length strings but cannot serve as primary keys or index keys. To better meet AP application needs, this version introduces theStringdata type, which does not require specified lengths and has a default maximum length of 16 MB. When the column length is less than 16 KB and does not exceedlob_inrow_threshold, it can serve as a primary key or index key.Enhancements to ARRAY type functions
OceanBase supports the
ARRAYdata type and related functions and operators starting from version 4.3.3. This version further enhances these functions, adding support forarray_prepend,array_concat,array_compact,array_filter,array_sort,array_sortby,array_length,array_range,array_sum,array_first,array_difference,array_min,array_max,array_avg,array_position,array_slice, andreverse.
Parallel query optimization
Decoupling of PX compute nodes and data nodes
In the current system architecture, storage and compute resources are highly coupled. The existing PX scheduling mechanism for Non-Leaf DFOs only selects machines where data is stored for compute task allocation, which can limit resource utilization in certain scenarios. To enable large SQL queries to utilize more machines, this version decouples PX compute nodes from data nodes. You can configure or use hints (PX_NODE_POLICY) to determine the candidate resource pools for Non-Leaf DFOs. It also provides hints (PX_NODE_ADDRS, PX_NODE_COUNT) to force the allocation of specific machines or a specific number of machines for Non-Leaf DFOs.
Query optimization and pushdown
Support for query pushdown in rowstore data column addition scenarios
During a rowstore SCAN, if there is no intersection between major sstable and other rowid data, the SCAN is pushed down to the storage layer. After a major compaction, if new columns are added, the SCAN can theoretically be pushed down. However, if the new column parsing fails, the pushdown fails, and the system reverts to the less efficient row-by-row read. This version optimizes this scenario, supporting query pushdown to the storage layer to improve query performance.
Columnar storage optimization
Optimizations for columnar replicas
The columnar replica feature introduced in V4.3.3 had some limitations, such as supporting only a single C replica, not allowing direct full import of columnar data to C replicas, and lacking user-friendliness. This version addresses these limitations by allowing multiple C replicas in a cluster, direct full import of columnar data to C replicas, and the addition of system views
CDB/DBA_OB_CS_REPLICA_STATSto monitor columnar replica progress at the partition and log stream levels.
Behavior changes
NLJ plans are disabled for AP templates
After the optimizer generates a nested loop join plan, the plan is sensitive to the number of rows in the driving table. If the estimated number of rows in the driving table is significantly smaller than the actual number, the actual execution performance of the plan may severely degrade. In OLAP scenarios, the data volume is generally large, and the benefits of using nested loop join plans are limited. Therefore, in AP scenarios, the default is to generate hash join plans, and nested loop join plans are only generated in non-equi join scenarios.
V4.3.5
Version information
- Release date: December 31, 2024
- Version number: V4.3.5
New features and enhancements
Materialized view enhancements
Support for nested materialized views
In versions earlier than V4.3.5, materialized views could only be created on regular user tables. In data warehouse scenarios, materialized views are used for data processing. To better support lightweight real-time data warehouses, V4.3.5 introduces nested materialized views, which allow new materialized views to be created based on existing ones. The refresh methods for nested materialized views are the same as those for non-nested ones, including full and incremental refreshes. Therefore, V4.3.5 supports creating materialized view logs based on materialized views. The freshness of data in nested materialized views depends on the base table's data freshness. To ensure the freshness of upper-level materialized views, users must refresh the underlying materialized views first.
Data import and export
Support for specifying partitions for direct load
Before V4.3.5, OBServer only supported full-table direct load to accelerate data import. If users wanted to import data from only specific partitions, they had to use partition exchange. This involved importing data from the specified partitions to a non-partitioned table via full direct load and then exchanging the non-partitioned table with the target partition. This process was cumbersome. To better support partition-level data imports, V4.3.5 allows users to specify partitions for direct load using the
LOAD DATAandINSERT INTO SELECTsyntax. However, it does not support specifying the last-level partition type as Hash/Key.Enhancements to the
select into outfilefeatureV4.3.5 enhances the
select intoexport feature by supporting the export of parquet and orc types, allowing users to specify compression algorithms when exporting csv types, and introducing new syntax forselect into.- Use the format syntax to set export options.
- Specify a compression algorithm when exporting csv files.
- Export parquet files.
- Export orc files.
Index enhancements
Enhancements to full-text indexing
V4.3.5 continues to enhance the full-text indexing feature based on V4.3.4, including:
- Support for creating full-text indexes using the
CREATE FULLTEXT INDEXorALTER TABLE ADD FULLTEXT INDEXstatements. - Support for cost-based full-text index plan selection, allowing the selection of less expensive indexes for scanning when multiple filters with
MATCH AGAINSTare present in an SQL query. - Support for Functional Lookup:
- Support for multiple
MATCH AGAINSTexpressions in query statements. - Support for scanning other indexes while using full-text indexes.
- Support for outputting
MATCH AGAINSTexpressions without filter semantics as projected columns. - Support for filtering
MATCH AGAINSTexpressions with filter semantics using <=/< operators. - Support for linking
MATCH AGAINSTexpressions with filter semantics using AND/OR logic.
- Support for multiple
- Support for creating full-text indexes using the
External data source integration
Enhancements to external tables
In V4.3.5, external tables support reading ORC format files.
Columnar storage optimization
- Read-only columnar replicas
The read-only columnar replica feature introduced in V4.3.3 was an experimental feature. After iterations in V4.3.4 and V4.3.5, this feature has now met the GA requirements in V4.3.5.
Support for asynchronous online DDL during row-to-column conversion
V4.3.0 introduced three storage formats for tables: row-based, pure columnar, and hybrid row-column. Converting a row-based table to a hybrid or columnar table was an offline DDL operation. To minimize the impact of offline DDL on customers, V4.3.5 supports asynchronous online DDL for row-to-column conversion. By specifying the
delayedkeyword during DDL execution, users can modify the table schema in real-time without blocking data writes. The columnar baseline data is then reorganized asynchronously during baseline data compaction. V4.3.5 supports both row-to-column and row-to-hybrid conversions as online DDL operations.
Performance optimization
Vectorization in direct load write path
Before V4.3.5, direct load did not use vectorization, and write functions processed data row by row, resulting in significant function call overhead. Additionally, writing to sstables was a background process, and some operations lacked specific optimizations. V4.3.5 introduces vectorization in the direct load path and optimizes columnar encoding to enhance direct load performance. Testing has shown that direct load performance for tables without a primary key can be improved by approximately 200%.
Data type and storage optimization
Support for complex DML statements with JSON multi-value indexes
V4.3.5 enhances support for complex DML statements with JSON multi-value indexes. It allows multi-value predicates in
updateanddeletestatements and supports complex DML operations that involve multi-value index lookups.Support for more array expressions
OceanBase Database introduced the ARRAY data type in V4.3.3. To better support business use cases involving arrays, V4.3.5 adds more array expressions to meet diverse business needs. New expressions include
array_append,array_distinct,arrayMap,array_remove,cardinality,element_at,array_contains_all,array_overlaps,array_to_string,array_agg,unnest, andrb_build.
V4.3.4
Version information
- Release date: October 31, 2024
- Version number: V4.3.4
New features and enhancements
Storage architecture optimization
Shared storage (experimental feature)
To provide more cost-effective database services in public cloud environments, OceanBase Database V4.3.4 introduces the Shared-Storage Cluster architecture based on a general-purpose object storage. This architecture allows multiple tenants to share a baseline dataset and log data stored in a standard object storage. Hot data is stored in cloud disks or local SSDs, while cold data is stored in standard object storage, reducing storage costs. Each tenant's replicas cache only hot data and logs locally, enabling quick scaling of compute nodes. This architecture is suitable for TP workloads with distinct hot and cold data, such as historical and backup databases, as well as key-value and time-series workloads.
This feature is defined as experimental in V4.3.4 and will be further enhanced in future versions to become a production-ready feature.
Indexing capabilities
Enhanced full-text indexing
OceanBase Database has supported full-text indexing since version 4.3.1. By preprocessing text content and building keyword indexes, it significantly improves the efficiency of full-text search. However, in complex DML scenarios involving full-text indexes, there were some usage limitations that caused inconvenience for businesses. To better support business development, version 4.3.4 enhances the complex DML functionality, supporting operations such as
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates/deletes, and updatable views for tables with full-text indexes. Additionally, the new version optimizes full-text search performance using vector execution and extended TAAT processing strategies. It also removes the token limit for theMATCH AGAINSTpredicate, allowing full-text indexes to be created on partitioned tables without a primary key. Furthermore, it supports thetokenizefunction to view tokenization results, aiding in debugging tokenization systems.Enhanced multi-value indexing
Currently, multi-value indexing only supports pre-created indexes, with some usage limitations for DML and DDL operations. To better support business use, V4.3.4 relaxes some of these limitations and enhances the functionality. This includes supporting complex DML operations such as
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates/deletes, and updatable views.
Data import optimization
Support for non-unique local indexes in incremental direct load
OceanBase Database supports both full and incremental direct loads. In scenarios where data already exists in the table, incremental direct load performs better than full direct load. Previously, incremental direct load did not support loading data into tables with indexes. V4.3.4 now supports incremental direct load for tables with non-unique local indexes.
Optimized direct load performance
During direct load, if a table with a primary key requires sorting, data is first written to a temporary file, then loaded into memory for sorting and merging, and finally written to an sstable. This two-stage process (data writing and sorting/merging) can cause performance bottlenecks if data is not fully written or if I/O reaches its limit. V4.3.4 optimizes this by directly writing data to memory instead of a temporary file, allowing sorting and merging to proceed in a pipeline fashion. This avoids the impact of earlier stages on later stages. In primary key tables, this optimization improves performance by 35% for 1TB of data. Larger memory and CPU resources result in greater performance gains, making this optimization ideal for large-scale tenants. For smaller tenants, performance remains stable without significant degradation.
Data types and storage optimization
Enhanced bitmap functionality
A bitmap, also known as a bitset, is a data structure for efficiently storing and processing whether an element exists in a set. OceanBase Database V4.3.2 and V4.3.3 supported some features of RoringBitmap, and V4.3.4 further enhances this by adding the
rb_iterateandrb_selectexpressions.rb_iterateexpands a roaringbitmap type into multiple rows based on the number of elements, whilerb_selectfilters elements from a roaringbitmap type based on conditions and returns a new roaringbitmap type.
V4.3.3
Version information
- Release date: September 30, 2024
- Version number: V4.3.3
New features and enhancements
Columnar storage optimization
Read-only columnar replicas (experimental feature)
OceanBase Database V4.3.0 supports columnar storage. You can create a columnar table, a rowstore table, or a hybrid table. Regardless of the table type, the replica mode is consistent across zones in a tenant. For example, if a tenant has three zones and a hybrid table is created, the table will have one rowstore replica and one columnar replica in each zone. To meet the physical isolation requirements of TP and AP resources in hybrid workloads, V4.3.3 introduces a new deployment mode that allows you to expand a dedicated zone for read-only columnar replicas (Column Store Replicas, abbreviated as C replicas). In this zone, all user tables are stored in columnar format. AP workloads can use an independent ODP and set the
ob_route_policysession-level system variable toCOLUMN_STORE_ONLYto access the columnar replicas for weak-consistency reads without affecting TP workloads. Additionally, similar to the 3+1 zone deployment, this mode reduces storage overhead compared to hybrid tables. It requires ODP V4.3.2 or later.V4.3.3 defines this feature as experimental and will continue to refine it in future versions.
Enhanced materialized views
Enhanced materialized view capabilities
To reduce the manual rewrite cost, OceanBase Database V4.3.1 supports materialized view rewriting. When the
QUERY_REWRITE_ENABLEDsystem variable is set toTrue, you can enable automatic rewriting by specifyingENABLE QUERY REWRITEwhen creating a materialized view. This allows the system to rewrite queries against the original table to queries against the materialized view. V4.3.1 supports rewriting for non-aggregate materialized views in cases where theFROMclause fully matches and theWHEREclause partially matches. V4.3.3 further supports rewriting for non-aggregate materialized views withFROMclause joins, queries involving tables not present in the materialized view, and aggregate materialized views, including rewriting for aggregation rollups.Additionally, the new version extends the types of SQL supported for incremental refreshes and real-time materialized views. Earlier versions supported incremental refreshes and real-time queries for single-table aggregations and multi-table joins. V4.3.3 now supports incremental refreshes for join aggregations.
Furthermore, OceanBase Database previously only supported materialized views in rowstore format. V4.3.3 introduces support for columnar materialized views, which can improve query performance in complex analytical scenarios involving materialized view references.
Optimized data import and export
Optimized external table import performance
V4.3.3 optimizes the performance of reading data from external tables during direct load, achieving approximately a 15% improvement over the previous version.
Enhanced INSERT OVERWRITE capabilities
OceanBase Database V4.3.2 supports table-level overwrite writes (INSERT OVERWRITE), allowing atomic clearing of old data and writing of new data to support AP workloads such as periodic data refreshes, data transformations, and data cleansing. However, V4.3.2 only supported full-table overwrite writes and did not provide partition-level or partial-column overwrite capabilities. V4.3.3 enhances this feature by allowing you to specify the target table as a partitioned or subpartitioned table and to specify partial column information in the INSERT OVERWRITE statement, making data overwrite writes more flexible and applicable to a wider range of business scenarios.
Support for compressed files in Load Data and external tables
Load Data is a common data import method that supports various use cases, including server-side, direct, and client-side imports. It allows you to import text files in a specified format into the database. Previously, only plain text files were supported, and compressed files like GZIP had to be decompressed before import, which was cumbersome. The new version supports importing compressed files, including GZIP, DEFLATE, and ZSTD formats. These files can be loaded, decompressed, and written to the database simultaneously. Additionally, CSV external tables now support compressed files, allowing you to query data from compressed files directly.
Data type and storage optimizations
ARRAY data type
The ARRAY data type is commonly used in AP workloads to store multiple elements of the same type. It is suitable for managing and querying multi-valued attributes that cannot be effectively represented by relational data. OceanBase Database V4.3.3 supports the ARRAY data type in MySQL mode. You can define a column as an array of numeric or character values, and even as a nested array. It supports expressions for querying or writing array objects, including the
array_containsexpression and the ANY operator for checking if an element exists in an array. It also supports operators like +, -, =, and != for array element calculations and comparisons.Optimized RoaringBitmap performance
V4.3.2 introduced the RoaringBitmap data type and related expressions to support multi-dimensional analysis in scenarios such as user profiling, personalized recommendations, and precision marketing. However, performance was suboptimal in some cases. V4.3.3 focuses on optimizing the performance of RoaringBitmap calculations by improving memory allocation and expression execution logic, reducing unnecessary performance overhead, and significantly enhancing execution performance for cardinality, AND/OR/XOR/ANDNOT operations, and aggregation scenarios.
Resource management and isolation
Query-level resource group configuration
OceanBase Database currently supports configuring resource groups at the user, background task, and column parameter levels using the DBMS_RESOURCE_MANAGER system package for CPU and IOPS isolation. V4.3.3 introduces query-level resource group binding. You can specify the
/*+ resource_group('group_name') */hint in an SQL statement to force the use of a specific resource group. If the specified resource group does not exist, the default resource group is used. Switching resource groups requires a session reconnect to take effect.
V4.3.2
Version information
- Release date: July 16, 2024
- Version number: V4.3.2 Beta
New features and enhancements
Optimizations for data import and export
Enhancements to incremental direct load
Starting from V4.3.1, incremental direct load is supported as an experimental feature, with some limitations, such as not supporting
outrow lobdata import and only supportingload_modeset toinc_replace(i.e.,replacesemantics). V4.3.2 adds support for incremental direct load ofoutrow lobdata and extends theload_modeparameter to includeincas a new value, with a default ofinsertsemantics. When theignorekeyword is specified in an SQL statement, it will useignoresemantics. This feature is now officially released.Performance optimization for full direct load
In the full direct load scenario, the new version optimizes performance by reducing column type conversion operations during data loading, lowering CPU usage for statistics collection, removing the generation of primary keys on the
SELECTside forINSERT INTO SELECTstatements, and defaulting to disablingsum skip index(which pre-aggregatesSUMvalues for specified columns within a range in the storage layer). Combined with disabling microblock verification (by setting themicro_block_merge_verify_levelparameter to 0), the direct load performance is improved by approximately 20%.Partitioned export using
SELECT INTO OUTFILEOceanBase Database currently supports exporting data to multiple files using
SELECT INTO OUTFILE, but not partitioned export. V4.3.2 introduces the ability to export data by partition, resulting in a clearer directory structure. This feature also allows constructing a partitioned external table from the file directory, improving query efficiency through partition path pruning.
Integration with external data sources
Enhancements to external table functionality
OceanBase Database has long supported CSV external tables, but as AP workloads have expanded, the need to read Parquet-formatted external data sources has become common. V4.3.2 introduces support for Parquet external tables, allowing users to import file data into internal tables or perform cross-source join queries. To ensure the timeliness of files scanned by external tables, the new version includes an automatic refresh feature for file directories. When creating an external table, users can specify the refresh method (manual, real-time, or periodic) using the
AUTO_REFRESHoption. TheDBMS_EXTERNAL_TABLE.REFRESH_ALL_TABLE(withinterval int) system package manages scheduled refresh tasks.
Performance optimizations
Performance optimization for AP scenarios
V4.3.2 optimizes strategies for data block prefetching, vectorized batch processing, filter pushdown, comparison calculations, aggregation calculations, DTL shuffle, and monotonicity filters. In scenarios with 100GB of data, the performance of benchmark tests for AP workloads such as TPCH, TPCDS, and ClickBench improves by approximately 10% to 15% compared to V4.3.1.
Data type and storage optimizations
RoaringBitmap
With the rise of big data, enterprises increasingly rely on user data for analysis and insights. RoaringBitmap, known for its space efficiency and computational speed, plays a crucial role in user profiling, personalized recommendations, and precision marketing. OceanBase Database V4.3.2 introduces support for the RoaringBitmap data type in MySQL mode. This feature allows efficient storage and operations on sets of unsigned integers, enhancing performance for large-scale set calculations and deduplication. To meet multidimensional analysis needs, this version supports over 20 expressions for cardinality calculation, set operations, bitmap checks, bitmap construction, bitmap output, and aggregation operations.
Data operation optimizations
Table-level overwrite write
In data warehouses, scenarios such as data refresh, transformation, and cleansing often require data overwrite writes. OceanBase Database V4.3.2 introduces the
INSERT OVERWRITEfeature, which atomically clears old data and writes new data to a table. Leveraging the full direct load capability,INSERT OVERWRITEalso demonstrates high execution performance. V4.3.2 supports theINSERT OVERWRITE tablename SELECT * FROM tablenamesyntax, but partition-level overwrite write is not yet supported. Partition-level features will be introduced in future versions.
Task scheduling and asynchronous execution
Asynchronous task scheduling
OceanBase Database currently supports various data import commands such as
insert overwrite,insert select,create table as, andload datato facilitate real-time data ingestion. However, real-time import requires waiting for the process to complete and cannot be interrupted, which is impractical for large-scale data imports. The new version introduces asynchronous task scheduling based on DBMS_SCHEDULER. Users can create, check the status of, and cancel asynchronous import tasks using commands likesubmit job,show job status, andcancel job.
V4.3.1
Version information
- Release date: May 17, 2024
- Version number: V4.3.1 Beta
New features and enhancements
Materialized view
OceanBase Database V4.3.0 supports materialized views, which precompute and store query results to reduce real-time computation and improve query performance. This feature simplifies complex query logic and meets the requirements of AP (Analytical Processing) scenarios. In V4.3.1, the feature is extended to support real-time materialized views, which provide real-time computation capabilities based on data from both materialized views and MLOGs. This is suitable for analytical workloads that require high real-time performance. Additionally, V4.3.1 introduces primary key constraints for materialized views, allowing users to specify a primary key for a materialized view to optimize single-row lookups, range queries, and join scenarios based on the primary key. The incremental update capability for inner-joined materialized views is also extended to improve the refresh performance of materialized views in certain scenarios.
In V4.3.0, when using materialized views, you must manually replace references to the original tables in your business scripts with queries to the corresponding materialized views, which introduces manual rewrite costs. In V4.3.1, the feature is extended to support automatic materialized view rewriting in certain scenarios. When the system variable QUERY_REWRITE_ENABLED is set to True, you can specify ENABLE QUERY REWRITE when creating a materialized view to enable automatic rewriting. In this case, the system can rewrite queries to the original tables into queries to the materialized views, reducing the need for business modifications.
Optimizations for data import and export
Incremental direct load (Experimental)
OceanBase Database V4.1.0 supports direct load, which significantly improves data import efficiency by simplifying the data loading execution path. It skips SQL, transaction, and memtable modules and directly persists data as SSTables. However, in scenarios where tables need to be imported multiple times, each import requires rewriting existing data, which affects incremental import performance. V4.3.1 optimizes incremental direct load to avoid rewriting existing data. Instead, it only processes new data, ensuring high performance for multiple imports, similar to the first import. You can specify whether to use incremental direct load in
LOAD DATAandINSERT INTO SELECTstatements by using the/*+ direct(need_sort, max_errors_allowed, load_mode)*/hint. If you do not specifyload_modeor setload_modetofull, the system uses the original full direct load method. If you setload_modetoinc_replace, the system uses incremental direct load. This feature is defined as an experimental feature in V4.3.1 and will be extended and evolved into a production-ready feature in future versions.Performance optimization for SELECT INTO OUTFILE
The existing
SELECT INTO OUTFILEfeature supports parallel data reading but only allows sequential writing to external files, which creates a performance bottleneck for data export. V4.3.1 introduces parallel export capabilities. In addition to theSELECT INTO OUTFILEcommand, it provides thesingleandmax_file_sizeoptions to control how data is written to external files. Thesingleoption allows you to export data to a single file or multiple files. If you specify a parallelism greater than 1 and setsingletofalse, the system exports data to multiple files, achieving parallel read and write. Themax_file_sizeoption allows you to control the size of the exported files.
Index enhancements
Full-text indexes (MySQL mode) (Experimental)
In relational databases, indexes are typically used to accelerate queries based on exact value matching. However, standard B-Tree indexes are not suitable for scenarios involving large amounts of text data that require fuzzy searches. In such cases, full-table scans are necessary to perform fuzzy queries on each row of data, which can be inefficient when the text is large and the data volume is high. Additionally, complex query scenarios such as approximate matching and relevance sorting are difficult to support by rewriting SQL statements.
To address these issues, OceanBase Database V4.3.1 introduces full-text indexes. This feature preprocesses text content and builds keyword indexes to significantly improve full-text search efficiency. The feature is defined as an experimental feature in V4.3.1 and will be extended and evolved into a production-ready feature in future versions. This version includes the following features:
Full-text indexes are supported in MySQL mode and are compatible with basic MySQL syntax.
You can create full-text indexes in advance for CHAR, VARCHAR, and TEXT columns when creating tables.
Full-text indexes are applicable to partitioned tables.
You can create multiple full-text indexes for the main table.
The feature supports three built-in tokenizers: SPACE (space), NGRAM, and BENG (basic English).
You can perform full-text searches on multiple columns using a single
MATCH AGAINSTclause.The feature supports the NATURAL LANGUAGE MODE.
Partition management
Partition exchange
Over time, tables may accumulate a large amount of historical data that is not frequently accessed but needs to be retained for compliance or historical analysis. To improve query performance, businesses often need to distinguish between active and inactive data and archive the inactive data. Migrating data to a new table using SQL can address this scenario, but performance is often suboptimal for large data volumes. Therefore, OceanBase Database V4.3.1 introduces the partition exchange feature. This feature allows you to move data from table A to a partition of table B by modifying the definitions of partitions and tables in the data dictionary without physically copying the data. This nearly instantaneous data migration significantly improves performance. This version includes the following features:
- You can exchange data between a partition of a partitioned table and a non-partitioned table.
- You can create a partitioned table with RANGE (RANGE COLUMNS) partitions.
- You can create a partitioned table with RANGE (RANGE COLUMNS) partitions at both the primary and subpartition levels.
- You can specify the
including indexesoption, which allows local indexes to be exchanged along with the data and remain available after the exchange. - You can specify the
without validationoption, which requires you to ensure that the data conforms to the partitioning key range.
Integration with external data sources
Partitioned external tables
OceanBase Database V4.2.0 supports external tables, which are limited to non-partitioned tables. In scenarios where a large number of files exist but a query only needs to scan a subset of them, non-partitioned external tables can only scan all files, leading to poor performance. V4.3.1 introduces the partitioned external table feature, which supports list partitioning similar to regular tables. It provides both automatic and manual partitioning options. When automatic partitioning is specified, the system groups files into partitions based on the defined partitioning key. When manual partitioning is specified, you need to specify the subpaths for each partition. In this case, external table queries can leverage partition pruning based on the partitioning conditions, reducing the number of scanned files and significantly improving query performance.
Data types and storage optimization
MySQL JSON multi-value index (Experimental)
MySQL 8.0 introduced the multi-value index feature, which is applicable to JSON documents and other collection data types. This feature allows users to build indexes on arrays or collections to efficiently retrieve elements. OceanBase Database V4.3.1 is compatible with the JSON multi-value index feature in MySQL mode. This allows users to create efficient secondary indexes on JSON array fields containing multiple elements, enhancing the query capabilities of complex JSON data structures while maintaining the flexibility of the data model and high query performance. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in subsequent versions. The following features are included in this version:
- Supports pre-creation of multi-value indexes and composite multi-value indexes.
- Supports unique and non-unique multi-value indexes.
- Supports creating multi-value indexes with JSON array elements of type INT, UINT, DOUBLE, FLOAT, and CHAR.
- Applicable to partitioned tables.
- Supports using multi-value indexes with the MEMBER_OF(), JSON_CONTAINS(), and JSON_OVERLAPS() functions.
MySQL JSON partial update
Some users store business data in JSON documents. When updating a document, they currently need to read the entire document and perform a full update, which can be inefficient for large documents. V4.3.1 introduces the JSON partial update feature. When users update specific fields in a JSON document using expressions like json_set, json_replace, or json_remove, they can only update the modified fields without needing to perform a full overwrite. This improves update performance. To use this feature, you need to enable it using the log_row_value_options parameter.
Query execution and resource management
SQL temporary result compression
When the amount of data involved in SQL execution is excessively large, memory may become insufficient, and some operators need to materialize temporary intermediate results. If the materialized data exceeds the available disk space, SQL execution will fail. V4.3.1 introduces the SQL temporary result compression feature. This allows users to specify whether to compress temporary results and the compression algorithm using tenant-level parameters like spill_compression_codec or SQL-level hints (e.g., /*+opt_param('spill_compression_codec', 'lz4') */). Specifying temporary result compression effectively reduces the temporary disk space usage, supporting larger-scale query tasks.
V4.3.0 Beta
Version information
- Release date: March 22, 2024
- Version number: V4.3.0 Beta
New features
Columnar storage engine
In scenarios involving complex analysis of large-scale data or ad-hoc queries on massive data, columnar storage is a key capability of an AP database. Columnar storage is a data organization method that differs from row-based storage by arranging table data physically by columns. When data is stored in a columnar format, analysis scenarios can scan only the columns required for query computation, avoiding full-row scans and reducing resource usage such as I/O and memory, thereby improving computational speed. Additionally, columnar storage inherently offers better conditions for data compression, making it easier to achieve high compression ratios, which reduces storage space and network transmission bandwidth.
However, common columnar storage engines often assume minimal random updates to maintain static data organization. When faced with frequent random updates, performance issues are inevitable. The OceanBase LSM-Tree architecture addresses this by separately handling baseline and incremental data. Therefore, V4.3.0 extends support for columnar storage engines within the existing architecture, achieving integrated storage of columnar and row-based data on a single codebase, architecture, and OBServer node. This approach balances TP and AP query performance.
To facilitate AP business migration and ensure smooth use of the new version, multiple modules, including the optimizer, executor, DDL, and transaction processing, have been adapted and optimized around the columnar storage engine. This includes a new cost model based on columnar storage, a vectorized engine, extended and enhanced query pushdown functionality, Skip Index, new columnar encoding algorithms, and adaptive compaction.
Users can flexibly configure tables as row-based, columnar, or hybrid (row-column redundant) based on their workload type.
New vectorized engine
OceanBase has already implemented a vectorized engine based on the Uniform data description method in earlier versions, significantly improving performance compared to non-vectorized engines. However, in deep AP scenarios, there are still some performance limitations. V4.3.0 introduces the 2.0 version of the vectorized engine, which uses the Column data format description to avoid the memory usage, serialization, and read/write access overheads associated with ObDatum. Based on this data format description, the new version has also re-implemented a batch of commonly used operators and expressions, including over 10 operators such as HashJoin, AGGR, HashGroupBy, and Exchange(DTL Shuffle), as well as over 20 MySQL expressions for relational, logical, and arithmetic operations. In subsequent V4.3.x versions, the new vectorized engine will continue to be used to supplement and enhance the implementation of other operators and expressions to achieve optimal performance in AP scenarios.
Materialized views
V4.3.0 introduces the Materialized View feature. Materialized views are a key feature that supports AP business by precomputing and storing query results, reducing real-time computation to improve query performance and simplify complex query logic. They are commonly used in scenarios requiring rapid report generation and data analysis.
Since materialized views require storing query result sets to optimize query performance, and there is a data dependency between materialized views and base tables, whenever the data in the base table changes, the data in the materialized view must be updated to stay synchronized. Therefore, the new version introduces a materialized view refresh mechanism, including full refresh and incremental refresh strategies. A full refresh is a straightforward approach where, each time a refresh operation is executed, the system re-executes the corresponding query statement of the materialized view to completely compute and overwrite the original view result data. This approach is suitable for scenarios with relatively small data volumes. In contrast, an incremental refresh only processes the data that has changed since the last refresh. To achieve precise incremental refreshes, OceanBase has implemented a feature similar to Oracle's MLOG (Materialized View Log), which tracks and records incremental updates to the base table through logs, ensuring that materialized views can perform fast incremental refreshes. The incremental refresh approach is particularly suitable for scenarios with large data volumes and frequent changes.
