AP feature version history|V4.6.0|OceanBase Database| docs|Distributed Database

This topic only lists the major updates to the core capabilities of real-time analysis. For more information about the full list of features and updates of OceanBase Database, see Release Note.

V4.4.2

Version information

Release date: February 10, 2026
Version number: V4.4.2

New features and enhancements

Enhanced materialized view capabilities

The V4.4.2 version continues to enhance the capabilities of materialized views. It supports the RENAME operation and column addition for materialized view logs at the DDL level, and automatically creates, replaces, and cleans up redundant data in MLOG tables in the background. The incremental refresh capability is significantly expanded to support complex query modes such as outer joins, UNION ALL, non-aggregated single-table queries, and aggregated queries with LEFT JOIN. It also strengthens the processing of MIN/MAX aggregate functions, supports repeated SELECT items and GROUP BY columns, and adapts to scenarios without a primary key in base tables. The AS OF PROCTIME() syntax is introduced to enable dynamic exemption from refreshes for dimension tables (supporting views that reference this syntax). The cascading refresh mechanism for nested materialized views is also improved. Additionally, support for UDT/UDF, minimal mode, and the md5_concat_ws function is enhanced. The display of view content and error messages during creation are optimized, significantly improving the flexibility, performance, and ease of use of materialized views in complex queries, real-time analysis, and O&M management.

V4.4.1

Version information

Release date: September 26, 2025
Version number: V4.4.1

New features and enhancements

Support for HMS Catalog

V4.4.1 supports Iceberg, an open table format for data lakes, and allows querying external tables in Iceberg and Hive formats through HMS Catalog. HMS Catalog is introduced to build a unified metadata abstraction layer compatible with the HMS protocol. It automatically synchronizes metadata across different table formats, such as Hudi Timeline and Iceberg Snapshots, and seamlessly integrates with mainstream computing engines. This addresses metadata ambiguity across systems, making it a key infrastructure for OceanBase to transition from "compute-storage separation" to "metadata-driven governance."
Support for JDBC plugins in external tables

OceanBase allows direct access to various external data sources through external tables, such as CSV files stored on OSS and ODPS tables. V4.4.1 introduces JDBC plugin support, enabling integration with JDBC-compatible data sources. Currently, MySQL data sources are supported.

V4.4.0 Beta

Version information

Release date: July 8, 2025
Version number: V4.4.0

New features and enhancements

Support for ODPS Storage API

The V4.4.0 version introduces support for the ODPS Storage API, allowing direct access to ODPS's underlying storage from external tables. This eliminates the need for establishing independent sessions for each partition scan in the Tunnel API, significantly reducing initialization latency and enhancing performance for small and frequent queries. Compatibility with the Tunnel API is retained to meet diverse access requirements.
SELECT INTO OUTFILE supports HDFS paths

The V4.4.0 version continues to enhance external table capabilities, supporting direct querying and importing of files from HDFS paths. The new version further integrates HDFS capabilities, allowing direct data export to HDFS and supporting Kerberos authentication for secure access.

V4.3.5 BP5

Version information

Release date: November 17, 2025
Version number: V4.3.5 BP5

New features and enhancements

Materialized View improvements

V4.3.5 BP5 supports incremental refresh of aggregated views with a LEFT JOIN, nested materialized views created based on keyless incremental materialized views, MIN/MAX aggregate functions with non-basic columns as parameters for incremental materialized views, incremental materialized views using dimension tables (AS OF PROCTIME()) in multi-table joins, full refresh of materialized views referencing UDFs, minimal refresh mode for incremental materialized views (DMLs during incremental refresh only read and write necessary columns in columnar tables), and refresh of materialized views using session variables.
Skip Index for Delete-Insert table incremental data queries is supported.

V4.3.5 BP5 supports incremental Skip Index for Delete-Insert tables. In incremental query processing, the Skip Index pre-trims data for improved query performance. You can use the default_skip_index_level parameter at the tenant level to control the scope of Skip Index capabilities.
Exchange partition support added

Before V4.3.5 BP5, the exchange of a partition in a range/range columns partitioned table with a non-partitioned table and the exchange of a partition in a subpartitioned range/range columns partitioned table with a non-partitioned table are supported. In addition, exchanging a primary range/range columns partition from a subpartitioned table with a primary partitioned table is supported. In V4.3.5 BP5, the exchange of a partition in a list/list columns partitioned table with a non-partitioned table is supported.

V4.3.5 BP4

Version information

Release date: September 10, 2025
Version number: V4.3.5 BP4

New features and enhancements

Enhanced materialized view capabilities

V4.3.5 BP4 continues to enhance materialized view capabilities. From the functional perspective, it supports incremental refresh of materialized views without refreshing dimension tables. When creating a materialized view, you can use the new AS OF PROCTIME() syntax to specify tables that do not need to be refreshed. During the incremental refresh of a materialized view, the incremental data of these tables will not be refreshed. Single-table aggregation incremental refresh materialized views support the MIN() and MAX() aggregate functions. From the usability perspective, it supports automated management of MLOG tables. When creating an incremental refresh materialized view, the system automatically creates or replaces the required MLOG tables for the base table and periodically cleans up redundant MLOG tables in the background. It also optimizes the view content and error messages when creating materialized views.
Disabled full bypass import by default for AP parameter templates.

V4.3.5 BP3

Version information

Release date: July 21, 2025
Version number: V4.3.5 BP3

New features and enhancements

Optimizations for external data sources and integrations

ODPS external table support for Storage API

In previous versions, the execution of an external table based on the Tunnel API required a separate session for each partition, which introduced a second delay. This delay could be significant for queries that are inherently short. To address this, the new version supports the ODPS Storage API, which allows direct access to the underlying storage of ODPS. This provides more optimization strategies and better read performance. The new version has been adapted to support the ODPS Storage API, further enhancing the performance of external table access.
Support for Azure Blob storage

The new version supports accessing Azure Blob storage using the Azure Blob protocol.

Enhanced materialized views

Enhancements to materialized views

The new version continues to improve the capabilities of materialized views. Key enhancements include support for renaming materialized views, incremental refreshes for outer joins, UNION ALL, and non-aggregated single-table operations, support for adding columns to materialized view logs, and cascading refreshes for nested materialized views. Additionally, multiple issues, such as materialized view drop hangs, have been fixed to improve the stability of materialized views.

Optimizations for storage and partition management

Support for partition exchange between primary and secondary partitioned tables

The new version supports partition exchange between primary partitions of secondary partitioned tables and primary partitions of primary partitioned tables.

Optimizer and query performance improvements

Enhanced statistics collection

In versions prior to V4.3.5 BP3, statistics collection required gathering aggregated data such as MIN, MAX, NULL COUNT, and NDV for each column at the partition and table levels, and estimating data distribution through sampling, which often resulted in significant errors. The new version leverages the Skip Index capability at the storage layer to collect more accurate column-level aggregated data at the block level for all columns in columnar storage. This helps the optimizer generate and optimize plans with greater accuracy. Additionally, the new version introduces a progressive statistics collection approach for partitions to address the issue of timeouts caused by long statistics collection times for large partitioned tables, thereby improving system stability and data timeliness.

Accelerated data import

Optimized performance for direct load

The new version optimizes the execution logic for direct load, including sorting algorithms, vectorized paths for direct load, CSV parsing, column_conv expressions, and CS encoding. In the scenario of columnar storage for non-partitioned heap tables, performance can improve by up to 10%.

V4.3.5 BP2

Version information

Release date: May 15, 2025
Version number: V4.3.5 BP2

Product forms

Shared-Storage AP

Starting from V4.3.5 BP2, OceanBase Database can be deployed in the Shared-Storage AP form to handle AP workloads. This form supports key AP features in the Shared-Nothing deployment mode, such as columnstore tables, incremental direct load, materialized views, full-text indexes, and multi-value indexes. It also optimizes the I/O read method in the Shared-Storage deployment mode. The Shared-Storage AP form is recommended for AP workloads with data cold-warm characteristics that aim to reduce storage costs.

Feature enhancements

Enhanced data recovery capabilities

Support for restoring columnstore tables at the table level

Previously, the table-level restore feature only supported restoring to a rowstore table. In this version, the table-level restore feature is enhanced in the Shared-Nothing deployment mode to support restoring to columnstore tables and hybrid rowstore-columnstore tables.

Performance optimizations

Optimized performance of direct load for heap tables

The uniqueness check process during the creation of a unique index in the direct load of a heap table is optimized. The performance of a full direct load of a non-partitioned table is not worse than that of a primary key table in an index-organized table.
Optimized performance of hard parsing

In some AP scenarios or for some small accounts, disabling the plan cache and using hard parsing can reduce the use of suboptimal plans. However, this requires higher performance from hard parsing. In this version, the optimizer's resolve and rewrite stages are optimized by improving the hot functions in the resolver stage, replacing global cost verification with local cost verification where applicable, reducing memory usage for expressions, and eliminating unnecessary expression type inference operations. This enhances the performance of the optimizer's resolve and rewrite stages.

Enhanced materialized views

Support for building materialized views based on external tables in full refresh scenarios

Previously, OceanBase Database supported building materialized views based on user tables and materialized views. In this version, materialized views based on external tables are supported in full refresh scenarios, expanding the applicable scenarios for materialized views.
Enhanced diagnostic capabilities for materialized views

New system views CDB/DBA_MVIEW_RUNNING_JOBS and DBA_MVIEW_DEPS are introduced to display ongoing materialized view tasks, such as refresh tasks and MLOG cleanup tasks, and the dependency objects of materialized views, respectively. The CDB/DBA/ALL/USER_MVIEWS system views now record the refresh points and refresh delays of materialized views. The CDB/DBA/ALL/USER_MVIEW_LOGS views now record the parallelism and duration of MLOG cleanup tasks.
Resource isolation for materialized view tasks

Refreshing and purging materialized views consume system resources, which may affect the performance of foreground tasks. In this version, resource isolation is provided for incremental refreshes and MLOG purges of materialized views based on the Resource Manager. Users can set resource limits for materialized view tasks.

Partition management

Dynamic partition management

Businesses often use date or time as the range partition key to divide data into different partitions based on different time periods. Previously, DBAs had to pre-create partitions for future time periods to meet data writing requirements. If a DBA forgets to pre-create partitions, data may fail to write due to the absence of partitions, affecting business operations. In this version, dynamic partition management is introduced. By specifying the dynamic partition management attribute for a table, the kernel automatically pre-creates partitions for future time periods, reducing the burden on DBAs. Dynamic partition management also supports automatically deleting obsolete partitions that are no longer needed, saving storage space.

Indexes and query optimization

NGRAM2 tokenizer for full-text indexes

The built-in NGRAM2 tokenizer is introduced for full-text indexes, supporting NGRAM tokenization within the min_ngram_size to max_ngram_size range. The NGRAM2 tokenizer is suitable for scenarios where performance and storage space are relatively insensitive, and different-length token retrieval is required. For fixed-length token, the NGRAM tokenizer can be used.

Data types and storage optimization

Map data type

The map data type is used to store unordered key-value pairs, such as {a:1, b:2, c:3}. Common use cases include saving configuration options, user attributes, and product information. In this version, OceanBase Database implements the map data type based on the array framework and supports map constructors and functions and operators such as map_keys, map_values, =, and !=.
MySQL mode JSON semi-structured storage

Currently, semi-structured data is stored as binary strings, making it difficult to encode and compress using encoding methods. Each data record redundantly stores some data metadata for self-explanation, resulting in a low compression rate. JSON is a typical semi-structured data type with strong structural characteristics. In theory, semi-structured data can be logically split into multiple base-type columns, with the non-decomposable parts stored in a binary column. Structured columns can leverage encoding to improve the compression rate, while non-decomposable parts, by extracting data metadata, reduce space usage compared to before. This is referred to as "structured storage of semi-structured columns." In this version, OceanBase Database supports semi-structured storage of JSON data and implements corresponding encoding methods to reduce storage space usage and optimize query and filter performance on JSON sub-columns. However, since JSON data is stored as multiple sub-columns, frequent reads of the original JSON data may introduce merge overhead, leading to performance degradation. Whether to enable this feature depends on the business requirements.

Query execution

Adaptive plan cache

Plan cache is a crucial capability in TP scenarios and is enabled by default. In some AP scenarios, the execution time of a plan can be much longer than the generation time. In such cases, generating a new plan may result in a better execution plan than reusing an existing one, leading to improved execution performance. In mixed HTAP workloads, it is necessary to provide a way to enable hard parsing for AP-skewed SQL statements while allowing TP-skewed SQL statements to reuse plans to avoid hard parsing overhead. In this version, OceanBase Database introduces adaptive plan cache, which can decide whether to use the plan cache for a SQL statement based on its execution time and distribution of execution times.

External data source integration

MySQL mode ODPS (MaxCompute) catalog

To facilitate querying data stored in various external data sources, this version introduces the External Catalog framework and supports the ODPS catalog feature. OceanBase Database's internal objects are managed under the Internal Catalog. Users can create External Catalogs to connect to external data sources and retrieve metadata. Direct querying of tables under an External Catalog is supported, such as select col1 from catalog1.database1.table1;. Cross-catalog table joins are also supported when permissions are met. This eliminates the need for data imports or creating individual external tables, improving the usability of external data access.

Data import

Load Data import with URL CSV external table error tolerance mode

Previously, when data type incompatibility or precision mismatch issues occurred during data import using the Load Data method, errors were directly reported. In this version, the LOG ERRORS instruction is introduced for error-tolerant imports. In MySQL mode, failed rows are recorded, and users can view error data using show warnings for error diagnosis.

V4.3.5 BP1

Version information

Release date: March 18, 2025
Version number: V4.3.5 BP1

New features and enhancements

Data import and export

Support for incremental direct load of heap tables with unique indexes

Direct load can be performed in full or incremental mode. When data already exists in the table, incremental direct load performs better than full direct load. However, incremental direct load previously did not support importing data into heap tables with unique indexes. This version now supports incremental direct load for heap tables with a local unique index.
Support for HASH partitioned and concurrent direct load

Starting from V4.3.5 BP1, direct load is supported for HASH-partitioned tables. Additionally, incremental direct load now supports concurrent loading across multiple partitions. However, if multiple import tasks have overlapping partitions, concurrent partition loading is not supported.
Performance optimization for direct load

Through vectorization optimization, streamlined temporary file processing, and improved computational performance in each stage, the import performance of the clickbench table has improved by 17% compared to V4.3.5.

External data source integration

Support for reading HDFS files from external tables

HDFS is the most common storage medium in data lake architectures and serves as a platform for multi-engine data sharing. Therefore, this version supports direct reading of HDFS files from external tables and also supports Kerberos authentication for HDFS clusters.
Support for reading and writing data from and to MaxCompute (ODPS)

This version supports reading data from MaxCompute using external tables or URL-based external tables, and also supports writing data from OceanBase to MaxCompute.
URL-based external tables

When accessing external data sources in OceanBase, you typically need to create an external table and then use SQL queries. This version introduces URL-based external tables, allowing direct querying of external CSV, Parquet, ORC, and MaxCompute data without creating an external table. Additionally, it extends the LOAD DATA syntax to directly reference URL-based external tables for data import, reducing the operational cost of data import.

Index optimization

Enhancements to full-text indexing

Starting from OceanBase V4.3.1, full-text indexing is supported in MySQL mode. Subsequent versions have added various features and enhancements. V4.3.5 BP1 introduces the built-in Chinese IK tokenizer, supporting both smart and max_word modes. It also supports adding new tokenizers via plugins. A new PARSER_PROPERTIES attribute is added for full-text indexes, allowing table-level configuration. In addition to the existing natural language mode, a boolean mode is now supported. You can use the IN BOOLEAN MODE keyword in MATCH AGAINST to enable boolean mode, supporting operators such as +, -, (), and no operator, as well as nested boolean operations. This version also supports full-text indexes in index merge operations, improving query performance through index union merge. It optimizes update and delete scenarios for full-text indexes on the main table, enhancing update and delete performance.

Materialized view enhancements

Enhancements to materialized views

The parallelism of materialized view refreshes is closely related to the efficiency of incremental and full refreshes. This version supports complete parallelism control mechanisms, allowing you to set the refresh parallelism using the mview_refresh_dop system variable, the DBMS_MVIEW.refresh stored procedure, or the PARALLEL clause when creating a materialized view. This version also supports the LOB data type for incremental materialized views and provides the ability to directly modify materialized views or materialized view attributes using ALTER MATERIALIZED VIEW and ALTER MATERIALIZED VIEW LOG commands. You can modify the parallelism, background refresh task intervals, MLOG table parallelism, MLOG table background cleanup task intervals, and MLOG LOB inline storage length threshold. Additionally, this version supports adding columns to MLOG base tables, including both appending and inserting columns in the middle. It also improves MLOG cleanup performance and optimizes error messages for incremental refreshes.

Table structure and storage optimization

Heap table organization mode

OceanBase uses a clustered index table model, which performs well in OLTP environments. It stores primary keys and main table data in the same table, optimizing primary key query speed and uniqueness validation performance. However, this model has limitations in OLAP scenarios requiring efficient data import and complex data analysis. Import processes require sorting of full data, and queries must perform primary key row fusion, both of which impact performance. This version introduces the heap table organization mode, where primary keys are used for uniqueness constraints, and queries rely on the main table. When user data is sorted by time, skip indexes can be more effectively utilized to improve query efficiency. Decoupling primary keys from data also eliminates the need for sorting main table data during imports, enhancing import performance.

Data type and storage optimization

New String data type in MySQL mode

Some AP applications require columns with high tolerance for varying lengths, often using variable-length string types for data storage and requiring primary keys or index keys. In OceanBase, CHAR and VARCHAR types can store strings and serve as primary keys or index keys, but require specified lengths. MEDIUMTEXT and TEXT types can store variable-length strings but cannot serve as primary keys or index keys. To better meet AP application needs, this version introduces the String data type, which does not require specified lengths and has a default maximum length of 16 MB. When the column length is less than 16 KB and does not exceed lob_inrow_threshold, it can serve as a primary key or index key.
Enhancements to ARRAY type functions

OceanBase supports the ARRAY data type and related functions and operators starting from version 4.3.3. This version further enhances these functions, adding support for array_prepend, array_concat, array_compact, array_filter, array_sort, array_sortby, array_length, array_range, array_sum, array_first, array_difference, array_min, array_max, array_avg, array_position, array_slice, and reverse.

Parallel query optimization

Decoupling of PX compute nodes and data nodes

In the current system architecture, storage and compute resources are highly coupled. The existing PX scheduling mechanism for Non-Leaf DFOs only selects machines where data is stored for compute task allocation, which can limit resource utilization in certain scenarios. To enable large SQL queries to utilize more machines, this version decouples PX compute nodes from data nodes. You can configure or use hints (PX_NODE_POLICY) to determine the candidate resource pools for Non-Leaf DFOs. It also provides hints (PX_NODE_ADDRS, PX_NODE_COUNT) to force the allocation of specific machines or a specific number of machines for Non-Leaf DFOs.

Query optimization and pushdown

Support for query pushdown in rowstore data column addition scenarios

During a rowstore SCAN, if there is no intersection between major sstable and other rowid data, the SCAN is pushed down to the storage layer. After a major compaction, if new columns are added, the SCAN can theoretically be pushed down. However, if the new column parsing fails, the pushdown fails, and the system reverts to the less efficient row-by-row read. This version optimizes this scenario, supporting query pushdown to the storage layer to improve query performance.

Columnar storage optimization

Optimizations for columnar replicas

The columnar replica feature introduced in V4.3.3 had some limitations, such as supporting only a single C replica, not allowing direct full import of columnar data to C replicas, and lacking user-friendliness. This version addresses these limitations by allowing multiple C replicas in a cluster, direct full import of columnar data to C replicas, and the addition of system views CDB/DBA_OB_CS_REPLICA_STATS to monitor columnar replica progress at the partition and log stream levels.

Behavior changes

NLJ plans are disabled for AP templates

After the optimizer generates a nested loop join plan, the plan is sensitive to the number of rows in the driving table. If the estimated number of rows in the driving table is significantly smaller than the actual number, the actual execution performance of the plan may severely degrade. In OLAP scenarios, the data volume is generally large, and the benefits of using nested loop join plans are limited. Therefore, in AP scenarios, the default is to generate hash join plans, and nested loop join plans are only generated in non-equi join scenarios.

V4.3.5

Version information

Release date: December 31, 2024
Version number: V4.3.5

New features and enhancements

Materialized view enhancements

Support for nested materialized views

In versions earlier than V4.3.5, materialized views could only be created on regular user tables. In data warehouse scenarios, materialized views are used for data processing. To better support lightweight real-time data warehouses, V4.3.5 introduces nested materialized views, which allow new materialized views to be created based on existing ones. The refresh methods for nested materialized views are the same as those for non-nested ones, including full and incremental refreshes. Therefore, V4.3.5 supports creating materialized view logs based on materialized views. The freshness of data in nested materialized views depends on the base table's data freshness. To ensure the freshness of upper-level materialized views, users must refresh the underlying materialized views first.

Data import and export

Support for specifying partitions for direct load

Before V4.3.5, OBServer only supported full-table direct load to accelerate data import. If users wanted to import data from only specific partitions, they had to use partition exchange. This involved importing data from the specified partitions to a non-partitioned table via full direct load and then exchanging the non-partitioned table with the target partition. This process was cumbersome. To better support partition-level data imports, V4.3.5 allows users to specify partitions for direct load using the LOAD DATA and INSERT INTO SELECT syntax. However, it does not support specifying the last-level partition type as Hash/Key.
Enhancements to the select into outfile feature

V4.3.5 enhances the select into export feature by supporting the export of parquet and orc types, allowing users to specify compression algorithms when exporting csv types, and introducing new syntax for select into.
- Use the format syntax to set export options.
- Specify a compression algorithm when exporting csv files.
- Export parquet files.
- Export orc files.

Index enhancements

Enhancements to full-text indexing

V4.3.5 continues to enhance the full-text indexing feature based on V4.3.4, including:
- Support for creating full-text indexes using the CREATE FULLTEXT INDEX or ALTER TABLE ADD FULLTEXT INDEX statements.
- Support for cost-based full-text index plan selection, allowing the selection of less expensive indexes for scanning when multiple filters with MATCH AGAINST are present in an SQL query.
- Support for Functional Lookup:
  - Support for multiple MATCH AGAINST expressions in query statements.
  - Support for scanning other indexes while using full-text indexes.
  - Support for outputting MATCH AGAINST expressions without filter semantics as projected columns.
  - Support for filtering MATCH AGAINST expressions with filter semantics using <=/< operators.
  - Support for linking MATCH AGAINST expressions with filter semantics using AND/OR logic.

External data source integration

Enhancements to external tables

In V4.3.5, external tables support reading ORC format files.

Columnar storage optimization

Read-only columnar replicas

The read-only columnar replica feature introduced in V4.3.3 was an experimental feature. After iterations in V4.3.4 and V4.3.5, this feature has now met the GA requirements in V4.3.5.

Support for asynchronous online DDL during row-to-column conversion

V4.3.0 introduced three storage formats for tables: row-based, pure columnar, and hybrid row-column. Converting a row-based table to a hybrid or columnar table was an offline DDL operation. To minimize the impact of offline DDL on customers, V4.3.5 supports asynchronous online DDL for row-to-column conversion. By specifying the delayed keyword during DDL execution, users can modify the table schema in real-time without blocking data writes. The columnar baseline data is then reorganized asynchronously during baseline data compaction. V4.3.5 supports both row-to-column and row-to-hybrid conversions as online DDL operations.

Performance optimization

Vectorization in direct load write path

Before V4.3.5, direct load did not use vectorization, and write functions processed data row by row, resulting in significant function call overhead. Additionally, writing to sstables was a background process, and some operations lacked specific optimizations. V4.3.5 introduces vectorization in the direct load path and optimizes columnar encoding to enhance direct load performance. Testing has shown that direct load performance for tables without a primary key can be improved by approximately 200%.

Data type and storage optimization

Support for complex DML statements with JSON multi-value indexes

V4.3.5 enhances support for complex DML statements with JSON multi-value indexes. It allows multi-value predicates in update and delete statements and supports complex DML operations that involve multi-value index lookups.
Support for more array expressions

OceanBase Database introduced the ARRAY data type in V4.3.3. To better support business use cases involving arrays, V4.3.5 adds more array expressions to meet diverse business needs. New expressions include array_append, array_distinct, arrayMap, array_remove, cardinality, element_at, array_contains_all, array_overlaps, array_to_string, array_agg, unnest, and rb_build.

V4.3.4

Version information

Release date: October 31, 2024
Version number: V4.3.4

New features and enhancements

Storage architecture optimization

Shared storage (experimental feature)

To provide more cost-effective database services in public cloud environments, OceanBase Database V4.3.4 introduces the Shared-Storage Cluster architecture based on a general-purpose object storage. This architecture allows multiple tenants to share a baseline dataset and log data stored in a standard object storage. Hot data is stored in cloud disks or local SSDs, while cold data is stored in standard object storage, reducing storage costs. Each tenant's replicas cache only hot data and logs locally, enabling quick scaling of compute nodes. This architecture is suitable for TP workloads with distinct hot and cold data, such as historical and backup databases, as well as key-value and time-series workloads.

This feature is defined as experimental in V4.3.4 and will be further enhanced in future versions to become a production-ready feature.

Indexing capabilities

Enhanced full-text indexing

OceanBase Database has supported full-text indexing since version 4.3.1. By preprocessing text content and building keyword indexes, it significantly improves the efficiency of full-text search. However, in complex DML scenarios involving full-text indexes, there were some usage limitations that caused inconvenience for businesses. To better support business development, version 4.3.4 enhances the complex DML functionality, supporting operations such as INSERT INTO ON DUPLICATE KEY, REPLACE INTO, multi-table updates/deletes, and updatable views for tables with full-text indexes. Additionally, the new version optimizes full-text search performance using vector execution and extended TAAT processing strategies. It also removes the token limit for the MATCH AGAINST predicate, allowing full-text indexes to be created on partitioned tables without a primary key. Furthermore, it supports the tokenize function to view tokenization results, aiding in debugging tokenization systems.
Enhanced multi-value indexing

Currently, multi-value indexing only supports pre-created indexes, with some usage limitations for DML and DDL operations. To better support business use, V4.3.4 relaxes some of these limitations and enhances the functionality. This includes supporting complex DML operations such as INSERT INTO ON DUPLICATE KEY, REPLACE INTO, multi-table updates/deletes, and updatable views.

Data import optimization

Support for non-unique local indexes in incremental direct load

OceanBase Database supports both full and incremental direct loads. In scenarios where data already exists in the table, incremental direct load performs better than full direct load. Previously, incremental direct load did not support loading data into tables with indexes. V4.3.4 now supports incremental direct load for tables with non-unique local indexes.
Optimized direct load performance

During direct load, if a table with a primary key requires sorting, data is first written to a temporary file, then loaded into memory for sorting and merging, and finally written to an sstable. This two-stage process (data writing and sorting/merging) can cause performance bottlenecks if data is not fully written or if I/O reaches its limit. V4.3.4 optimizes this by directly writing data to memory instead of a temporary file, allowing sorting and merging to proceed in a pipeline fashion. This avoids the impact of earlier stages on later stages. In primary key tables, this optimization improves performance by 35% for 1TB of data. Larger memory and CPU resources result in greater performance gains, making this optimization ideal for large-scale tenants. For smaller tenants, performance remains stable without significant degradation.

Data types and storage optimization

Enhanced bitmap functionality

A bitmap, also known as a bitset, is a data structure for efficiently storing and processing whether an element exists in a set. OceanBase Database V4.3.2 and V4.3.3 supported some features of RoringBitmap, and V4.3.4 further enhances this by adding the rb_iterate and rb_select expressions. rb_iterate expands a roaringbitmap type into multiple rows based on the number of elements, while rb_select filters elements from a roaringbitmap type based on conditions and returns a new roaringbitmap type.

V4.3.3

Version information

Release date: September 30, 2024
Version number: V4.3.3

New features and enhancements

Columnar storage optimization

Read-only columnar replicas (experimental feature)

OceanBase Database V4.3.0 supports columnar storage. You can create a columnar table, a rowstore table, or a hybrid table. Regardless of the table type, the replica mode is consistent across zones in a tenant. For example, if a tenant has three zones and a hybrid table is created, the table will have one rowstore replica and one columnar replica in each zone. To meet the physical isolation requirements of TP and AP resources in hybrid workloads, V4.3.3 introduces a new deployment mode that allows you to expand a dedicated zone for read-only columnar replicas (Column Store Replicas, abbreviated as C replicas). In this zone, all user tables are stored in columnar format. AP workloads can use an independent ODP and set the ob_route_policy session-level system variable to COLUMN_STORE_ONLY to access the columnar replicas for weak-consistency reads without affecting TP workloads. Additionally, similar to the 3+1 zone deployment, this mode reduces storage overhead compared to hybrid tables. It requires ODP V4.3.2 or later.

V4.3.3 defines this feature as experimental and will continue to refine it in future versions.

Enhanced materialized views

Enhanced materialized view capabilities

To reduce the manual rewrite cost, OceanBase Database V4.3.1 supports materialized view rewriting. When the QUERY_REWRITE_ENABLED system variable is set to True, you can enable automatic rewriting by specifying ENABLE QUERY REWRITE when creating a materialized view. This allows the system to rewrite queries against the original table to queries against the materialized view. V4.3.1 supports rewriting for non-aggregate materialized views in cases where the FROM clause fully matches and the WHERE clause partially matches. V4.3.3 further supports rewriting for non-aggregate materialized views with FROM clause joins, queries involving tables not present in the materialized view, and aggregate materialized views, including rewriting for aggregation rollups.

Additionally, the new version extends the types of SQL supported for incremental refreshes and real-time materialized views. Earlier versions supported incremental refreshes and real-time queries for single-table aggregations and multi-table joins. V4.3.3 now supports incremental refreshes for join aggregations.

Furthermore, OceanBase Database previously only supported materialized views in rowstore format. V4.3.3 introduces support for columnar materialized views, which can improve query performance in complex analytical scenarios involving materialized view references.

Optimized data import and export

Optimized external table import performance

V4.3.3 optimizes the performance of reading data from external tables during direct load, achieving approximately a 15% improvement over the previous version.
Enhanced INSERT OVERWRITE capabilities

OceanBase Database V4.3.2 supports table-level overwrite writes (INSERT OVERWRITE), allowing atomic clearing of old data and writing of new data to support AP workloads such as periodic data refreshes, data transformations, and data cleansing. However, V4.3.2 only supported full-table overwrite writes and did not provide partition-level or partial-column overwrite capabilities. V4.3.3 enhances this feature by allowing you to specify the target table as a partitioned or subpartitioned table and to specify partial column information in the INSERT OVERWRITE statement, making data overwrite writes more flexible and applicable to a wider range of business scenarios.
Support for compressed files in Load Data and external tables

Load Data is a common data import method that supports various use cases, including server-side, direct, and client-side imports. It allows you to import text files in a specified format into the database. Previously, only plain text files were supported, and compressed files like GZIP had to be decompressed before import, which was cumbersome. The new version supports importing compressed files, including GZIP, DEFLATE, and ZSTD formats. These files can be loaded, decompressed, and written to the database simultaneously. Additionally, CSV external tables now support compressed files, allowing you to query data from compressed files directly.

Data type and storage optimizations

ARRAY data type

The ARRAY data type is commonly used in AP workloads to store multiple elements of the same type. It is suitable for managing and querying multi-valued attributes that cannot be effectively represented by relational data. OceanBase Database V4.3.3 supports the ARRAY data type in MySQL mode. You can define a column as an array of numeric or character values, and even as a nested array. It supports expressions for querying or writing array objects, including the array_contains expression and the ANY operator for checking if an element exists in an array. It also supports operators like +, -, =, and != for array element calculations and comparisons.
Optimized RoaringBitmap performance

V4.3.2 introduced the RoaringBitmap data type and related expressions to support multi-dimensional analysis in scenarios such as user profiling, personalized recommendations, and precision marketing. However, performance was suboptimal in some cases. V4.3.3 focuses on optimizing the performance of RoaringBitmap calculations by improving memory allocation and expression execution logic, reducing unnecessary performance overhead, and significantly enhancing execution performance for cardinality, AND/OR/XOR/ANDNOT operations, and aggregation scenarios.

Resource management and isolation

Query-level resource group configuration

OceanBase Database currently supports configuring resource groups at the user, background task, and column parameter levels using the DBMS_RESOURCE_MANAGER system package for CPU and IOPS isolation. V4.3.3 introduces query-level resource group binding. You can specify the /*+ resource_group('group_name') */ hint in an SQL statement to force the use of a specific resource group. If the specified resource group does not exist, the default resource group is used. Switching resource groups requires a session reconnect to take effect.

V4.3.2

Version information

Release date: July 16, 2024
Version number: V4.3.2 Beta

New features and enhancements

Optimizations for data import and export

Enhancements to incremental direct load

Starting from V4.3.1, incremental direct load is supported as an experimental feature, with some limitations, such as not supporting outrow lob data import and only supporting load_mode set to inc_replace (i.e., replace semantics). V4.3.2 adds support for incremental direct load of outrow lob data and extends the load_mode parameter to include inc as a new value, with a default of insert semantics. When the ignore keyword is specified in an SQL statement, it will use ignore semantics. This feature is now officially released.
Performance optimization for full direct load

In the full direct load scenario, the new version optimizes performance by reducing column type conversion operations during data loading, lowering CPU usage for statistics collection, removing the generation of primary keys on the SELECT side for INSERT INTO SELECT statements, and defaulting to disabling sum skip index (which pre-aggregates SUM values for specified columns within a range in the storage layer). Combined with disabling microblock verification (by setting the micro_block_merge_verify_level parameter to 0), the direct load performance is improved by approximately 20%.
Partitioned export using SELECT INTO OUTFILE

OceanBase Database currently supports exporting data to multiple files using SELECT INTO OUTFILE, but not partitioned export. V4.3.2 introduces the ability to export data by partition, resulting in a clearer directory structure. This feature also allows constructing a partitioned external table from the file directory, improving query efficiency through partition path pruning.

Integration with external data sources

Enhancements to external table functionality

OceanBase Database has long supported CSV external tables, but as AP workloads have expanded, the need to read Parquet-formatted external data sources has become common. V4.3.2 introduces support for Parquet external tables, allowing users to import file data into internal tables or perform cross-source join queries. To ensure the timeliness of files scanned by external tables, the new version includes an automatic refresh feature for file directories. When creating an external table, users can specify the refresh method (manual, real-time, or periodic) using the AUTO_REFRESH option. The DBMS_EXTERNAL_TABLE.REFRESH_ALL_TABLE (with interval int) system package manages scheduled refresh tasks.

Performance optimizations

Performance optimization for AP scenarios

V4.3.2 optimizes strategies for data block prefetching, vectorized batch processing, filter pushdown, comparison calculations, aggregation calculations, DTL shuffle, and monotonicity filters. In scenarios with 100GB of data, the performance of benchmark tests for AP workloads such as TPCH, TPCDS, and ClickBench improves by approximately 10% to 15% compared to V4.3.1.

Data type and storage optimizations

RoaringBitmap

With the rise of big data, enterprises increasingly rely on user data for analysis and insights. RoaringBitmap, known for its space efficiency and computational speed, plays a crucial role in user profiling, personalized recommendations, and precision marketing. OceanBase Database V4.3.2 introduces support for the RoaringBitmap data type in MySQL mode. This feature allows efficient storage and operations on sets of unsigned integers, enhancing performance for large-scale set calculations and deduplication. To meet multidimensional analysis needs, this version supports over 20 expressions for cardinality calculation, set operations, bitmap checks, bitmap construction, bitmap output, and aggregation operations.

Data operation optimizations

Table-level overwrite write

In data warehouses, scenarios such as data refresh, transformation, and cleansing often require data overwrite writes. OceanBase Database V4.3.2 introduces the INSERT OVERWRITE feature, which atomically clears old data and writes new data to a table. Leveraging the full direct load capability, INSERT OVERWRITE also demonstrates high execution performance. V4.3.2 supports the INSERT OVERWRITE tablename SELECT * FROM tablename syntax, but partition-level overwrite write is not yet supported. Partition-level features will be introduced in future versions.

Task scheduling and asynchronous execution

Asynchronous task scheduling

OceanBase Database currently supports various data import commands such as insert overwrite, insert select, create table as, and load data to facilitate real-time data ingestion. However, real-time import requires waiting for the process to complete and cannot be interrupted, which is impractical for large-scale data imports. The new version introduces asynchronous task scheduling based on DBMS_SCHEDULER. Users can create, check the status of, and cancel asynchronous import tasks using commands like submit job, show job status, and cancel job.

V4.3.1

Version information

Release date: May 17, 2024
Version number: V4.3.1 Beta

New features and enhancements

Materialized view

OceanBase Database V4.3.0 supports materialized views, which precompute and store query results to reduce real-time computation and improve query performance. This feature simplifies complex query logic and meets the requirements of AP (Analytical Processing) scenarios. In V4.3.1, the feature is extended to support real-time materialized views, which provide real-time computation capabilities based on data from both materialized views and MLOGs. This is suitable for analytical workloads that require high real-time performance. Additionally, V4.3.1 introduces primary key constraints for materialized views, allowing users to specify a primary key for a materialized view to optimize single-row lookups, range queries, and join scenarios based on the primary key. The incremental update capability for inner-joined materialized views is also extended to improve the refresh performance of materialized views in certain scenarios.

In V4.3.0, when using materialized views, you must manually replace references to the original tables in your business scripts with queries to the corresponding materialized views, which introduces manual rewrite costs. In V4.3.1, the feature is extended to support automatic materialized view rewriting in certain scenarios. When the system variable QUERY_REWRITE_ENABLED is set to True, you can specify ENABLE QUERY REWRITE when creating a materialized view to enable automatic rewriting. In this case, the system can rewrite queries to the original tables into queries to the materialized views, reducing the need for business modifications.

Optimizations for data import and export

Incremental direct load (Experimental)

OceanBase Database V4.1.0 supports direct load, which significantly improves data import efficiency by simplifying the data loading execution path. It skips SQL, transaction, and memtable modules and directly persists data as SSTables. However, in scenarios where tables need to be imported multiple times, each import requires rewriting existing data, which affects incremental import performance. V4.3.1 optimizes incremental direct load to avoid rewriting existing data. Instead, it only processes new data, ensuring high performance for multiple imports, similar to the first import. You can specify whether to use incremental direct load in LOAD DATA and INSERT INTO SELECT statements by using the /*+ direct(need_sort, max_errors_allowed, load_mode)*/ hint. If you do not specify load_mode or set load_mode to full, the system uses the original full direct load method. If you set load_mode to inc_replace, the system uses incremental direct load. This feature is defined as an experimental feature in V4.3.1 and will be extended and evolved into a production-ready feature in future versions.
Performance optimization for SELECT INTO OUTFILE

The existing SELECT INTO OUTFILE feature supports parallel data reading but only allows sequential writing to external files, which creates a performance bottleneck for data export. V4.3.1 introduces parallel export capabilities. In addition to the SELECT INTO OUTFILE command, it provides the single and max_file_size options to control how data is written to external files. The single option allows you to export data to a single file or multiple files. If you specify a parallelism greater than 1 and set single to false, the system exports data to multiple files, achieving parallel read and write. The max_file_size option allows you to control the size of the exported files.

Index enhancements

Full-text indexes (MySQL mode) (Experimental)

In relational databases, indexes are typically used to accelerate queries based on exact value matching. However, standard B-Tree indexes are not suitable for scenarios involving large amounts of text data that require fuzzy searches. In such cases, full-table scans are necessary to perform fuzzy queries on each row of data, which can be inefficient when the text is large and the data volume is high. Additionally, complex query scenarios such as approximate matching and relevance sorting are difficult to support by rewriting SQL statements.

To address these issues, OceanBase Database V4.3.1 introduces full-text indexes. This feature preprocesses text content and builds keyword indexes to significantly improve full-text search efficiency. The feature is defined as an experimental feature in V4.3.1 and will be extended and evolved into a production-ready feature in future versions. This version includes the following features:
- Full-text indexes are supported in MySQL mode and are compatible with basic MySQL syntax.
- You can create full-text indexes in advance for CHAR, VARCHAR, and TEXT columns when creating tables.
- Full-text indexes are applicable to partitioned tables.
- You can create multiple full-text indexes for the main table.
- The feature supports three built-in tokenizers: SPACE (space), NGRAM, and BENG (basic English).
- You can perform full-text searches on multiple columns using a single MATCH AGAINST clause.
- The feature supports the NATURAL LANGUAGE MODE.

Partition management

Partition exchange

Over time, tables may accumulate a large amount of historical data that is not frequently accessed but needs to be retained for compliance or historical analysis. To improve query performance, businesses often need to distinguish between active and inactive data and archive the inactive data. Migrating data to a new table using SQL can address this scenario, but performance is often suboptimal for large data volumes. Therefore, OceanBase Database V4.3.1 introduces the partition exchange feature. This feature allows you to move data from table A to a partition of table B by modifying the definitions of partitions and tables in the data dictionary without physically copying the data. This nearly instantaneous data migration significantly improves performance. This version includes the following features:
- You can exchange data between a partition of a partitioned table and a non-partitioned table.
- You can create a partitioned table with RANGE (RANGE COLUMNS) partitions.
- You can create a partitioned table with RANGE (RANGE COLUMNS) partitions at both the primary and subpartition levels.
- You can specify the including indexes option, which allows local indexes to be exchanged along with the data and remain available after the exchange.
- You can specify the without validation option, which requires you to ensure that the data conforms to the partitioning key range.

Integration with external data sources

Partitioned external tables

OceanBase Database V4.2.0 supports external tables, which are limited to non-partitioned tables. In scenarios where a large number of files exist but a query only needs to scan a subset of them, non-partitioned external tables can only scan all files, leading to poor performance. V4.3.1 introduces the partitioned external table feature, which supports list partitioning similar to regular tables. It provides both automatic and manual partitioning options. When automatic partitioning is specified, the system groups files into partitions based on the defined partitioning key. When manual partitioning is specified, you need to specify the subpaths for each partition. In this case, external table queries can leverage partition pruning based on the partitioning conditions, reducing the number of scanned files and significantly improving query performance.

Data types and storage optimization

MySQL JSON multi-value index (Experimental)

MySQL 8.0 introduced the multi-value index feature, which is applicable to JSON documents and other collection data types. This feature allows users to build indexes on arrays or collections to efficiently retrieve elements. OceanBase Database V4.3.1 is compatible with the JSON multi-value index feature in MySQL mode. This allows users to create efficient secondary indexes on JSON array fields containing multiple elements, enhancing the query capabilities of complex JSON data structures while maintaining the flexibility of the data model and high query performance. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in subsequent versions. The following features are included in this version:
- Supports pre-creation of multi-value indexes and composite multi-value indexes.
- Supports unique and non-unique multi-value indexes.
- Supports creating multi-value indexes with JSON array elements of type INT, UINT, DOUBLE, FLOAT, and CHAR.
- Applicable to partitioned tables.
- Supports using multi-value indexes with the MEMBER_OF(), JSON_CONTAINS(), and JSON_OVERLAPS() functions.
MySQL JSON partial update

Some users store business data in JSON documents. When updating a document, they currently need to read the entire document and perform a full update, which can be inefficient for large documents. V4.3.1 introduces the JSON partial update feature. When users update specific fields in a JSON document using expressions like json_set, json_replace, or json_remove, they can only update the modified fields without needing to perform a full overwrite. This improves update performance. To use this feature, you need to enable it using the log_row_value_options parameter.

Query execution and resource management

SQL temporary result compression

When the amount of data involved in SQL execution is excessively large, memory may become insufficient, and some operators need to materialize temporary intermediate results. If the materialized data exceeds the available disk space, SQL execution will fail. V4.3.1 introduces the SQL temporary result compression feature. This allows users to specify whether to compress temporary results and the compression algorithm using tenant-level parameters like spill_compression_codec or SQL-level hints (e.g., /*+opt_param('spill_compression_codec', 'lz4') */). Specifying temporary result compression effectively reduces the temporary disk space usage, supporting larger-scale query tasks.

V4.3.0 Beta

Version information

Release date: March 22, 2024
Version number: V4.3.0 Beta

New features

Columnar storage engine

In scenarios involving complex analysis of large-scale data or ad-hoc queries on massive data, columnar storage is a key capability of an AP database. Columnar storage is a data organization method that differs from row-based storage by arranging table data physically by columns. When data is stored in a columnar format, analysis scenarios can scan only the columns required for query computation, avoiding full-row scans and reducing resource usage such as I/O and memory, thereby improving computational speed. Additionally, columnar storage inherently offers better conditions for data compression, making it easier to achieve high compression ratios, which reduces storage space and network transmission bandwidth.

However, common columnar storage engines often assume minimal random updates to maintain static data organization. When faced with frequent random updates, performance issues are inevitable. The OceanBase LSM-Tree architecture addresses this by separately handling baseline and incremental data. Therefore, V4.3.0 extends support for columnar storage engines within the existing architecture, achieving integrated storage of columnar and row-based data on a single codebase, architecture, and OBServer node. This approach balances TP and AP query performance.

To facilitate AP business migration and ensure smooth use of the new version, multiple modules, including the optimizer, executor, DDL, and transaction processing, have been adapted and optimized around the columnar storage engine. This includes a new cost model based on columnar storage, a vectorized engine, extended and enhanced query pushdown functionality, Skip Index, new columnar encoding algorithms, and adaptive compaction.

Users can flexibly configure tables as row-based, columnar, or hybrid (row-column redundant) based on their workload type.
New vectorized engine

OceanBase has already implemented a vectorized engine based on the Uniform data description method in earlier versions, significantly improving performance compared to non-vectorized engines. However, in deep AP scenarios, there are still some performance limitations. V4.3.0 introduces the 2.0 version of the vectorized engine, which uses the Column data format description to avoid the memory usage, serialization, and read/write access overheads associated with ObDatum. Based on this data format description, the new version has also re-implemented a batch of commonly used operators and expressions, including over 10 operators such as HashJoin, AGGR, HashGroupBy, and Exchange(DTL Shuffle), as well as over 20 MySQL expressions for relational, logical, and arithmetic operations. In subsequent V4.3.x versions, the new vectorized engine will continue to be used to supplement and enhance the implementation of other operators and expressions to achieve optimal performance in AP scenarios.
Materialized views

V4.3.0 introduces the Materialized View feature. Materialized views are a key feature that supports AP business by precomputing and storing query results, reducing real-time computation to improve query performance and simplify complex query logic. They are commonly used in scenarios requiring rapid report generation and data analysis.

Since materialized views require storing query result sets to optimize query performance, and there is a data dependency between materialized views and base tables, whenever the data in the base table changes, the data in the materialized view must be updated to stay synchronized. Therefore, the new version introduces a materialized view refresh mechanism, including full refresh and incremental refresh strategies. A full refresh is a straightforward approach where, each time a refresh operation is executed, the system re-executes the corresponding query statement of the materialized view to completely compute and overwrite the original view result data. This approach is suitable for scenarios with relatively small data volumes. In contrast, an incremental refresh only processes the data that has changed since the last refresh. To achieve precise incremental refreshes, OceanBase has implemented a feature similar to Oracle's MLOG (Materialized View Log), which tracks and records incremental updates to the base table through logs, ensuring that materialized views can perform fast incremental refreshes. The incremental refresh approach is particularly suitable for scenarios with large data volumes and frequent changes.

OceanBase

Customer Stories

Documentation

AP feature version history

V4.4.2

Version information

New features and enhancements

V4.4.1

Version information

New features and enhancements

V4.4.0 Beta

Version information

New features and enhancements

V4.3.5 BP5

Version information

New features and enhancements

V4.3.5 BP4

Version information

New features and enhancements

V4.3.5 BP3

Version information

New features and enhancements

Optimizations for external data sources and integrations

Enhanced materialized views

Optimizations for storage and partition management

Optimizer and query performance improvements

Accelerated data import

V4.3.5 BP2

Version information

Product forms

Feature enhancements

Enhanced data recovery capabilities

Performance optimizations

Enhanced materialized views

Partition management

Indexes and query optimization

Data types and storage optimization

Query execution

External data source integration

Data import

V4.3.5 BP1

Version information

New features and enhancements

Data import and export

External data source integration

Index optimization

Materialized view enhancements

Table structure and storage optimization

Data type and storage optimization

Parallel query optimization

Query optimization and pushdown

Columnar storage optimization

Behavior changes

V4.3.5

Version information

New features and enhancements

Materialized view enhancements

Data import and export

Index enhancements

External data source integration

Columnar storage optimization

Performance optimization

Data type and storage optimization

V4.3.4

Version information

New features and enhancements

Storage architecture optimization

Indexing capabilities

Data import optimization

Data types and storage optimization

V4.3.3

Version information

New features and enhancements

Columnar storage optimization

Enhanced materialized views

Optimized data import and export

Data type and storage optimizations

Resource management and isolation

V4.3.2

Version information