This topic lists only the core capability changes related to real-time analytics. For more information about the full set of capabilities and iterations of OceanBase Database, see Release Note.
V4.3.5 BP5
Version information
- Release date: November 17, 2025
- Version number: V4.3.5 BP5
New features and enhancements
Materialized view functionality enhancement
- Supports incremental refresh for materialized views with LEFT JOIN.
- Supports nested materialized views based on incrementally materialized views without a primary key.
- Supports setting parameters of MIN/MAX aggregate functions in incremental materialized views to non-basic columns.
- Supports incremental materialized views with multiple tables connected that reference views declared as dimension tables (
AS OF PROCTIME()). - Supports full refresh materialized views by referencing UDF.
- Supports minimal refresh mode (incremental refresh DML only reads and writes the necessary data of the column store table) for incremental materialized views.
- Supports materialized views to refresh using the session variables that are set to specific values.
Delete-Insert table incremental data query supports Skip Index
V4.3.5 BP5 introduced the Skip Index for pre-generated incremental data in the Delete-Insert table. When querying incremental data, the Skip Index allows for pre-trimming of data, significantly improving query efficiency. This capability is controlled by the tenant-level configuration item default_skip_index_level, which determines the scope of Skip Index coverage.
Partition exchange functionality extended
Prior to V4.3.5 BP5, a range/range columns partitioned table can be swapped with a non-partitioned table. Additionally, you can exchange a range/range columns partition of a subpartitioned table with a non-partitioned table. You can also exchange the primary partitions of a range/range columns partitioned table with those of a non-partitioned table. In addition to these features, V4.3.5 BP5 supports exchanging a list/list columns partition of a list/list columns partitioned table with that of a non-partitioned table.
V4.3.5 BP4
Version information
- Release date: September 10, 2025
- Version number: V4.3.5 BP4
New features and enhancements
Enhanced materialized view capabilities
V4.3.5 BP4 continues to enhance materialized view capabilities. Functionally, it supports incremental refreshes of materialized views without refreshing dimension tables. When creating a materialized view, you can use the new AS OF PROCTIME() syntax to specify tables that do not need to be refreshed. During incremental refreshes, the incremental data of these tables will not be refreshed. Single-table aggregation incremental refresh materialized views now support the MIN() and MAX() aggregate functions. In terms of usability, it supports automated management of MLOG tables: when creating an incremental refresh materialized view, the system automatically creates or replaces the necessary MLOG tables for the base table and periodically cleans up redundant MLOG tables in the background. The view content and error messages for materialized views have also been optimized.
Disabled full bypass import by default for AP parameter templates.
V4.3.5 BP3
Version information
- Release date: July 21, 2025
- Version number: V4.3.5 BP3
New features and enhancements
Optimized external data sources and integrations
ODPS external table support for Storage API
In previous versions, external tables based on the Tunnel API required an independent session for each partition during execution, leading to significant delays. The new version supports the ODPS Storage API, which provides direct access to the underlying storage, offering better read performance and more optimization strategies.
Support for Azure Blob storage
The new version supports accessing Azure Blob storage using the Azure Blob protocol.
Enhanced materialized views
Enhanced materialized view capabilities
The new version continues to improve materialized view features, including support for renaming materialized views, incremental refreshes for outer joins, UNION ALL, and non-aggregated single-table operations, as well as adding columns to materialized view logs. It also supports cascading refreshes for nested materialized views and fixes issues like materialized view drop failures, enhancing overall stability.
Optimized storage and partition management
Support for primary and secondary partition table swaps
The new version allows for swapping primary partitions of secondary partition tables with primary partitions of primary partition tables.
Optimized optimizer and query performance
Enhanced statistics collection
In previous versions, statistics collection required sampling to estimate data distribution, leading to inaccuracies. The new version leverages the storage layer's Skip Index feature to collect more accurate column-level statistics for all columns in columnar storage. It also introduces progressive statistics collection for partitions to address timeouts caused by long collection times for large partitioned tables, improving system stability and data real-time performance.
Accelerated data import
Optimized direct load performance
The new version optimizes the execution logic of direct loads, including sorting algorithms, vectorized paths, CSV parsing, column conversion expressions, and character set encoding. In non-partitioned heap tables, this results in up to a 10% performance improvement.
V4.3.5 BP2
Version information
- Release date: May 15, 2025
- Version number: V4.3.5 BP2
Product form
- Shared-Storage AP
Starting from V4.3.5 BP2, OceanBase Database in Shared-Storage deployment mode can be used for AP workloads. Shared-Storage AP supports key AP features in Shared-Nothing mode, such as columnstore tables, incremental direct load, materialized views, full-text indexes, and multi-value indexes. It also optimizes IO read methods in Shared-Storage mode. Shared-Storage AP is recommended for AP workloads with data cold-warm characteristics that aim to reduce storage costs.
Feature enhancements
Enhanced data recovery
Support for restoring columnstore tables
Previously, the table-level restore feature only supported restoring to rowstore tables. In this version, the table-level restore feature is enhanced in Shared-Nothing mode to support restoring to columnstore tables and hybrid rowstore-columnstore tables.
Performance optimization
Optimized performance of direct load for heap tables
The uniqueness check process in the unique index creation phase of direct load for heap tables is optimized. The performance of full direct load for non-partitioned heap tables is not lower than that of index-organized tables with primary keys.
Optimized performance of hard parsing
In some AP scenarios or for some large accounts, disabling the plan cache and using hard parsing can reduce the use of suboptimal plans. However, this requires higher performance from hard parsing. This version optimizes the resolver stage by improving hot functions, replacing global cost validation with local cost validation in non-essential scenarios, reducing expression memory usage, and eliminating unnecessary expression type inference operations. These optimizations enhance the performance of the optimizer in resolve and rewrite stages.
Enhanced materialized views
Support for creating materialized views based on external tables
Previously, OceanBase Database supported creating materialized views based on user tables and materialized views. This version adds support for creating materialized views based on external tables in full refresh scenarios, expanding the applicability of materialized views.
Enhanced diagnostics for materialized views
New system views
CDB/DBA_MVIEW_RUNNING_JOBSandDBA_MVIEW_DEPSare added to display running materialized view tasks (such as refresh tasks and MLOG cleanup tasks) and dependency objects of materialized views, respectively. TheCDB/DBA/ALL/USER_MVIEWSsystem views now record materialized view refresh points and refresh delays. TheCDB/DBA/ALL/USER_MVIEW_LOGSviews now include parallelism and time consumption information for MLOG cleanup.Resource isolation for materialized view tasks
Refreshing and purging materialized views consume system resources, which may affect the performance of foreground tasks. This version provides resource isolation capabilities for incremental refreshes and MLOG purges of materialized views using Resource Manager. Users can set resource limits for materialized view tasks.
Partition management
Dynamic partition management
Businesses often use date or time as the range partition key to divide data into different partitions based on time periods. DBAs need to pre-create future partitions regularly to accommodate data writes. If DBAs forget to pre-create partitions, data may fail to write due to non-existent partitions, affecting business operations. This version introduces dynamic partition management, which automatically pre-creates future partitions at the kernel level when specified for a table. It also supports automatic deletion of obsolete partitions no longer needed by the business, saving storage space.
Indexes and query optimization
NGRAM2 tokenizer for full-text indexes
A built-in
NGRAM2tokenizer is added for full-text indexes, supportingNGRAMtokenization within themin_ngram_sizetomax_ngram_sizerange. TheNGRAM2tokenizer is suitable for scenarios where performance and storage space are relatively insensitive, and different lengths oftokenretrieval are required. For fixed-lengthtoken, theNGRAMtokenizer can be used.
Data types and storage optimization
Map data type
The Map data type stores unordered key-value pairs, such as {a:1, b:2, c:3}. Common use cases include saving configuration options, user attributes, and product information. OceanBase Database implements the Map data type based on the Array framework in V4.3.5 BP2, supporting Map constructors and functions and operators such as map_keys, map_values, =, and !=.
MySQL mode JSON semi-structured storage
Currently, semi-structured data is stored as binary strings, making it difficult to encode and compress using Encoding. Each data entry redundantly stores some data metadata for self-explanation, resulting in low compression rates. JSON is a typical semi-structured data type with strong structural characteristics. Theoretically, semi-structured data can be logically split into multiple base-type columns, with the non-decomposable parts stored in a Binary column. Structured columns can leverage Encoding to improve compression rates, while non-decomposable parts, by extracting data metadata, reduce space usage compared to before. This is referred to as "structured storage of semi-structured columns." OceanBase Database supports semi-structured storage of JSON data in V4.3.5 BP2, implementing corresponding encoding methods to reduce storage space and optimize query filtering performance for JSON subcolumns. However, since JSON data is stored in multiple subcolumns, frequent reads of original JSON data may introduce merge overhead, leading to performance degradation. This feature should be enabled based on business requirements.
Query execution
Adaptive plan cache
Plan Cache is crucial for TP scenarios and is enabled by default. In some AP scenarios, plan execution time can be much longer than plan generation time. In such cases, generating a new plan may yield a better execution plan than reusing an existing one, improving execution performance. In hybrid HTAP scenarios, it's necessary to provide a way for AP-oriented SQL to use hard parsing while allowing TP-oriented SQL to reuse plans to avoid hard parsing overhead. This version introduces adaptive plan cache capabilities, allowing OceanBase Database to decide whether to use the plan cache for a SQL statement based on its execution time and distribution characteristics.
External data source integration
MySQL mode ODPS (MaxCompute) Catalog
To facilitate querying data stored in various external data sources, this version supports the External Catalog framework and introduces the ODPS Catalog feature. Existing internal objects belong to the Internal Catalog. Users can create External Catalogs to connect to external data sources and retrieve metadata. Direct querying of tables in External Catalogs is supported, such as
SELECT col1 FROM catalog1.database1.table1;. Cross-catalog table joins are also supported when permissions are met. This eliminates the need for data imports or creating individual external tables, improving usability.
Data import
Load Data import with URL CSV external table error tolerance mode
Previously, Load Data would directly report errors when encountering incompatible data types or precision mismatches. This version introduces the
LOG ERRORSinstruction for error-tolerant imports. In MySQL mode, failed rows are recorded, and users can view error data usingSHOW WARNINGSfor diagnostics.
V4.3.5 BP1
Version information
- Release date: March 18, 2025
- Version number: V4.3.5 BP1
New features and enhancements
Data import and export
Incremental direct load supports heap tables with unique indexes
Direct load is divided into full direct load and incremental direct load. When data already exists in the table, incremental direct load performs better than full direct load. However, incremental direct load previously did not support importing data into heap tables with unique indexes. The new version supports importing data into heap tables with a local unique index using incremental direct load.
Direct load supports HASH partitioning and concurrent partition import
Starting from V4.3.5 BP1, the system supports HASH partition-level direct load. Additionally, incremental direct load now supports concurrent partition import. However, if multiple import tasks have overlapping partitions, concurrent partition import is not supported.
Optimized direct load performance
Through vectorization optimization, streamlined temporary file processes, and improved calculation performance in each stage, the import performance of the clickbench table has improved by 17% compared to V4.3.5.
External data source integration
Support for reading HDFS files from external tables
HDFS is the most common storage medium in a data lake architecture and serves as a platform for multi-engine data sharing. Therefore, the new version of OceanBase Database supports directly reading files from HDFS through external tables. It also supports connecting to HDFS clusters with Kerberos authentication through relevant configurations.
Support for reading and writing data from and to MaxCompute (ODPS)
The new version supports reading data from MaxCompute through external tables or URL external tables. It also supports writing data from OceanBase Database back to MaxCompute.
URL external tables
When accessing external data sources in OceanBase Database, you typically need to create an external table and then use SQL queries. The new version introduces URL external tables, allowing you to directly query external CSV, Parquet, ORC, and MaxCompute data using SELECT statements without creating an external table. Additionally, the new version extends the LOAD DATA syntax to directly reference URL external tables for data import, reducing the operational cost of data import.
Index optimization
Enhancements to full-text indexing
OceanBase Database has supported full-text indexing in MySQL mode since V4.3.1. As the version evolved, additional features and extensions were gradually added. V4.3.5 BP1 introduces an internal Chinese
IKtokenizer that supportssmartandmax_wordtokenization modes. It also allows adding new tokenizers as plugins. The new version supports thePARSER_PROPERTIESconfiguration at the table level for full-text indexes. In addition to the existing natural language mode, the new version supports the boolean mode, which can be enabled using theIN BOOLEAN MODEkeyword inMATCH AGAINSTstatements. Boolean operators such as+,-,(), and no operator, as well as nested operations, are supported. The new version also supports full-text indexes participating in index merge, improving query performance through index union merge. It optimizes the update and deletion scenarios for full-text indexes on the main table, enhancing the performance of updates and deletions.
Materialized view enhancements
Enhancements to materialized views
The parallelism of materialized view refreshes is closely related to the efficiency of incremental and full refreshes. The new version supports a complete parallelism control mechanism, allowing you to set the refresh parallelism using the
mview_refresh_dopsystem variable, theDBMS_MVIEW.refreshsubprogram, or thePARALLELoption when creating a materialized view. The new version supports theLOBdata type for incremental materialized views and provides the ability to directly modify materialized views or materialized view attributes using theALTER MATERIALIZED VIEWandALTER MATERIALIZED VIEW LOGcommands. You can modify the parallelism, background refresh task cycle, MLOG table parallelism, MLOG table background cleanup task cycle, and the LOB inline storage length threshold for MLOGs. Additionally, when you create an MLOG on the base table, you cannot add columns to the base table. You must first delete the MLOG and then perform DDL operations, which is complex. The new version supports adding columns to the MLOG base table, including adding columns at the end and in the middle. The new version also improves the performance of MLOG cleanup and optimizes the error messages for incremental refreshes.
Table structure and storage optimization
Heap table organization mode
OceanBase Database uses a clustered index table model, which performs well in OLTP environments. It stores the primary key and main table data in the same table, optimizing the speed of primary key queries and the performance of uniqueness checks. However, this model has limitations in OLAP scenarios that require efficient data import and complex data analysis. The import process needs to sort the entire dataset, and queries must perform primary key row fusion, both of which impact performance. The new version introduces the heap table organization mode, where the primary key is used for uniqueness constraints, and queries rely on the main table. When user data is sorted by time, the skip index can be more effectively utilized to improve query efficiency. By decoupling the primary key from the data, the import process no longer needs to sort the main table data, thereby enhancing the performance of data import.
Data type and storage optimization
New String data type in MySQL mode
Some AP applications have a high tolerance for column length and often use variable-length string types to store data, which also need to be used as primary keys or index keys. In OceanBase Database, the char and varchar types can store strings and be used as primary keys or index keys, but they require a specified length. The lob types such as mediumtext and text can store variable-length strings but cannot be used as primary keys or index keys. To better meet the needs of AP applications, the new String data type is introduced, which does not require a specified length and defaults to a maximum length of 16 MB. If the column length is less than 16 KB and does not exceed the lob_inrow_threshold, it can be used as a primary key or index key.
Enhancements to ARRAY type-related functions
OceanBase Database supported the ARRAY data type and some common functions and operators starting from V4.3.3. The new version further enhances these functions, adding support for
array_prepend,array_concat,array_compact,array_filter,array_sort,array_sortby,array_length,array_range,array_sum,array_first,array_difference,array_min,array_max,array_avg,array_position,array_slice, andreverse.
Parallel query optimization
Decoupling of PX computing nodes and data nodes
In the current system architecture, storage resources and computing resources are highly coupled. The existing PX scheduling mechanism for Non-Leaf DFOs only selects machines where data is stored for computing task allocation. This tight coupling limits the full utilization of machine resources in certain scenarios. To enable large SQL queries to leverage more computing resources, the new version supports decoupling PX computing nodes from data nodes. You can use configuration parameters or hints (PX_NODE_POLICY) to specify the candidate resource pools for Non-Leaf DFOs. Additionally, hints (PX_NODE_ADDRS and PX_NODE_COUNT) allow you to forcibly specify the specific machines or the number of machines to which Non-Leaf DFOs are allocated.
Query optimization and pushdown
Support for query pushdown in rowstore data column addition scenarios
During rowstore SCAN, if there is no intersection between the data, that is, no rowid is the same as in the major sstable and other places, the SCAN is pushed down to the storage layer. When new columns are added after the merge, the SCAN can theoretically be pushed down, but if the parsing of the new columns fails, the pushdown will fail, and it will fall back to the less efficient row-by-row reading. The new version optimizes this scenario, supporting query pushdown to the storage layer to improve query performance.
Columnstore storage optimization
Optimizations for columnstore replicas
The columnstore replica feature introduced in V4.3.3 had some limitations, such as supporting only a single C replica, not allowing direct full import of columnstore data to the C replica, and lacking user-friendliness. The new version addresses these limitations by allowing multiple C replicas to be specified in a cluster and supporting direct full import of columnstore data to the C replica. It also introduces the system view
CDB/DBA_OB_CS_REPLICA_STATSfor displaying the progress of columnstore replica conversion at the partition and log stream levels, making it easier for users to monitor the status of columnstore replicas.
Behavior changes
Disable NLJ in AP templates
After the optimizer generates a nested loop join plan, the plan is sensitive to the number of rows in the driving table. If the estimated number of rows in the driving table is too small, the actual execution performance of the plan may significantly decrease. In OLAP scenarios, the data volume is generally large, and the benefits of using a nested loop join plan are limited. Therefore, in AP scenarios, the default plan is changed to a hash join plan. Nested loop join plans are only generated in non-equality join scenarios.
V4.3.5
Version information
- Release date: December 31, 2024
- Version number: V4.3.5
New features and enhancements
Materialized view enhancements
Support for nested materialized views
In versions earlier than V4.3.5, materialized views could only be created on regular user tables. In data warehouse scenarios, materialized views are used for data processing. To support lightweight real-time data warehouses, V4.3.5 allows the creation of new materialized views based on existing materialized views, known as nested materialized views. Nested materialized views support the same refresh methods as non-nested ones, including full and incremental refreshes. Therefore, V4.3.5 supports creating materialized view logs based on materialized views. The freshness of data in nested materialized views depends on the base table's data freshness. To ensure the freshness of upper-level materialized views, users must refresh the underlying materialized views first.
For more information, see the section on creating nested materialized views in Create a materialized view (MySQL mode) or Create a materialized view (Oracle mode).
Data import and export
Support for specifying partitions for direct load
Before V4.3.5, OBServer only supported full-table direct load for accelerating data import. If users wanted to import data from only specific partitions, they had to use partition exchange. This involved importing partition data to a non-partitioned table via full direct load and then exchanging the non-partitioned table with the target partition. This process was cumbersome. To better support partition-level data imports, V4.3.5 supports specifying partitions for direct load using the
LOAD DATAandINSERT INTO SELECTsyntax. However, it does not support specifying the last-level partition type as Hash/Key.For more information, see Full direct load or Incremental direct load.
Enhancements to the
select into outfilefeatureV4.3.5 enhances the
select intoexport feature by supporting the export of Parquet and ORC types, allowing the specification of compression algorithms when exporting csv types, and adding new syntax.- Use the format syntax to set export options.
- Specify a compression algorithm when exporting csv files.
- Export Parquet files.
- Export ORC files.
Index enhancements
Enhancements to full-text indexing
V4.3.5 continues to enhance the full-text indexing feature based on V4.3.4, including:
- Support for creating full-text indexes using
CREATE FULLTEXT INDEXorALTER TABLE ADD FULLTEXT INDEX. - Support for cost-based full-text index plan selection, allowing the selection of the least costly index for scanning when multiple filters with
MATCH AGAINSTexist in an SQL statement. - Support for Functional Lookup:
- Support for multiple
MATCH AGAINSTexpressions in query statements. - Support for scanning other indexes while using full-text indexes.
- Support for outputting
MATCH AGAINSTexpressions without filter semantics as projected columns. - Support for using <=/< symbols to filter
MATCH AGAINSTexpressions with filter semantics. - Support for linking
MATCH AGAINSTexpressions with filter semantics using AND/OR logic with other filter semantics.
- Support for multiple
For more information, see MATCH AGAINST.
- Support for creating full-text indexes using
External data source integration
Enhancements to external tables
In V4.3.5, external tables support reading ORC format files.
For more information, see Create an external table (MySQL mode) or Create an external table (Oracle mode).
Columnar storage optimization
- Read-only columnar replicas
The read-only columnar replica feature introduced in V4.3.3 was an experimental feature. After iterations in V4.3.4 and V4.3.5, this feature has now met the GA requirements in V4.3.5.
Support for asynchronous online DDL during row-to-column conversion
V4.3.0 introduced three storage formats: rowstore, pure columnar, and hybrid row-column. Converting a rowstore table to a hybrid or columnar table was an offline DDL operation. To reduce the impact of offline DDL on customers, V4.3.5 supports asynchronous online DDL for row-to-column conversion. By specifying the
delayedkeyword during DDL execution, users can modify the table schema in real-time without blocking data writes. The columnar baseline data is then asynchronously reorganized during baseline data compaction. V4.3.5 supports asynchronous online DDL for converting rowstore tables to columnar or hybrid row-column tables.
Performance optimization
Vectorization in direct load write paths
Before V4.3.5, direct load did not use vectorization, and write functions processed data row by row, resulting in significant function call overhead. Additionally, writing to sstables was a background process, and some operations lacked specific optimizations. V4.3.5 introduces vectorization in the direct load path and optimizes columnar encoding to enhance direct load performance. Testing shows that direct load performance for tables without a primary key can be improved by up to 2x.
Data type and storage optimization
Support for complex DML with JSON multi-value indexes
V4.3.5 enhances support for complex DML statements with JSON multi-value indexes. It now supports multi-value predicates in
updateanddeletestatements and allows complex DML operations involving multi-value index lookups.Support for more array expressions
OceanBase Database introduced the ARRAY data type in V4.3.3. To better support business use of the ARRAY type, V4.3.5 supports more array expressions to meet diverse business needs. New expressions include
array_append,array_distinct,arrayMap,array_remove,cardinality,element_at,array_contains_all,array_overlaps,array_to_string,array_agg,unnest, andrb_build.
V4.3.4
Version information
- Release date: October 31, 2024
- Version number: V4.3.4
New features and enhancements
Storage architecture optimization
Shared storage (experimental feature)
To provide more cost-effective database services in public cloud environments, OceanBase Database V4.3.4 introduces the Shared-Storage Cluster architecture based on general-purpose object storage. This architecture allows multiple tenants to share a baseline dataset and log data stored on standard object storage. Hot data is stored on cloud disks or local SSDs, while cold data is stored on standard object storage. This approach reduces storage costs. Additionally, since each tenant's replicas only cache hot data and logs on local storage, the compute nodes can be quickly scaled up or down. The shared storage architecture is suitable for TP workloads with distinct hot and cold data, such as historical and backup databases, as well as key-value and time-series workloads, where latency is not a critical concern.
In V4.3.4, this feature is defined as an experimental feature and will be further refined in subsequent versions to become a production-ready feature.
Indexing capabilities
Enhanced full-text indexing
OceanBase Database has supported full-text indexing since version 4.3.1. This feature preprocesses text content and builds keyword indexes to improve the efficiency of full-text searches. However, in complex DML scenarios involving full-text indexes, there are still some usage limitations, which can be inconvenient for businesses. To better support business development, version 4.3.4 enhances the complex DML functionality. It now supports complex DML operations such as
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates and deletions, and updatable views for tables with full-text indexes. Additionally, the new version optimizes full-text search performance using vector execution and extended TAAT processing strategies. It also removes the token number limit for theMATCH AGAINSTpredicate, allowing full-text indexes to be created on partitioned tables without a primary key. Furthermore, the new version supports using thetokenizefunction to view tokenization results, which aids in debugging tokenization systems.For more information, see Create a multi-value index.
Enhanced multi-value indexing
Currently, multi-value indexing only supports pre-created indexes, and there are some usage limitations for DML and DDL operations. To better support business usage, V4.3.4 relaxes some of these limitations and enhances the feature. The main enhancement is the support for complex DML operations, including
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates/deletions, and updatable views.
Data import optimization
Support for non-unique local indexes in incremental direct load
OceanBase Database supports both full direct load and incremental direct load. When data already exists in the table, incremental direct load performs better than full direct load. Previously, incremental direct load did not support importing data into tables with indexes. In V4.3.4, incremental direct load now supports importing data into tables with non-unique local indexes.
Optimized direct load performance
During direct load, if a table with a primary key requires sorting, the data is first written to a temporary file, then loaded into memory for sorting and merging, and finally written to an sstable. This approach divides the data import into two stages: data writing and sorting/merging. If the data is not fully written, the sorting/merging stage cannot begin. If the write I/O reaches a bottleneck, it indirectly limits the performance of direct load. In V4.3.4, the data import method is optimized. Instead of writing data to a temporary file, it is directly written to memory for subsequent sorting and merging. This optimization ensures that data import and sorting are performed in a pipeline manner, avoiding the impact of one stage on the next. In the primary key table scenario, the overall performance improves by 35% for 1TB of data. The larger the memory and CPU, the greater the performance improvement. This optimization is particularly suitable for large-scale tenants. For small-scale tenants, the performance does not significantly degrade.
Data types and storage optimization
Enhanced bitmap functionality
A bitmap, also known as a bitset, is a data structure used to efficiently store and process whether a particular element exists in a set. OceanBase Database V4.3.2 and V4.3.3 successively supported some features of RoringBitmap. V4.3.4 continues to enhance this functionality by adding the
rb_iterateandrb_selectexpressions. Therb_iterateexpression expands a roaringbitmap type into multiple rows based on the number of elements, while therb_selectexpression filters elements in a roaringbitmap type based on conditions and returns a new roaringbitmap type.
V4.3.3
Version information
- Release date: September 30, 2024
- Version number: V4.3.3
New features and enhancements
Columnar storage optimization
Read-only columnar replicas (experimental feature)
OceanBase Database V4.3.0 introduced columnar storage. You can create a columnstore table, a rowstore table, or a hybrid table based on your business requirements. The replica mode is consistent across zones of a tenant. For example, if the unit distribution of tenant X is 1:1:1 and table T1 is a hybrid table, then T1 has one rowstore replica and one columnstore replica in each of the three zones. To meet the physical isolation requirements of TP and AP resources in HTAP scenarios, V4.3.3 introduces a new deployment mode that allows you to add a dedicated zone for read-only columnstore replicas (Column Store Replica, abbreviated as C replica). In this zone, all user tables are stored in columnar format. AP applications can use an independent ODP and set the
ob_route_policysystem variable toCOLUMN_STORE_ONLYat the session level to query columnstore replicas for weak-consistency reads without affecting TP applications. Additionally, similar to a 3+1 zone deployment, this mode also reduces storage overhead compared to hybrid tables. It is recommended to use ODP V4.3.2 or later. For more information, see Columnstore replicas.V4.3.3 defines this feature as experimental and will continue to refine it in future versions.
Enhanced materialized views
Enhanced materialized view capabilities
To reduce manual rewrite efforts, OceanBase Database V4.3.1 introduced materialized view rewrite capabilities. When the
QUERY_REWRITE_ENABLEDsystem variable is set toTrue, you can enable automatic rewrite by specifyingENABLE QUERY REWRITEwhen creating a materialized view. This allows the system to rewrite queries against the original table to queries against the materialized view. V4.3.1 supports rewrite in cases whereFROMclauses fully match andWHEREclauses partially match for non-aggregate materialized views. V4.3.3 further extends support to include rewrite for non-aggregate materialized views withFROMclause joins, queries involving tables not present in the materialized view, and aggregate materialized views, including aggregation rollup.Additionally, the new version expands the types of SQL supported for incremental refreshes and real-time materialized views. Earlier versions supported incremental refreshes and real-time queries for single-table aggregation and multi-table joins. V4.3.3 now adds support for join aggregation scenarios. For more information, see Materialized view query rewrite (MySQL mode) and Materialized view query rewrite (Oracle mode).
Furthermore, earlier versions of OceanBase Database only supported rowstore materialized views. V4.3.3 introduces support for columnstore materialized views, which can improve query performance in complex analytical scenarios involving materialized view references. For more information, see Create a materialized view (MySQL mode) and Create a materialized view (Oracle mode).
Optimized data import and export
Improved performance for external table imports
V4.3.3 optimizes the performance of reading data from external tables during direct load, achieving approximately a 15% improvement over the previous version.
Enhanced INSERT OVERWRITE capabilities
OceanBase Database V4.3.2 introduced table-level overwrite writes (INSERT OVERWRITE), allowing atomic clearing of old data and writing of new data to support AP scenarios such as periodic data refreshes, transformations, and cleansing. However, V4.3.2 only supported full-table overwrite writes and did not provide partition-level or partial-column overwrite capabilities. V4.3.3 enhances this feature by allowing you to specify target partitions (primary or secondary) and target columns in INSERT OVERWRITE statements, making data overwrite more flexible and applicable to a wider range of business scenarios.
Support for compressed files in LOAD DATA and external tables
LOAD DATA is a common method for importing data, supporting various scenarios such as server-side import, direct load, and client-side import. Previously, only plain text files were supported, requiring decompression before import. V4.3.3 introduces support for compressed files, allowing GZIP, DEFLATE, and ZSTD files to be loaded, decompressed, and written to the database simultaneously. Additionally, CSV external tables now support compressed files, enabling direct querying of these compressed formats.
Data types and storage optimization
ARRAY data type
The ARRAY data type is commonly used in AP scenarios to store multiple elements of the same type. It is suitable for managing and querying multi-valued attributes that cannot be effectively represented by relational data. OceanBase Database V4.3.3 introduces the ARRAY data type in MySQL mode. You can define a column as an array of numeric or character values, including nested arrays. It supports query and write operations using array expressions, including
array_containsand the ANY operator for checking element presence. It also supports arithmetic and comparison operators (+, -, =, !=) for array element calculations and checks.Optimized RoaringBitmap performance
V4.3.2 introduced the RoaringBitmap data type and related expressions to support multi-dimensional analysis in scenarios such as user profiling, personalized recommendations, and precision marketing. However, performance was suboptimal in some cases. V4.3.3 optimizes memory allocation and expression execution logic, reducing unnecessary performance overhead, and significantly improves cardinality, and/or/xor/andnot, and aggregation scenarios.
Resource management and isolation
Query-level resource group settings
OceanBase Database currently supports configuring resource groups at the user, background task, and column parameter levels using the DBMS_RESOURCE_MANAGER package for CPU and IOPS isolation. V4.3.3 introduces query-level resource group binding. You can specify
/*+ resource_group('group_name') */in SQL statements to force the use of a specific resource group. If the specified resource group does not exist, the default resource group is used. Changes to the resource group take effect only after the session reconnects.
V4.3.2
Version information
- Release date: July 16, 2024
- Version number: V4.3.2 Beta
New features and enhancements
Optimizations for data import and export
Enhancements to incremental direct load
Starting from V4.3.1, incremental direct load is supported as an experimental feature. However, there are some limitations, such as the inability to import data with the
outrow loboption, and only incremental direct load with theload_modeset toinc_replace(i.e.,replacesemantics) is supported. V4.3.2 adds support for incremental direct load ofoutrow lobdata and extends theload_modeoption to includeincas a valid value, with a default semantics ofinsert. When theignorekeyword is specified in an SQL statement, it will take effect asignoresemantics. This feature is now officially released. For more detailed usage information, see Use the LOAD DATA statement for direct load and Use the INSERT INTO SELECT statement for direct load.Performance optimization for full direct load
In the full direct load scenario, the new version optimizes performance by reducing column type conversion operations during data loading, lowering CPU usage for statistics collection, removing the logic for generating primary keys on the
SELECTside ofINSERT INTO SELECT, and defaulting to disablingsum skip indexfor columnar storage (which specifies the pre-aggregatedSUMvalue for specified columns within a range). Additionally, by disabling microblock verification (setting themicro_block_merge_verify_levelparameter to 0), the direct load performance is improved by approximately 20%.Partitioned export using
SELECT INTO OUTFILEOceanBase Database currently supports exporting multiple files using
SELECT INTO OUTFILE, but not partitioned export. V4.3.2 introduces partitioned export capabilities to achieve a clearer directory structure. With this feature, you can construct a file directory as a partitioned external table, enhancing query efficiency through partition path pruning.
Integration with external data sources
Enhancements to external tables
OceanBase Database has long supported external tables for CSV files. However, with the expansion of AP business, there is a growing need to read Parquet-formatted external data sources in data lake scenarios. V4.3.2 introduces support for Parquet external tables, allowing users to import file data into internal tables or perform cross-data-source join queries directly. To ensure the timeliness of the file directories scanned by external tables, the new version includes an automatic file directory refresh feature. When creating an external table, you can specify the refresh method (manual, real-time, or periodic) using the
AUTO_REFRESHoption. Additionally, theDBMS_EXTERNAL_TABLE.REFRESH_ALL_TABLE(withinterval int) system package manages scheduled refresh tasks. For more detailed usage information, see Create an external table.
Performance optimizations
Performance optimization for typical AP scenarios
V4.3.2 optimizes various strategies, including data block prefetching, vectorized batch processing, filter pushdown, comparison calculations, aggregation calculations, DTL shuffle, and monotonic filter. With 100 GB of data, the performance of benchmark tests in AP scenarios such as TPCH, TPCDS, and ClickBench improves by approximately 10% to 15% compared to V4.3.1.
Data type and storage optimizations
RoaringBitmap
As the big data era progresses, the demand for user data mining and analysis in enterprises is growing. RoaringBitmap, with its space-saving and efficient computing characteristics, plays a significant role in business scenarios such as user profiling, personalized recommendations, and precise marketing. Starting from V4.3.2, OceanBase Database supports the RoaringBitmap data type in MySQL mode. By storing and operating on a set of unsigned integers, it enhances the performance of large-scale set calculations and deduplication. To meet multidimensional analysis needs, this version supports over 20 expressions for cardinality calculation, set operations, Bitmap judgment, Bitmap construction, Bitmap output, and aggregation operations.
Data operation optimizations
Table-level overwrite write
In data warehouses, scenarios such as regular data refresh, data conversion, and data cleaning and correction often require data overwrite writes. OceanBase V4.3.2 introduces table-level overwrite write capabilities (INSERT OVERWRITE), atomically clearing old data and writing new data within a table. Leveraging the full direct load capability, INSERT OVERWRITE also demonstrates high execution performance. V4.3.2 supports the
INSERT OVERWRITE tablename SELECT * FROM tablenamesyntax and does not yet support partition-level overwrite writes, which will be introduced in future versions.
Task scheduling and asynchronous execution
Asynchronous task scheduling
OceanBase Database currently supports various data import commands such as
insert overwrite,insert select,create table as, andload datato support real-time data entry. However, real-time import methods require waiting for the import to complete and cannot be interrupted, which is inconvenient for large-scale data imports. The new version provides asynchronous task scheduling capabilities based on DBMS_SCHEDULER. Users can use commands such assubmit job,show job status, andcancel jobto create, query the status, and cancel asynchronous import tasks.
V4.3.1
Version information
- Release date: May 17, 2024
- Version number: V4.3.1 Beta
New features and enhancements
Materialized view enhancements
Materialized view enhancements
OceanBase Database V4.3.0 supports materialized views. By precomputing and storing the query results of views, it reduces the need for real-time computation to improve query performance and simplify complex query logic, supporting AP scenarios. On this basis, V4.3.1 extends support for real-time materialized views, providing real-time computing capabilities based on data from both materialized views and MLOGs, suitable for analysis businesses with high real-time requirements. New materialized view primary key constraints are added, allowing users to specify primary keys for materialized views to optimize single-row lookups, range queries, or join scenarios based on primary keys. Additionally, incremental updates for inner join scenarios of materialized views are extended, improving the refresh performance of materialized views in some scenarios.
In V4.3.0, when using materialized views, you need to manually replace accesses to the original table in your business scripts with queries on the corresponding materialized view, introducing manual rewrite costs. The new version supports materialized view rewriting in some scenarios. When the system variable
QUERY_REWRITE_ENABLEDis set toTrue, you can specifyENABLE QUERY REWRITEwhen creating a materialized view to enable automatic rewriting. In this case, the system can rewrite queries on the original table to queries on the materialized view, reducing the amount of business modifications. For more information about automatic rewriting of materialized views, see Query rewriting for materialized views (MySQL mode) and Query rewriting for materialized views (Oracle mode).
Optimizations for data import and export
Incremental direct load (Experimental)
OceanBase Database V4.1.0 supports direct load. By simplifying the execution path of data loading and skipping modules such as SQL, transactions, and memtables, it directly persists data as SSTables, significantly improving data import efficiency. However, in scenarios where table data needs to be imported multiple times, each import requires rewriting existing data, affecting the performance of incremental imports. V4.3.1 optimizes incremental imports, allowing incremental direct load to process only new data without rewriting existing data, ensuring high performance for multiple imports. You can specify whether to use incremental direct load in
LOAD DATAandINSERT INTO SELECTstatements using the/*+ direct(need_sort, max_errors_allowed, load_mode)*/hint. Ifload_modeis not specified or set tofull, the original full direct load method is used. Ifload_modeis set toinc_replace, incremental direct load is used. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in later versions. For more information about incremental direct load, see the Incremental direct load section in Overview of direct load.Performance optimization for SELECT INTO OUTFILE
The existing
SELECT INTO OUTFILEfeature supports parallel data reading but can only write data to external files sequentially, creating a performance bottleneck for data export. V4.3.1 introduces parallel export capabilities, adding thesingleandmax_file_sizeoptions to theSELECT INTO OUTFILEcommand to control data writing to external files. Thesingleoption allows you to export data to a single file or multiple files. If the parallelism is greater than 1 andsingleis set tofalse, data can be exported to multiple files, achieving parallel read and write. Themax_file_sizeoption controls the size of the exported files.
Index enhancements
Full-text index (MySQL mode) (Experimental)
In relational databases, indexes are typically used to accelerate queries based on exact value matching. However, B-tree indexes are not suitable for scenarios involving large amounts of text data requiring fuzzy searches. In such cases, full-table scans are necessary to perform fuzzy queries on each row of data, which may not meet performance requirements when the text is large and the data volume is high. Additionally, complex query scenarios such as approximate matching and relevance sorting are difficult to support through SQL rewriting.
To address these issues, OceanBase Database V4.3.1 introduces full-text indexing. By preprocessing text content and establishing keyword indexes, it significantly improves full-text search efficiency. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in later versions. The following features are included in this release:
Full-text indexing is supported in MySQL mode, compatible with basic MySQL syntax;
- Full-text indexes can be prebuilt for CHAR, VARCHAR, and TEXT columns during table creation;
- Supports partitioned tables;
- Allows multiple full-text indexes to be created for the main table;
- Supports three built-in tokenizers: SPACE (space), NGRAM, and BENG (basic English);
- Supports searching across multiple columns using a single
MATCH AGAINSTclause; - Supports the NATURAL LANGUAGE MODE.
Partition management
Partition exchange
Over time, tables may accumulate a large amount of historical data that is not frequently accessed but needs to be retained for compliance or historical analysis. To improve query performance, businesses often need to distinguish between active and inactive data and archive the inactive data. Migrating data to a new table using SQL can address this scenario, but performance may be suboptimal for large data volumes. Therefore, OceanBase Database V4.3.1 introduces partition exchange, allowing data from table A to be nearly instantaneously moved to a partition of table B by modifying the definitions of partitions and tables in the data dictionary without physically copying data. This significantly improves data migration performance. The following features are included in this release:
- Supports data exchange between a partition of a partitioned table and a non-partitioned table;
- Supports first-level partitioned tables with RANGE (RANGE COLUMNS) partitioning;
- Supports second-level partitioned tables with first-level partitioning of any type and second-level partitioning of RANGE (RANGE COLUMNS);
- Supports the
including indexesbehavior, where local indexes are also exchanged and remain available after the exchange; - Supports the
without validationmode, requiring users to ensure data conforms to the partition key range.
Integration with external data sources
Partitioning of external tables
OceanBase Database V4.2.0 supports external tables, limited to non-partitioned tables. When a large number of files exist, but a query theoretically only needs to scan a subset of data files, non-partitioned external tables can only scan all files, resulting in poor performance. V4.3.1 introduces external table partitioning, supporting list partitioning similar to regular tables, with both automatic and manual partitioning syntax options. When automatically creating partitions, the system groups files into partitions based on the partition key definition. When manually creating partitions, users specify the data file subpaths for each partition. In this case, external table queries can implement partition pruning based on partition conditions, reducing the number of scanned files and significantly improving query performance.
Data types and storage optimization
MySQL JSON multi-value index (Experimental)
MySQL 8.0 introduced the multi-value index feature, which is applicable to JSON documents and other collection data types. This feature allows users to build indexes on arrays or collections to efficiently retrieve elements. OceanBase Database V4.3.1 is compatible with the JSON multi-value index feature in MySQL mode. It enables the creation of efficient secondary indexes on JSON array fields containing multiple elements, enhancing the query capabilities of complex JSON data structures while maintaining the flexibility of the data model and high query performance. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in subsequent versions. The following features are included in this release:
- Support for pre-creation of multi-value indexes and composite multi-value indexes.
- Support for unique and non-unique multi-value indexes.
- Support for creating multi-value indexes with JSON array elements of types INT, UINT, DOUBLE, FLOAT, and CHAR.
- Applicable to partitioned tables.
- Support for using multi-value indexes with the MEMBER_OF(), JSON_CONTAINS(), and JSON_OVERLAPS() functions.
MySQL JSON partial update
Some users store business data in JSON documents. When updating a document, the current approach requires a full read and full update, which can be inefficient for large documents. OceanBase Database V4.3.1 introduces the JSON partial update feature. When a user updates specific fields in a JSON document using expressions like json_set, json_replace, or json_remove, only the modified fields are updated, without the need for a full overwrite. This improves update performance. To enable this feature, the log_row_value_options parameter must be configured.
Query execution and resource management
SQL temporary result compression
When the amount of data involved in SQL execution is excessively large, memory insufficiency may occur. In such cases, some operators need to materialize temporary intermediate results. If the materialized data exceeds the available disk space, SQL execution will fail. OceanBase Database V4.3.1 introduces the SQL temporary result compression feature. This feature allows you to specify whether to compress temporary results and the compression algorithm using the tenant-level parameter spill_compression_codec or an SQL-level hint (e.g., /*+opt_param('spill_compression_codec', 'lz4') */). Specifying temporary result compression effectively reduces temporary disk space usage, enabling the execution of queries with larger computational demands.
V4.3.0 Beta
Version information
- Release date: March 22, 2024
- Version number: V4.3.0 Beta
New features
Columnar storage engine
In scenarios involving large-scale data analysis or ad-hoc queries on massive datasets, columnar storage is a key capability of an AP database. Columnar storage is a data organization method that differs from row-based storage by arranging table data by columns. When data is stored in a columnar format, analysis queries can scan only the relevant columns needed for the computation, avoiding full-row scans. This reduces I/O and memory resource usage and improves computational speed. Additionally, columnar storage naturally offers better conditions for data compression, resulting in higher compression ratios, reduced storage space, and lower network bandwidth requirements.
However, traditional columnar storage engines often assume minimal random updates and aim to maintain static data organization. When faced with frequent random updates, performance issues are inevitable. The OceanBase LSM-Tree architecture addresses this by separately handling baseline and incremental data, making it ideal for such scenarios. Therefore, V4.3.0 extends support for columnar storage engines within the existing architecture, achieving integrated storage of both columnar and row-based data on a single codebase, architecture, and OBServer node. This approach balances the performance of both TP and AP queries.
To facilitate the migration of AP applications and ensure smooth usage of the new version, multiple modules have been adapted and optimized around the columnar storage engine. These include the optimizer, executor, DDL operations, and transaction processing. Key enhancements include a new cost model based on columnar storage, a vectorized engine, extended and enhanced query pushdown functionality, Skip Index, new columnar encoding algorithms, and adaptive compaction.
Users can flexibly configure tables as row-based, columnar, or hybrid (row-column redundant) based on their workload types. For more information about the columnar storage engine, see Columnar storage.
New vectorized engine
OceanBase Database has already implemented a vectorized engine based on the Uniform data description method in earlier versions, significantly improving performance compared to non-vectorized engines. However, in deep AP scenarios, there are still some performance limitations. V4.3.0 introduces the 2.0 version of the vectorized engine, which uses the Column data format description to avoid the memory usage, serialization, and read/write access overheads associated with ObDatum. Based on this data format description, the new version also re-implements a batch of commonly used operators and expressions, including over 10 operators such as HashJoin, AGGR, HashGroupBy, and Exchange (DTL Shuffle), as well as over 20 MySQL expressions involving relational operations, logical operations, and arithmetic operations. In subsequent V4.3.x versions, the new vectorized engine will continue to be used to supplement and enhance the implementation of other operators and expressions to achieve optimal performance in AP scenarios.
Materialized views
V4.3.0 introduces the Materialized View feature. Materialized views are a key capability for AP applications. They precompute and store query results, reducing the need for real-time computation to improve query performance and simplify complex query logic. Materialized views are commonly used in scenarios such as rapid report generation and data analysis.
Since materialized views store query result sets to optimize query performance, and they have data dependencies with base tables, the data in materialized views must be updated whenever the data in the base tables changes to maintain synchronization. To address this, V4.3.0 introduces a materialized view refresh mechanism, including full refresh and incremental refresh strategies. Full refresh is a straightforward approach where, each time a refresh operation is executed, the system re-executes the query statements corresponding to the materialized view, recalculates, and overwrites the original view result data. This method is suitable for scenarios with relatively small data volumes. In contrast, incremental refresh only processes the data that has changed since the last refresh. To achieve precise incremental refresh, OceanBase implements a feature similar to Oracle's MLOG (Materialized View Log), which tracks and records incremental updates in the base tables through logs. This ensures that materialized views can perform fast incremental refreshes. The incremental refresh method is particularly suitable for scenarios with large data volumes and frequent changes. For more information about materialized views, see Materialized views (MySQL mode) and Materialized views (Oracle mode).
