This topic only lists the core capability changes related to real-time analysis. For more information about the full capabilities and iterations of OceanBase Database, see Release Note.
V4.3.5 BP5
Version information
- Date: November 17, 2025
- Version: V4.3.5 BP5
New features and improvements
Materialized view functionality enhancement
- Supports incremental refresh for materialized views with LEFT JOIN.
- Supports nested materialized views based on incrementally materialized views without a primary key.
- Supports setting parameters of MIN/MAX aggregate functions in incremental materialized views to non-basic columns.
- Supports incremental materialized views with multiple tables connected that reference views declared as dimension tables (
AS OF PROCTIME()). - Supports full refresh materialized views by referencing UDF.
- Supports minimal refresh mode (incremental refresh DML only reads and writes the necessary data of the column store table) for incremental materialized views.
- Supports materialized views to refresh using the session variables that are set to specific values.
Delete-Insert table incremental data query supports Skip Index
V4.3.5 BP5 introduces support for skip indexes in delete-insert tables to preprocess incremental data. By using skip indexes, the system can pre-trim the data during query processing, significantly enhancing query efficiency. You can control the scope of skip index capabilities using the tenant-level parameter
default_skip_index_level.Partition exchange functionality extended
Prior to V4.3.5 BP5, a range/range columns partitioned table can be swapped with a non-partitioned table. Additionally, you can exchange a range/range columns partition of a subpartitioned table with a non-partitioned table. You can also exchange the primary partitions of a range/range columns partitioned table with those of a non-partitioned table. In addition to these features, V4.3.5 BP5 supports exchanging a list/list columns partition of a list/list columns partitioned table with that of a non-partitioned table.
V4.3.5 BP4
Version information
- Release date: September 10, 2025
- Version number: V4.3.5 BP4
New features and enhancements
Enhanced materialized view capabilities
V4.3.5 BP4 continues to enhance the capabilities of materialized views. Functionally, it supports incremental refreshes of materialized views without refreshing dimension tables. When creating a materialized view, you can use the new AS OF PROCTIME() syntax to specify tables that do not need to be refreshed. During incremental refreshes, the incremental data of these tables will not be refreshed. Additionally, single-table aggregation incremental refresh materialized views now support the MIN() and MAX() aggregate functions. In terms of usability, it supports automated management of MLOG tables: when creating an incremental refresh materialized view, the system automatically creates or replaces the required MLOG tables for the base table and periodically cleans up redundant MLOG tables in the background. The view content and error information for creating materialized views have also been optimized.
Disabled full bypass import by default for AP parameter templates.
V4.3.5 BP3
Version information
- Release date: July 21, 2025
- Version number: V4.3.5 BP3
New features and capabilities
Optimization of external data sources and integration
ODPS external table supports Storage API
In previous versions, the ODPS external table based on Tunnel API requires an independent session for each partition during execution. Opening a session takes seconds, which causes some queries with very short execution time to spend time on session preparation. In addition to Tunnel API, ODPS provides an open Storage API that allows direct access to the underlying storage of ODPS. This API offers more optimization strategies and better read performance. Therefore, the new version supports ODPS Storage API to further improve the performance of external table access.
Object storage supports Azure Blob
The new version supports accessing Azure Blob objects using the Azure Blob protocol.
Enhanced materialized views
Enhanced materialized view capabilities
The new version continues to improve the capabilities of materialized views. This includes support for renaming materialized views, incremental refresh for outer joins, UNION ALL, and non-aggregated single-table operations, support for adding columns to materialized view logs, and cascading refresh for nested materialized views. It also fixes issues such as materialized view drop getting stuck, enhancing the stability of materialized views.
Optimization of storage and partition management
Support for exchanging partitions between primary and secondary partitioned tables
The partition exchange feature now supports exchanging partitions between primary partitioned tables and secondary partitioned tables.
Optimizer and query performance improvements
Enhanced statistics collection
In versions prior to V4.3.5 BP3, statistics collection required gathering aggregated data such as MIN, MAX, NULL COUNT, and NDV for each column at the partition and table levels, and estimating data distribution through sampling, which introduced significant errors. The new version leverages the Skip Index capability at the storage layer to collect more accurate column-level aggregated data at the block level for all columns in columnar storage. This helps the optimizer generate and optimize plans with greater accuracy. Additionally, the new version introduces a progressive statistics collection approach for partitions to address the issue of long collection times for large partitioned tables, thereby improving system stability and data timeliness.
Accelerated data import
Performance optimization for direct load
The new version optimizes the execution logic of direct load, including sorting algorithms, vectorized paths for direct load, CSV parsing, column_conv expressions, and CS encoding. In columnar storage scenarios for non-partitioned heap tables, performance can improve by up to 10%.
V4.3.5 BP2
Version information
- Release date: May 15, 2025
- Version number: V4.3.5 BP2
Product forms
- Shared-Storage AP
Starting from V4.3.5 BP2, you can deploy OceanBase Database in the Shared-Storage AP form to handle AP workloads. The Shared-Storage AP form supports key AP features in the Shared-Nothing deployment mode, such as columnar tables, incremental direct load, materialized views, full-text indexes, and multi-value indexes. The I/O read method in the Shared-Storage form has been optimized. We recommend that you use the Shared-Storage AP form for AP workloads that have data cold/warm characteristics and want to reduce storage costs.
Feature enhancements
Enhanced data recovery capability
Support for restoring columnar tables
Before this version, the table-level restore feature only supported restoring to row-based tables. This version improves the table-level restore feature in the Shared-Nothing deployment mode to support restoring to columnar tables and hybrid row-columnar tables.
Performance optimization
Optimized performance of direct load to heap tables
The uniqueness check process in the unique index creation phase of direct load to heap tables is optimized. The performance of full direct load to non-partitioned heap tables is not worse than that of index-organized primary key tables.
Optimized performance of hard parsing
In some AP scenarios or scenarios of certain account sizes, you can disable the plan cache and use hard parsing to reduce the use of suboptimal plans. However, hard parsing requires higher performance. This version optimizes the resolver phase by optimizing hot functions, replacing global cost verification scenarios with local cost verification, reducing expression memory usage, and eliminating unnecessary expression type inference operations. This improves the performance of the optimizer in the resolve and rewrite phases.
Enhanced materialized views
Support for creating materialized views based on external tables
Before this version, OceanBase Database supported creating materialized views based on user tables and materialized views. This version adds support for creating materialized views based on external tables in full refresh scenarios, expanding the applicable scenarios of materialized views.
Enhanced diagnostics for materialized views
The
CDB/DBA_MVIEW_RUNNING_JOBSsystem view is added to display ongoing materialized view tasks, such as refresh tasks and MLOG cleanup tasks. TheDBA_MVIEW_DEPSsystem view is added to display the dependent objects of materialized views. TheCDB/DBA/ALL/USER_MVIEWSsystem view records the refresh points and refresh latency of materialized views. TheCDB/DBA/ALL/USER_MVIEW_LOGSview records the parallelism and time consumption of MLOG cleanup.Resource isolation for materialized view tasks
Refreshing and purging materialized views consumes system resources, which may affect the performance of foreground tasks. This version provides resource isolation for incremental refreshes and MLOG purges of materialized views based on Resource Manager. You can set the resource usage limit for materialized view tasks.
Partition management
Dynamic partition management
Businesses often use date or time as the RANGE partitioning key to divide data of different time periods into different partitions. DBAs need to precreate some future time partitions to meet the data write requirements. If DBAs forget to precreate partitions for some reason, data may fail to be written because the partitions do not exist, which affects the business. To address this issue, this version provides dynamic partition management. You can specify the dynamic partition management attribute for a table. The kernel automatically precreates future time partitions, reducing the burden on DBAs. Dynamic partition management also supports automatically deleting expired partitions that are no longer needed, saving storage space.
Indexes and query optimization
NGRAM2 tokenizer for full-text indexes
The
NGRAM2tokenizer is added as an internal tokenizer for full-text indexes. It supportsNGRAMtokenization in themin_ngram_sizetomax_ngram_sizerange. TheNGRAM2tokenizer is suitable for scenarios where performance and storage space are not sensitive, and different-lengthtokenretrieval is required. You can use theNGRAMtokenizer for fixed-lengthtoken.
Data types and storage optimization
Map data type
The map data type is used to store unordered key-value pairs, such as {a:1, b:2, c:3}. Common use cases include saving configuration options, user attributes, and product information. OceanBase Database implements the map data type based on the Array framework in V4.3.5 BP2. It supports the map constructor and functions and operators such as
map_keys,map_values,=, and!=.MySQL-compatible mode JSON semi-structured storage
Currently, semi-structured data is stored as binary strings. It is difficult to encode and compress it using encoding. Each data record stores redundant data meta information for self-explanation, resulting in a low compression ratio. JSON is a typical semi-structured data type with strong structural characteristics. In theory, semi-structured data can be logically split into multiple basic-type columns, and the parts that cannot be split can be stored in a binary column. Structured columns can leverage encoding to improve the compression ratio. The parts that cannot be split, which store extracted data meta information, occupy less space than before. This is called "structured storage of semi-structured columns". OceanBase Database supports semi-structured storage of the JSON data type in V4.3.5 BP2. It implements the corresponding encoding method, reduces the storage space occupied, and optimizes the query and filtering performance of JSON subcolumns. However, because JSON data is stored in multiple subcolumns, frequent reads of the original JSON data may introduce merge overhead and result in performance degradation. You can enable or disable this feature based on your business requirements.
Query execution
Adaptive plan cache
Plan cache is a critical feature in TP scenarios. Therefore, it is enabled by default. In some AP scenarios, the plan execution time is much longer than the plan generation time. In this case, generating a new plan for each query is often more efficient than reusing the plan. In hybrid HTAP scenarios, you need a way to let AP-oriented SQL statements use hard parsing and TP-oriented SQL statements reuse plans to avoid the overhead of hard parsing. OceanBase Database supports adaptive plan cache in the new version. It can decide whether to use the plan cache of an SQL statement based on the execution time and time distribution of the SQL statement.
External data source integration
MySQL-compatible mode ODPS (MaxCompute) catalog
To facilitate queries on data stored in various external data sources, this version supports the External Catalog framework and the ODPS Catalog feature. Internal objects in OceanBase Database belong to the Internal Catalog. You can create an External Catalog to connect to an external data source and obtain the metadata of the external data. You can directly query data in tables of an External Catalog, such as
select col1 from catalog1.database1.table1;. If the permissions are met, you can also query tables across catalogs. You no longer need to import data or create external tables one by one, improving the ease of use of external data access.
Data import
Load Data import with URL CSV external table error tolerance mode
When you import data by using the Load Data method, an error is returned if the data types are incompatible or the precision does not match. This version adds the
LOG ERRORSinstruction to support error-tolerant import. In MySQL-compatible mode, failed rows are recorded. You can view the error data by using theshow warningscommand and perform error diagnostics.
V4.3.5 BP1
Version information
- Release date: March 18, 2025
- Version: V4.3.5 BP1
New features and enhancements
Data import and export
Support for incremental direct load to heap tables with unique indexes
Direct load is divided into full direct load and incremental direct load. When data already exists in the table, incremental direct load performs better than full direct load. However, previously incremental direct load did not support importing data into heap tables with unique indexes. The new version supports importing data into heap tables with a local unique index using incremental direct load.
Support for HASH partitioned and concurrent direct load
Starting from V4.3.5 BP1, the system supports HASH partitioned direct load. Additionally, incremental direct load now supports concurrent loading across multiple partitions. However, if multiple import tasks have overlapping partitions, partition-level parallel import is not supported.
Performance optimization for direct load
Through vectorization optimization, streamlined temporary file processing, and performance improvements in each stage, the import performance of the clickbench table has improved by 17% compared to V4.3.5.
Integration with external data sources
Support for reading HDFS files from external tables
HDFS is the most common storage medium in data lake architectures and serves as a platform for multi-engine data sharing. Therefore, the new version of OceanBase Database supports directly reading files from HDFS through external tables. It also supports connecting to HDFS clusters with Kerberos authentication via relevant configurations.
Support for reading and writing data from/to MaxCompute (ODPS)
The new version supports reading data from MaxCompute through external tables or URL external tables. It also supports writing data from OceanBase Database back to MaxCompute.
URL external tables
When accessing external data sources in OceanBase Database, you typically need to create an external table and then use SQL queries. The new version introduces URL external tables, allowing you to directly query external CSV, Parquet, ORC, and MaxCompute data using SELECT statements without creating an external table. Additionally, it extends the LOAD DATA syntax to directly reference URL external tables for data import, reducing the operational cost of data import.
Index optimization
Enhancements to full-text indexing
OceanBase Database has supported full-text indexing in MySQL-compatible mode since V4.3.1. As the version evolved, additional features and enhancements were introduced. V4.3.5 BP1 introduces the built-in Chinese
IKtokenizer, supportingsmartandmax_wordtokenization modes. It also allows adding new tokenizers via plugins. The new version supports thePARSER_PROPERTIESconfiguration at the table level for full-text indexes. In addition to the existing natural language mode, it introduces the boolean mode, which can be enabled using theIN BOOLEAN MODEkeyword inMATCH AGAINSTqueries. This mode supports boolean operators such as+,-,(), and no operator, as well as nested operations. Furthermore, the new version supports full-text indexes participating in index merge, improving query performance through index union merge. It also optimizes the update and deletion scenarios for full-text indexes on the main table, enhancing the performance of these operations.
Materialized view enhancements
Enhancements to materialized views
The parallelism of materialized view refreshes is closely related to the efficiency of incremental and full refreshes. The new version supports a complete parallelism control mechanism, allowing you to set the refresh parallelism using the
mview_refresh_dopsystem variable, theDBMS_MVIEW.refreshprocedure, or thePARALLELoption when creating a materialized view. The new version also supports theLOBdata type in incremental materialized views and provides the ability to directly modify materialized views or materialized view logs. You can use theALTER MATERIALIZED VIEWandALTER MATERIALIZED VIEW LOGcommands to modify the parallelism, background refresh task intervals, MLOG table parallelism, MLOG table background cleanup task intervals, and the LOB inline storage length threshold for MLOGs. Additionally, when an MLOG is created on the base table, you cannot add columns to the base table. You must first drop the MLOG before performing DDL operations, which can be complex. To address this, the new version supports adding columns to the MLOG base table, including both appending columns at the end and inserting columns in the middle. Furthermore, the new version improves the performance of MLOG cleanup and optimizes error messages for incremental refreshes.
Table structure and storage optimization
Heap table organization mode
OceanBase Database uses a clustered index table model, which performs well in OLTP environments. It stores the primary key and main table data in the same table, optimizing primary key query speed and uniqueness validation performance. However, this model has limitations in OLAP scenarios requiring efficient data import and complex data analysis. The import process requires sorting all data, and queries must perform primary key row fusion, both of which impact performance. The new version introduces the heap table organization mode, where the primary key is used for uniqueness constraints, and queries rely on the main table. When user data is sorted by time, skip indexes can be more effectively utilized to improve query efficiency. Additionally, decoupling the primary key from the data eliminates the need to sort the main table data during import, enhancing import performance.
Data type and storage optimization
New String data type in MySQL-compatible mode
Some AP applications have a higher tolerance for column lengths and often use variable-length string types to store data, which also needs to serve as primary keys or index keys. In OceanBase Database, the char and varchar types can store strings and serve as primary keys or index keys, but they require a specified length. The mediumtext and text types, which are LOB types, can store variable-length strings but cannot serve as primary keys or index keys. To better meet the needs of AP applications, the new version introduces the String data type, which does not require a specified length and has a default maximum length of 16 MB. When the column length is less than 16 KB and does not exceed the lob_inrow_threshold, it can serve as a primary key or index key.
Enhancements to ARRAY type-related functions
OceanBase Database supports the ARRAY data type and some common functions and operators since V4.3.3. The new version further enhances these functions, adding support for
array_prepend,array_concat,array_compact,array_filter,array_sort,array_sortby,array_length,array_range,array_sum,array_first,array_difference,array_min,array_max,array_avg,array_position,array_slice, andreverse.
Parallel query optimization
Decoupling of PX computing nodes from data nodes
In the current system architecture, storage and computing resources are highly coupled. The existing PX scheduling mechanism for Non-Leaf DFOs only selects machines where data is stored for computing task allocation, which can limit the full utilization of machine resources in certain scenarios. To enable large SQL queries to leverage more computing resources, the new version decouples PX computing nodes from data nodes. You can use configuration parameters or hints (PX_NODE_POLICY) to specify the candidate resource pools for Non-Leaf DFOs. Additionally, you can use hints (PX_NODE_ADDRS, PX_NODE_COUNT) to force the allocation of Non-Leaf DFOs to specific machines or a specific number of machines.
Query optimization and pushdown
Support for query pushdown in rowstore data column addition scenarios
During rowstore SCAN operations, if there is no intersection between the major sstable and other data sources (i.e., no rowid matches), the SCAN operation is pushed down to the storage layer. After a major compaction, when new columns are added, the SCAN operation can theoretically be pushed down. However, if the parsing of the new column fails, the pushdown operation will fail, and the system will fall back to the less efficient row-by-row read method. The new version introduces special optimizations for this scenario, supporting query pushdown to the storage layer to improve query performance.
Columnstore storage optimization
Optimizations for columnstore replicas
The columnstore replica feature introduced in V4.3.3 had some limitations, such as supporting only a single C replica, not allowing direct full import of columnstore data to C replicas, and lacking user-friendliness. The new version addresses these limitations by allowing multiple C replicas to be specified in a cluster and supporting direct full import of columnstore data to C replicas. It also introduces the system view
CDB/DBA_OB_CS_REPLICA_STATSfor displaying the progress of columnstore replica conversion at the partition and log stream levels, making it easier for users to monitor the status of columnstore replicas.
Behavior change
Disable NLJ in AP parameter templates
After the optimizer generates a nested loop join plan, the plan is sensitive to the number of rows in the driving table. If the estimated number of rows in the driving table is significantly smaller than the actual number, the actual execution performance of the plan may severely degrade. In OLAP scenarios, the data volume is usually large, and the benefits of using a nested loop join plan are limited. Therefore, in AP scenarios, the default is to generate a hash join plan, and a nested loop join plan is only generated in non-equi join scenarios.
V4.3.5
Version information
- Release date: December 31, 2024
- Version number: V4.3.5
New features and enhancements
Materialized view enhancements
Support for nested materialized views
In versions earlier than V4.3.5, materialized views could only be created on regular user tables. In data warehouse scenarios, materialized views are used for data processing. To better support lightweight real-time data warehouses, V4.3.5 introduces nested materialized views, which allow new materialized views to be created based on existing ones. The refresh methods supported for nested materialized views are the same as those for non-nested ones, including full and incremental refreshes. Therefore, V4.3.5 supports creating materialized view logs based on materialized views. The freshness of data in nested materialized views depends on the base table's data freshness. To ensure the freshness of upper-level materialized views, users must refresh the underlying materialized views first.
For more information, see the section on creating nested materialized views in Create a materialized view (MySQL-compatible mode) or Create a materialized view (Oracle-compatible mode).
Data import and export
Support for specifying partitions for direct load
In versions earlier than V4.3.5, OBServer only supported direct load for entire tables to accelerate the import process. If you want to import data from only specific partitions, you can use the partition exchange method. This involves first using a full direct load to import the partition data into a non-partitioned table, and then exchanging the non-partitioned table with the target partition. This process can be cumbersome for users. To better support partition-level data imports, V4.3.5 allows you to specify partitions for direct load using the
LOAD DATAandINSERT INTO SELECTsyntax. However, it does not support specifying the last-level partition type as Hash or Key.For more information, see Full direct load or Incremental direct load.
Enhancements to the
select into outfilefeatureV4.3.5 enhances the
select intoexport feature by supporting the export of parquet and orc types, allowing you to specify a compression algorithm when exporting csv files, and adding new syntax forselect into.- Use the format syntax to set export options.
- Specify a compression algorithm when exporting csv files.
- Export parquet files.
- Export orc files.
Index enhancements
Enhancements to full-text search features
V4.3.5 continues to enhance the full-text search features introduced in V4.3.4. The main improvements include:
- Support for creating full-text indexes using the
CREATE FULLTEXT INDEXorALTER TABLE ADD FULLTEXT INDEXstatements. - Support for cost-based full-text index plan selection, allowing the system to choose the least costly index for scanning when multiple filters with
MATCH AGAINSTare present in an SQL statement. - Support for Functional Lookup:
- Support for multiple
MATCH AGAINSTexpressions in a query statement. - Support for scanning other indexes in addition to full-text indexes.
- Support for outputting
MATCH AGAINSTexpressions without filter semantics as projected columns. - Support for using <=/< symbols to filter
MATCH AGAINSTexpressions with filter semantics. - Support for linking
MATCH AGAINSTexpressions with filter semantics using AND/OR logic with other filter semantics.
- Support for multiple
For more information, see MATCH AGAINST.
- Support for creating full-text indexes using the
External data source integration
Enhancements to external tables
In V4.3.5, external tables support reading ORC files.
For more information, see Create an external table (MySQL-compatible mode) or Create an external table (Oracle-compatible mode).
Columnar storage optimization
- Read-only columnar replicas
The read-only columnar replica feature introduced in V4.3.3 is now a general availability (GA) feature in V4.3.5 after being iteratively improved in V4.3.4 and V4.3.5.
Support for asynchronous online DDL during row-to-column conversion
V4.3.0 introduced three storage formats for tables: rowstore, pure columnar, and hybrid row-column. Converting a rowstore table to a hybrid row-column or columnar table is an offline DDL operation. To minimize the impact of offline DDL on customers, V4.3.5 supports asynchronous online DDL for row-to-column conversion. By specifying the
delayedkeyword during DDL execution, you can modify the table schema in real time without blocking user data writes, achieving an online effect. The baseline data reorganization for columnar storage is then asynchronously executed during baseline data compaction. V4.3.5 supports both row-to-column and row-to-hybrid row-column conversions as online DDL operations.
Performance optimization
Vectorization in direct load write paths
In versions earlier than V4.3.5, direct load did not use vectorization, and write functions processed data row by row, resulting in significant function call overhead. Additionally, writing sstables was a background process without specific optimizations for certain operations. V4.3.5 introduces vectorization in the direct load path and optimizes columnar encoding to enhance direct load performance. Testing has shown that direct load performance for tables without a primary key can be improved by approximately 200%.
Data type and storage optimization
Support for complex DML operations with JSON multi-value indexes
V4.3.5 enhances support for complex DML operations with JSON multi-value indexes. It allows the use of multi-value predicates in
updateanddeletestatements and supports complex DML operations involving multi-value index lookups.Support for more array expressions
OceanBase Database introduced the ARRAY data type in V4.3.3. To better support the use of ARRAY types in business scenarios, V4.3.5 supports more array expressions to meet diverse business needs. New array expressions include
array_append,array_distinct,arrayMap,array_remove,cardinality,element_at,array_contains_all,array_overlaps,array_to_string,array_agg,unnest, andrb_build.
V4.3.4
Version information
- Release date: October 31, 2024
- Version number: V4.3.4
New features and enhancements
Storage architecture optimization
Shared storage (experimental feature)
To provide more cost-effective database services in public cloud environments, OceanBase Database V4.3.4 introduces the Shared-Storage Cluster architecture based on a general-purpose object storage. This architecture allows multiple tenants to share a baseline dataset and log data stored in a standard object storage. Hot data is stored in cloud disks or local SSDs, while cold data is stored in standard object storage. This approach reduces the overall storage costs. Each tenant's replicas only cache hot data and logs on local storage, enabling quick scaling of compute nodes. The shared storage architecture is suitable for TP workloads with clear hot/cold data separation, such as historical and backup databases, as well as key-value and time-series workloads.
The shared storage feature is defined as an experimental feature in V4.3.4 and will be further enhanced in future versions to become a production-ready feature.
Indexing capabilities
Enhanced full-text indexing
OceanBase Database has supported full-text indexing since version 4.3.1. This feature preprocesses text content to build keyword indexes, significantly improving full-text search efficiency. However, complex DML operations involving full-text indexes were previously limited, causing some inconvenience for users. To better support business needs, V4.3.4 enhances complex DML operations, allowing support for
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates and deletes, and updatable views on tables with full-text indexes. Additionally, the new version optimizes full-text search performance using vector execution and extended TAAT processing strategies. It also removes the token limit for theMATCH AGAINSTpredicate and supports creating full-text indexes on partitioned tables without a primary key. Furthermore, the new version allows users to view tokenization results using thetokenizefunction, aiding in debugging tokenization systems.For more information, see Create a multi-value index.
Enhanced multi-value indexing
Previously, multi-value indexing only supported pre-created indexes, with limitations in DML and DDL operations. To better support business needs, V4.3.4 relaxes some restrictions and enhances the feature. The main enhancement is the support for complex DML operations, including
INSERT INTO ON DUPLICATE KEY,REPLACE INTO, multi-table updates and deletes, and updatable views.
Data import optimization
Support for non-unique local indexes in incremental direct load
OceanBase Database supports full direct load and incremental direct load. When data already exists in the table, incremental direct load outperforms full direct load. Previously, incremental direct load did not support importing data into tables with indexes. V4.3.4 extends this support to tables with non-unique local indexes.
Optimized direct load performance
During direct load, if a table with a primary key requires sorting, the data is first written to a temporary file, then loaded into memory for sorting and merging, and finally written to an sstable. This two-phase approach (data writing and sorting/merging) can cause performance bottlenecks if data is not fully written or I/O reaches its limit. V4.3.4 optimizes this by directly writing data to memory instead of a temporary file, followed by sorting and merging. This pipeline approach avoids stage dependencies. In a primary key table scenario, this optimization improves performance by 35% for 1TB of data. Larger memory and CPU resources yield even greater performance gains, making this optimization ideal for large-scale tenants. For smaller tenants, performance remains stable without significant degradation.
Data types and storage optimization
Enhanced bitmap functionality
A bitmap, or bitset, is a data structure for efficiently storing and processing whether elements exist in a set. OceanBase Database V4.3.2 and V4.3.3 introduced some features of the roaringbitmap type. V4.3.4 further enhances this by adding the
rb_iterateandrb_selectexpressions.rb_iterateexpands a roaringbitmap type into multiple rows based on the number of elements, whilerb_selectfilters elements in a roaringbitmap type based on conditions and returns a new roaringbitmap type.
V4.3.3
Version information
- Date: September 30, 2024
- Version: V4.3.3
New features and enhanced capabilities
Columnar storage optimization
**Read-only columnstore replica (experimental) **
In OceanBase Database V4.3.0, you can create rowstore tables, columnstore tables, or hybrid row and columnstore tables. The replication mode is consistent across the zones in the same tenant. For example, if zone distribution of tenant X is 1:1:1 and the table T1 is a hybrid row and columnstore table, a rowstore replica and a columnstore replica are stored on each of the three zones. V4.3.3 introduces a new deployment type to meet the need for physical isolation of TP and AP resources in an HTAP workload scenario. The new deployment type enables you to expand a separate zone from the original cluster to store read-only columnstore replicas (columnstore replicas, or C replicas). In the separate zone, all user tables are stored as columnstore tables. You can set the system variable
ob_route_policytoCOLUMN_STORE_ONLYat the session level in ODP to access C replicas for queries and analysis in weak-consistency read mode, without affecting original TP services. The deployment type saves some storage resources compared with hybrid row and columnstore tables. You must use ODP V4.3.2 or later. For more information, see Columnstore replicas.In V4.3.3, this feature is defined as a trial feature and will be further improved to become a production-ready feature in later versions.
Materialized View Enhancements
Materialized View capabilities
To reduce the cost of manually rewriting business queries, OceanBase Database V4.3.1 supports the rewrite of materialized views. When you set the system variable
QUERY_REWRITE_ENABLEDtoTrue, you can use the automatic rewrite feature by specifyingENABLE QUERY REWRITEwhen you create a materialized view. In this scenario, the system can rewrite queries on the base table to queries on the materialized view. V4.3.1 supports the rewrite of non-aggregate materialized views when all clauses in theFROMclause of a query match corresponding clauses in theWHEREclause. V4.3.3 supports the following features: rewrite of non-aggregate materialized views with join compatibility in theFROMclause, rewriting of queries on tables that do not exist in materialized views, and rewrite of aggregate materialized views and aggregate rollup operations.Meanwhile, V4.3.3 also extends the support for incremental refresh and real-time materialized views in terms of SQL types. Earlier versions supported incremental refresh and real-time queries in single-table aggregation scenarios and multiple-table join scenarios. V4.3.3 further extends support to join aggregation scenarios. For more information, see Materialized view query rewriting (MySQL-compatible mode) and Materialized view query rewriting (Oracle-compatible mode).
In addition to supporting materialized views of the ROWSTORE storage type, OceanBase Database V4.3.3 supports materialized views of the COLUMNSTORE storage type, which provides better query performance for complex analysis scenarios involving the referencing of materialized views. For more information, see Create a materialized view (MySQL-compatible mode) and Create a materialized view (Oracle-compatible mode).
Data import and export optimization
Import Performance of an External Data Source Optimized
V4.3.3 optimizes the performance of importing external table data by bypassing the import process. Compared with the performance of the previous version, the performance is improved by about 15%.
Enhanced INSERT OVERWRITE
Starting from OceanBase Database V4.3.2, the overwrite feature (INSERT OVERWRITE) is supported. This feature allows you to clear the original data in the table and write new data. You can use this feature to support AP application scenarios such as regular data updates, data transformations, and data cleaning. However, in V4.3.2, the overwrite feature is only available for replacing the entire table. Partition-based or column-based overwrite replacement is not supported. V4.3.3 has enhanced this feature. You can specify the partition level (such as partition or subpartition) and the column information for a table when you use the INSERT OVERWRITE statement. This provides more flexibility to replace data in the specified table.
Data Load/Support for compressed files
The Load Data method is a common way to import data, supporting scenarios such as normal server-side import, bypass import, and client-side import. Data in specified text file formats can be imported to a database. However, in previous versions, Load Data supported only normal text files, and compressed files such as GZIP files had to be decompressed before they could be imported. This process was complex. The new version supports the direct import of compressed files, such as GZIP, DEFLATE, and ZSTD. Compressed files can be loaded, decompressed, and written to a database simultaneously. In addition, the CSV file view supports the query of data from compressed files in GZIP, DEFLATE, or ZSTD format by accessing the view.
Data types and storage optimization
ARRAY data type
An array is a complex data type often used in AP applications. An array stores multiple elements of the same data type. When you need to store or query multivalued attributes that cannot be effectively represented in relational data, an array is a suitable data type. OceanBase Database V4.3.3 supports the array type in MySQL-compatible mode. When you create a table, you can define a column of the array type and specify the data type of elements as numerical or character types. You can also define nested arrays. OceanBase Database V4.3.3 supports expression-based queries and writes for arrays and array_contains expressions and the ANY operator for verifying whether elements are included in an array. You can also perform operations (such as addition, subtraction, comparison, and assignment) on array elements.
Optimize RoaringBitmap performance
V4.3.2 supports the RoaringBitmap data type and related expressions for multi-dimensional analysis in business scenarios such as user profiling, personalized recommendations, and precision marketing. However, the performance of RoaringBitmap type calculations is suboptimal in some scenarios. This version focuses on analyzing performance issues of RoaringBitmap type calculations and optimizing memory allocation and expression execution logic. The unnecessary performance overheads are reduced. As a result, the execution performance in cardinality, and/or/xor/andnot, and aggregation scenarios is improved severalfold.
Resource management and isolation
Query-level resource group configuration
OceanBase Database provides the DBMS_RESOURCE_MANAGER package for resource group configurations at the user level, the background task level, and the column parameter level, to achieve CPU and I/O resource isolation. In V4.3.3, the system enables resources group binding at the query level. When you specify the
/*+ resource_group('group_name') */HINT in an SQL statement, the statement will be forced to use the resources of the corresponding resource group. If the specified resource group does not exist, the default resource group will be used. When you change resource groups, the session must be reconnected for the change to take effect.
V4.3.2
Version information
- Release date: July 16, 2024
- Version number: V4.3.2 Beta
New features and enhancements
Optimizations for data import and export
Enhancements to incremental direct load
Starting from V4.3.1, incremental direct load is supported as an experimental feature, with some limitations, such as not supporting the import of
outrow lobdata and only supporting incremental direct load withload_modeset toinc_replace(i.e.,replacesemantics). V4.3.2 adds support for incremental direct load ofoutrow lobdata and extends theload_modeparameter to includeincas a valid value, with a default semantics ofinsert. When theignorekeyword is specified in an SQL statement, the semantics change toignore. This feature is now officially released. For more detailed information on how to use this feature, see Use the LOAD DATA statement to directly load data and Use the INSERT INTO SELECT statement to directly load data.Performance optimization for full direct load
In the full direct load scenario, the new version optimizes performance by reducing column type conversion operations during data loading, lowering CPU usage for statistics collection, removing the generation of primary keys at the
SELECTend inINSERT INTO SELECTstatements, and defaulting to disablingsum skip indexfor columnar storage (which specifies the pre-aggregatedSUMvalues for specified columns within a range). Additionally, by disabling microblock verification (setting themicro_block_merge_verify_levelparameter to 0), the performance of direct load is improved by approximately 20%.Partitioned export using
SELECT INTO OUTFILEOceanBase Database currently supports exporting data to multiple files using the
SELECT INTO OUTFILEstatement, but it does not support exporting data by partition. V4.3.2 introduces the ability to export data by partition, resulting in a clearer directory structure. This feature also allows the construction of partitioned external tables from the exported file directories, improving query efficiency through partition path pruning.
Integration with external data sources
Enhancements to external tables
OceanBase Database has long supported external tables for CSV files, but as AP (Analytical Processing) workloads have expanded, the need to read Parquet-formatted external data sources has become increasingly common. V4.3.2 introduces support for Parquet-formatted external tables, allowing users to import data from these files into internal tables or perform cross-source join queries directly using external tables. To ensure the timeliness of the file directories scanned by external tables, the new version includes an automatic file directory refresh feature. When creating an external table, users can specify the refresh method (manual, real-time, or periodic) using the
AUTO_REFRESHoption. Additionally, theDBMS_EXTERNAL_TABLE.REFRESH_ALL_TABLE(withinterval int) system package can be used to manage scheduled refresh tasks. For more detailed information on how to use this feature, see Create an external table.
Performance optimizations
Performance optimization for AP scenarios
V4.3.2 optimizes various aspects of AP (Analytical Processing) scenarios, including data block prefetching, vectorized batch processing, filter pushdown, comparison calculations, aggregation calculations, DTL shuffle, and monotonicity filters. Under a 100GB data volume specification, the performance of benchmark tests in AP scenarios such as TPCH, TPCDS, and ClickBench has improved by approximately 10% to 15% compared to V4.3.1.
Data type and storage optimizations
RoaringBitmap
With the rise of big data, enterprises are increasingly focused on user data mining and analysis. RoaringBitmap, known for its space efficiency and computational efficiency, plays a crucial role in scenarios such as user profiling, personalized recommendations, and precise marketing. OceanBase Database V4.3.2 introduces support for the RoaringBitmap data type in MySQL-compatible mode. By storing and operating on a set of unsigned integers, this feature enhances the performance of large-scale set calculations and deduplication. To meet multidimensional analysis needs, this version supports over 20 expressions for cardinality calculation, set operations, bitmap judgment, bitmap construction, bitmap output, and aggregation operations.
Data operation optimizations
Table-level overwrite write
In data warehouses, scenarios such as data refresh, conversion, and cleaning often require data overwrite writes. OceanBase Database V4.3.2 introduces the
INSERT OVERWRITEfeature, allowing atomic clearing of old data and writing of new data within a table. Leveraging the full direct load capability,INSERT OVERWRITEalso demonstrates high execution performance. V4.3.2 supports theINSERT OVERWRITE tablename SELECT * FROM tablenamesyntax, but partition-level overwrite write is not supported yet and will be introduced in a future version.
Task scheduling and asynchronous execution
Asynchronous task scheduling
OceanBase Database currently supports various data import commands such as
insert overwrite,insert select,create table as, andload datato enable real-time data ingestion. However, real-time import requires waiting for the import to complete, and sessions cannot be interrupted, which can be inconvenient for large-scale data imports. V4.3.2 introduces asynchronous task scheduling based on DBMS_SCHEDULER. Users can create, query the status of, and cancel asynchronous import tasks using commands such assubmit job,show job status, andcancel job.
V4.3.1
Version information
- Release date: May 17, 2024
- Version: V4.3.1 Beta
New Features and Enhancements
Materialized view enhancements
Materialized view enhancements
OceanBase Database V4.3.0 supports the materialized view feature. This feature precomputes and stores the query results of a view to reduce real-time computing and improve query performance. It also simplifies complex query logic to meet the needs of AP scenarios. On this basis, V4.3.1 extends the support for real-time materialized views. This feature provides real-time computing capabilities based on the data of materialized views and MLOGs. It is suitable for analytical business scenarios with high real-time requirements. V4.3.1 also adds the primary key constraint for materialized views. This feature allows users to specify the primary key for a materialized view to optimize the performance of single-row lookups, range queries, and join operations based on the primary key. V4.3.1 also extends the support for incremental updates of materialized views in inner join scenarios to improve the refresh performance of materialized views in some scenarios.
In V4.3.0, when you use a materialized view, you need to manually replace the access to the original table in the business script with a query on the materialized view. This introduces the cost of manual rewriting. V4.3.1 supports materialized view rewriting in some scenarios. If the system variable
QUERY_REWRITE_ENABLEDis set toTrue, you can specifyENABLE QUERY REWRITEwhen you create a materialized view to enable the automatic rewriting of queries on the materialized view. In this case, the system can rewrite a query on the original table to a query on the materialized view, thereby reducing the workload of business modifications. For more information about automatic rewriting of queries on materialized views, see Query rewriting on materialized views (MySQL-compatible mode) and Query rewriting on materialized views (Oracle-compatible mode).
Optimizations for data import and export
Incremental direct load (Experimental)
OceanBase Database V4.1.0 supports the direct load feature. This feature simplifies the execution path of data loading, skips modules such as SQL, transactions, and memtables, and directly persists data to an SSTable to significantly improve the data import efficiency. However, in scenarios where a table needs to be imported multiple times, the existing direct load feature requires rewriting all existing data in the table each time, which affects the performance of incremental imports. V4.3.1 optimizes incremental imports. In incremental direct load, you do not need to rewrite existing data. You only need to process new data. This way, multiple imports have the same high performance as the first import. You can specify whether to use incremental direct load in
LOAD DATAandINSERT INTO SELECTstatements by using the/*+ direct(need_sort, max_errors_allowed, load_mode)*/hint. If you do not specifyload_modeor specifyload_modeasfull, the system uses the original full direct load feature. If you specifyload_modeasinc_replace, the system uses incremental direct load. This feature is defined as an experimental feature in V4.3.1. It will be extended and evolved into a production-ready feature in later versions. For more information about incremental direct load, see the Incremental direct load section in Overview of direct load.Performance optimization for SELECT INTO OUTFILE
The existing SELECT INTO OUTFILE feature supports parallel reading of data from a table, but writes data to an external file in a serial manner. This causes a performance bottleneck in data export. V4.3.1 adds the parallel export capability. The single and max_file_size options are added to the SELECT INTO OUTFILE command to control how data is written to an external file. The single option specifies whether to export data to a single file or multiple files. If you specify a parallelism greater than 1 and set single to false, the system exports data to multiple files to achieve parallel reading and writing. The max_file_size option specifies the size of an exported file.
Index enhancements
Full-text index (Experimental) in MySQL-compatible mode
In a relational database, indexes are usually used to accelerate queries that match precise values. However, B-tree indexes cannot be used for fuzzy searches involving large amounts of text data. In this case, you can only perform fuzzy searches on each row of data by scanning the entire table. The performance of this approach is often not satisfactory when the text is large and the data volume is large. In addition, some complex query scenarios, such as approximate matching and relevance sorting, are difficult to support by rewriting SQL statements.
To solve the preceding problems, OceanBase Database V4.3.1 supports the full-text index feature. This feature preprocesses text data, creates keyword indexes, and improves the efficiency of full-text searches. This feature is defined as an experimental feature in V4.3.1. It will be extended and evolved into a production-ready feature in later versions. The following features are introduced in this version:
The full-text index feature is supported in MySQL-compatible mode and is compatible with the basic syntax of MySQL.
You can create a full-text index on a CHAR, VARCHAR, or TEXT column when you create a table.
The feature is applicable to partitioned tables.
You can create multiple full-text indexes on a primary table.
The feature supports three built-in tokenizers: SPACE, NGRAM, and BENG (basic English).
You can perform full-text searches on multiple columns by using one MATCH AGAINST clause.
The NATURAL LANGUAGE MODE is supported.
Partition management
Partition exchange
Over time, a table may accumulate a large amount of historical data. This data may not need to be frequently accessed, but must be retained to meet compliance or historical data analysis requirements. To improve query performance, businesses often need to distinguish between active and inactive data and archive the inactive data. SQL statements can be used to migrate data to a new table. However, this approach is not efficient for large data volumes. Therefore, OceanBase Database V4.3.1 introduces the partition exchange feature. This feature allows you to move data from table A to a partition of table B by modifying the definitions of partitions and tables in the data dictionary without physically copying data. This way, the data can be moved almost instantaneously, significantly improving the performance of data migration. The following features are introduced in this version:
- You can exchange data between a partition of a partitioned table and a non-partitioned table.
- You can create a partitioned table with a range (range columns) partition.
- You can create a partitioned table with a subpartition. The partition type of the primary partition has no requirements, and the subpartition type is range (range columns).
- You can specify the including indexes option. In this case, local indexes are also exchanged and remain available after the exchange.
- You can specify the without validation option. In this case, you must ensure that the data conforms to the partitioning key range.
Integration with external data sources
Partitioned external tables
OceanBase Database V4.2.0 supports external tables. However, external tables can be non-partitioned. In scenarios where a large number of files exist but only a portion of them needs to be scanned for a query, non-partitioned external tables can only scan all files, which results in poor performance. V4.3.1 introduces the partitioned external table feature. This feature supports list partitioning for external tables, similar to ordinary tables. It also provides two syntax options: automatic and manual partitioning. If you specify automatic partitioning, the system groups files into partitions based on the partitioning key. If you specify manual partitioning, you need to specify the subpaths of data files for each partition. In this case, the query on an external table can perform partition pruning based on the partitioning key to reduce the number of scanned files and significantly improve query performance.
Data types and storage optimization
MySQL JSON multi-value index (Experimental)
MySQL 8.0 introduced the multi-value index feature, which is suitable for JSON documents and other collection data types. This feature allows users to build indexes on arrays or sets to efficiently retrieve elements. OceanBase Database V4.3.1 is compatible with the JSON multi-value index feature in MySQL-compatible mode, enabling the creation of efficient secondary indexes on JSON array fields containing multiple elements. This enhances the query capabilities of complex JSON data structures while maintaining the flexibility of the data model and high query performance. This feature is defined as experimental in V4.3.1 and will be expanded and evolved into a production-ready feature in future versions. The following features are included in this version:
- Support for pre-creation of multi-value indexes and composite multi-value indexes;
- Support for unique and non-unique multi-value indexes;
- Support for creating multi-value indexes with JSON array elements of types INT, UINT, DOUBLE, FLOAT, and CHAR;
- Suitable for partitioned tables;
- Support for using multi-value indexes in the MEMBER_OF(), JSON_CONTAINS(), and JSON_OVERLAPS() functions.
MySQL JSON partial update
Some users store business data in JSON documents. When updating a document, the entire document must be read and then fully updated, which can be inefficient for large documents. V4.3.1 introduces the JSON partial update feature, allowing users to update only the specific fields of a JSON document using expressions like json_set, json_replace, or json_remove. This eliminates the need for full coverage updates, improving update performance. This feature requires the log_row_value_options parameter to be enabled.
Query execution and resource management
SQL temporary result compression
When the amount of data involved in SQL execution is excessively large, memory insufficiency can occur, requiring some operators to materialize temporary intermediate results. If the materialized data exceeds the available disk space, SQL execution will fail. V4.3.1 introduces the SQL temporary result compression feature, allowing tenants to configure the spill_compression_codec parameter or use SQL hints (e.g., /*+opt_param('spill_compression_codec', 'lz4') */) to specify whether to compress temporary results and the compression algorithm. Enabling temporary result compression reduces the disk space occupied by temporary results, supporting larger-scale query tasks.
V4.3.0 Beta
Version information
- Release date: March 22, 2024
- Version number: V4.3.0 Beta
New features
Columnar storage engine
In scenarios involving large-scale data complex analysis or ad-hoc queries on massive data, columnar storage is a key capability of an AP database. Columnar storage is a data file organization method that differs from row-based storage. It organizes table data by columns. When data is stored in a columnar format, analysis scenarios can scan only the columns required for query computations, avoiding full-row scans. This reduces I/O and memory resource usage and improves computational speed. Additionally, columnar storage naturally offers better conditions for data compression, making it easier to achieve high compression ratios and reducing storage space and network transmission bandwidth.
However, common columnar storage engines often assume minimal random updates, aiming to keep columnar data static. When faced with significant random data updates, performance issues are inevitable. The OceanBase LSM-Tree architecture addresses this by separately handling baseline and incremental data. This makes it ideal for scenarios with frequent random updates. Therefore, V4.3.0 extends support for the columnar storage engine within the existing architecture. This allows for integrated storage of both columnar and row-based data within a single codebase, architecture, and OBServer node, ensuring balanced performance for both TP and AP queries.
To facilitate AP business migration and ease the transition for existing customers, the new version has adapted and optimized multiple modules around the columnar storage engine, including the optimizer, executor, DDL, and transaction processing. This includes a new cost model based on columnar storage, a vectorized engine, enhanced query pushdown functionality, Skip Index, new columnar encoding algorithms, and adaptive compaction.
Users can flexibly configure tables as row-based, columnar, or hybrid (row-column redundant) based on their workload types. For more information about the columnar storage engine, see Columnar storage.
New vectorized engine
OceanBase has already implemented a vectorized engine based on the Uniform data description method in earlier versions, significantly improving performance compared to non-vectorized engines. However, in deep AP scenarios, there are still some performance limitations. V4.3.0 introduces the 2.0 version of the vectorized engine, which uses the Column data format description. This change avoids the memory usage, serialization, and read/write access overhead associated with ObDatum. Based on the data format description, the new version has also re-implemented a batch of commonly used operators and expressions, including over 10 operators such as HashJoin, AGGR, HashGroupBy, and Exchange(DTL Shuffle), as well as more than 20 MySQL expressions involving relational operations, logical operations, and arithmetic operations. In subsequent V4.3.x versions, the new vectorized engine will continue to be used to supplement and enhance the implementation of other operators and expressions to achieve optimal performance in AP scenarios.
Materialized views
V4.3.0 introduces the Materialized View feature. Materialized views are a key capability for supporting AP business. They precompute and store query results, reducing real-time computation to improve query performance and simplify complex query logic. They are commonly used in scenarios such as rapid report generation and data analysis.
Since materialized views require storing query result sets to optimize query performance, and there is a data dependency between materialized views and base tables, whenever data in the base table changes, the data in the materialized view must be updated to remain synchronized. Therefore, the new version introduces a materialized view refresh mechanism, including full refresh and incremental refresh strategies. Full refresh is a straightforward method where, each time a refresh operation is executed, the system re-executes the corresponding query statements of the materialized view, recalculates, and overwrites the original view result data. This approach is suitable for scenarios with relatively small data volumes. In contrast, incremental refresh only processes the data that has changed since the last refresh. To achieve precise incremental refresh, OceanBase implements a feature similar to Oracle's MLOG (Materialized View Log), tracking incremental updates in the base table through detailed logs. This ensures that materialized views can perform fast incremental refreshes. The incremental refresh method is particularly suitable for scenarios with large data volumes and frequent changes. For more information about materialized views, see Materialized views (MySQL-compatible mode) and Materialized views (Oracle-compatible mode).