The release of OceanBase Database V4.3.0 brings a major update, focusing on optimizing analytical processing (AP) capabilities and integrating transaction processing (TP) with AP. This update leverages the log-structured merge-tree (LSM-tree) architecture of OceanBase Database to combine row-based and columnar storage, introducing a new vectorized engine and a cost evaluation model based on columnar storage. These enhancements significantly boost the efficiency of processing wide tables, improve query performance in AP scenarios, and cater to TP business requirements.
In this version, OceanBase Database introduces materialized views compatible with Oracle, allowing for precomputing and storage of view results to enhance real-time query efficiency, facilitating quick report generation and data analysis. The update also enhances online DDL and adds support for tenant cloning. Optimizations have been implemented to enhance the performance of parallel DML (PDML) and large object (LOB) data import processes, as well as node restart procedures, leading to an overall increase in system efficiency. Furthermore, the version introduces support for backup and restore based on S3, optimizing system resource utilization. It also includes features like index usage monitoring and client-side local data import to improve system usability. Overall, the new OceanBase Database version is well-suited for mixed workload scenarios involving complex analytics, real-time reporting, real-time data warehousing, and online transactions.
OLAP capabilities
Columnar engine
In scenarios involving large-scale data analytics or extensive ad-hoc queries, columnar storage stands out as a crucial feature of an AP database. With columnar storage, the system can efficiently scan only the necessary column data for query processing, avoiding the need to scan entire rows. This targeted approach reduces resource usage, such as I/O and memory, leading to improved processing speed. Moreover, columnar storage naturally provides better data compression conditions, making it easier to achieve higher compression ratios, thereby reducing storage space and network transmission bandwidth.
The LSM-tree architecture of OceanBase Database can handle baseline and incremental data separately. In V4.3.0, OceanBase Database supports a columnar engine built on this architecture, achieving integrated row- and column-based data storage to optimize both TP and AP query performance.
To facilitate AP business migration and ensure a smooth transition for existing customers to the new version, various modules related to the columnar engine have been optimized, including the optimizer, executor, DDL processing, and transaction processing. These optimizations introduce a new cost model and vectorized engine based on columnar storage, enhancements to the query pushdown feature, features like skip index, a new column-based encoding algorithm, and adaptive compactions. In practical AP business scenarios, users can flexibly configure tables in their systems as rowstore tables, columnstore tables, or hybrid rowstore-columnstore tables based on their specific workload requirements.
For more information, see Columnar storage.
Bypass import
OceanBase Database has supported bypass import since V4.2, allowing you to insert data directly into data files. Bypass import skips the SQL layer interface, directly allocates space in data files, and inserts data, thereby improving the data import efficiency. With the release of V4.3.0, columnstore tables now also support bypass data import.
For more information, see Overview.
New vectorized engine
In V4.3.0, OceanBase Database introduces a new vectorized engine that uses column data format descriptions. It also reimplements more than 10 commonly used operators such as HashJoin, AGGR, HashGroupBy, and Exchange (DTL Shuffle), as well as over 20 MySQL expressions including relational operations, logical operations, and arithmetic operations.
- Memory format optimization: The new memory format provides a columnar data structure, reducing memory usage for fixed-length data. This optimization not only boosts data access efficiency but also enables calculation acceleration through the Single Instruction, Multiple Data (SIMD) technology.
- Operator performance optimization: Operators such as Sort, HashJoin, HashGroupBy, Data Shuffle, and AGGR are redesigned and reimplemented based on the new format to comprehensively improve the performance.
Based on the new vectorized engine, OceanBase Database significantly improves the query performance in AP scenarios.
Materialized views
The materialized view feature is introduced since OceanBase Database V4.3.0. Materialized views are a key feature for AP business scenarios. By precomputing and storing the query results of views, real-time calculations are reduced to improve query performance and simplify complex query logic. Materialized views are commonly used for rapid report generation and data analysis scenarios.
In OceanBase Database V4.3.0, a materialized view refresh mechanism is introduced, supporting two strategies: complete refresh and incremental refresh.
Complete refresh is a more direct approach where each time the refresh operation is executed, the system will re-execute the query statement corresponding to the materialized view, completely calculate and overwrite the original view result data. This method is suitable for scenarios with relatively small data volumes.
In contrast, incremental refresh only processes data that has changed since the last refresh. To ensure precise incremental refreshes, OceanBase Database has implemented a materialized view log mechanism similar to Oracle's Materialized View Log (MLOG), which tracks and records incremental update data of the base table in detail through logs, ensuring that the materialized view can be quickly incrementally refreshed. Incremental refresh is suitable for business scenarios with substantial data volumes and frequent data changes.
For more information, see Materialized views.
High-performance kernel
Adaptive cost model
OceanBase Database V4.3.0 optimizes the implementation of the cost model. The cost model can use the DBMS_STATS package to collect or set system statistics parameters to adapt to the hardware environment. The DBA_OB_AUX_STATISTICS view is provided to display the system statistics parameters of the current tenant.
For more information, see Query rewrite.
Tenant cloning
OceanBase Database V4.3.0 supports tenant cloning. You can use this feature to quickly clone a specified tenant. The new tenant and the source tenant share physical macroblocks in the initial state. However, new data changes are isolated between the tenants to ensure that changes in the new tenant do not affect the source tenant.
You can execute
CREATE TENANT new_tenant_name FROM source_tenant_name WITH RESOURCE_POOL [=] resource_pool_name, UNIT [=] unit_configin the sys tenant to quickly clone a specified tenant.After a tenant cloning job is completed, the newly cloned tenant is a standby tenant.
You can execute
ALTER SYSTEM ACTIVATE STANDBY TENANT new_tenant_nameto convert it to the primary tenant to provide services. The new tenant and the source tenant share physical macroblocks in the initial state, but new data changes and resource usage are isolated between the tenants.
You can clone an online tenant for temporary data analysis with high resource consumption or other high-risk operations to avoid risking the online tenant. In addition, you can also clone a tenant for disaster recovery. When irrecoverable misoperations are performed in the source tenant, you can use the new tenant for data rollback.
For more information, see Clone a tenant.
High availability enhancements
Support for S3 as the backup and restore media
OceanBase Database of an earlier version supports two types of storage media for backup and restore: file storage such as Network File System (NFS), and object storage such as Alibaba Cloud Object Storage Service (OSS) and Tencent Cloud Object Storage (COS). OceanBase Database V4.3.0 supports Amazon S3 as the log archive and data backup destination. You can also use backup data on S3 for physical restore.
Ease-of-use enhancements
Index usage monitoring
OceanBase Database V4.3.0 provides the index usage monitoring feature. You can choose to enable index usage monitoring and set a sampling method. For a user tenant, the system will record the index usage information that meets the specified rules to the memory and refresh the information to the internal table every 15 minutes. You can query the DBA_INDEX_USAGE view for the index usage information. Based on the queried information, you can determine whether indexes on tables are referenced and choose to drop useless index tables to release the space.
For more information, see Monitor indexes.