Overview of OceanBase Database AP|V4.6.0|OceanBase Database| docs|Distributed Database

Overview of OceanBase Database AP

Last Updated：2026-05-07 11:26:24 Updated

What is OceanBase AP

In database scenarios, transaction processing (TP) focuses on high-concurrency and strong-consistency online transactions, while analytical processing (AP) focuses on real-time analysis and complex queries of massive data. Together, they support enterprises' data management and analysis needs. OceanBase has long led the TP field with its self-developed distributed architecture, financial-grade high availability, and extreme elasticity, serving many core business applications. With the surge in enterprise-level real-time analysis demands, OceanBase extended its capabilities from TP to AP in V4.3. It introduced native columnar storage and integrated row and columnar storage, and equipped the execution layer with a vectorized engine. On the optimizer side, it enhanced the cost model and statistics for columnar storage, enabling automatic selection of row or column access paths based on cost. These storage, execution, and optimization capabilities work together, allowing the same engine to handle both transactional workloads and real-time analysis efficiently, delivering integrated data management and real-time analysis value to enterprises.

ap-intro

Learn about OceanBase AP's core capabilities and customer practices through this video.

Core features

Integrated storage foundation supporting row, column, and hybrid storage: Users can flexibly specify row, column, or hybrid storage when creating tables to match different business types. Columnar tables in OceanBase use a baseline columnar storage approach with incremental row storage. Baseline columnar storage optimizes complex query performance, while incremental row storage still supports high-concurrency data writes.
Strong transactions and high concurrency in real-time analysis: OceanBase continues to use distributed ACID and multi-replica strong consistency architecture, ensuring data strong consistency in analytical scenarios. It supports smooth scaling and dynamic load balancing between nodes, allowing system performance to scale linearly with resource expansion. It also provides multi-dimensional resource isolation: resources are isolated between tenants and within tenants using resource groups at the user, SQL, and task levels. In a read-only columnar replica deployment, TP and AP traffic can be separated to different replicas, achieving physical strong isolation at the node level.
Vectorized execution engine for faster large-scale data analysis: The vectorized execution engine processes data in batches, using efficient columnar data description formats. It optimizes operators and expressions for batch iteration based on these formats. The storage layer aligns with this format, accelerating projection, predicate, and aggregation paths using SIMD and other techniques. It also supports adaptive adjustment of batch size based on workload. Compared to traditional row-based processing using the volcano model, analytical query performance can improve by about an order of magnitude.
Enterprise-grade query optimizer for improved real-time analysis performance: The optimizer is designed for HTAP and real-time analysis. It performs query rewriting and strategy selection in a larger plan space. It uses one-phase distributed plan generation, considering data distribution and parallelism when enumerating join orders and algorithms, avoiding the issue of optimal performance in single-machine environments but suboptimal in distributed environments. It automatically chooses between row and column access paths based on access characteristics and establishes cost models and statistics for columnar scans to evaluate mechanisms like SkipIndex. For queries with high scan costs, automatic parallelism (AUTO DOP) can be enabled to shorten response times. SQL plan management (SPM) manages plan evolution when data volume, statistics, or versions change, and gray-scale testing with real traffic suppresses plan rollback, ensuring stable operation.
Smart materialized views with multi-level precomputation and frequent real-time refreshes: Users define precomputed results using declarative SQL, and OceanBase automatically manages refresh mechanisms and table dependencies, eliminating the need for complex ETL or data pipelines. It supports real-time materialized views, which can be scheduled to refresh based on target freshness (e.g., every 30 seconds or 5 minutes). Automatic incremental view maintenance only recalculates changes since the last refresh, reducing refresh costs while maintaining freshness. When a materialized view is available, the query optimizer automatically rewrites queries against the base table to read from the materialized view, accelerating complex queries without requiring SQL changes.
Multi-modal data types for AI fusion analysis: OceanBase natively supports complex types like Array, Roaring Bitmap, Map, JSON, and Vector, and provides capabilities like JSON multi-value indexing, vector indexing, and full-text indexing to narrow scan ranges and accelerate retrieval. It supports typical analysis scenarios like label analysis and audience targeting, and also meets AI-related needs like knowledge retrieval and semantic search through unified storage and hybrid retrieval capabilities.
Seamless integration with open ecosystems to enable business innovation: OceanBase can access data from various external systems and collaborate with upstream and downstream tools. It supports real-time data ingestion and processing from streaming systems like Kafka and Flink. It also supports migration and synchronization with existing databases or data warehouses using tools like OMS. It can access various file and object storage formats through external tables and supports catalogs like Hive Metastore and Iceberg for unified metadata access. The SQL layer is highly compatible with MySQL and Oracle, making it easy to integrate with BI, ETL, and other analysis tools. It also integrates with scheduling systems like DolphinScheduler and Airflow, and with monitoring, visualization, and analysis tools like Prometheus, Grafana, Tableau, and QuickBI, supporting data pipeline governance and business insights.

Scenarios

Scenario 1: HTAP mixed workload scenario

Integrated and simplified architecture: The same engine and data support both transactional and analytical workloads. You can choose between row-column hybrid storage and columnar storage replicas based on your business needs.
Low-cost massive storage: Based on the LSM-Tree and advanced compression encoding technology, the storage cost is reduced by 70%-90% compared with traditional solutions.
High-concurrency computing: The peer-to-peer architecture of OceanBase Database inherently supports parallel computing across multiple servers. It can support up to PB-level data storage, providing a stable storage foundation for your business's comprehensive data.
Multi-scenario isolation: Through underlying resource isolation technology and user resource group technology, tasks from different scenarios and users are isolated in terms of resources.

Scenario 2: Real-time data analysis scenario

Real-time data updates: Based on the LSM-Tree architecture, it supports efficient real-time writing. Incremental data is stored in rows, and baseline data is stored in columns. Periodic or adaptive major compactions are executed to generate new columnar baseline data. Once data is written, it can be queried externally, ensuring real-time data availability.
High accuracy and strong consistency: The Multi-Paxos protocol ensures data consistency among multiple replicas. The MVCC model supports non-blocking read/write operations and guarantees transactional consistency of read data. It also supports strong reads from primary replicas and weak reads from other replicas. The WAL mechanism ensures data persistence and atomicity.
High-performance computing: Columnar storage technology, compute pushdown, and query processing based on compressed data enable high-performance data queries. Optimized query rewriting and rule/ cost-based plan selection capabilities, combined with execution optimization using parallel execution engines and vectorized engines, achieve high computational performance. Materialized views further support query and analysis of massive data.
High availability: Inheriting the high availability capabilities of TP systems, it offers RPO=0 and RTO<8s. It supports flexible deployment from single IDC to three regions with five IDCs. It also supports automatic disaster recovery. Cloud Database OceanBase Database also supports single-replica deployment based on shared storage and independent log service, ensuring high availability while reducing costs.
Smooth scaling: You can smoothly scale the system horizontally or vertically without interrupting services. During horizontal scaling, the built-in data dynamic balancing mechanism ensures even distribution of data and service load across nodes.
Multi-model fusion: It fully supports various models, including B-tree indexes, JSON multi-value indexes, full-text indexes, and vector indexes.
Separation of storage and computing: It supports multiple computing nodes accessing the same storage data. Combined with local persistent caching and object storage, it achieves a high-performance storage and computing separation architecture.

Scenario 3: PL/SQL batch processing scenario

Extreme performance improvement, breaking through processing bottlenecks. The columnar storage engine of OceanBase Database is specifically designed for analytical scenarios. Data is stored in columns, resulting in higher compression ratios and reduced I/O. Combined with the vectorized execution engine, the CPU can perform calculations on a batch of data (instead of row by row) in memory, significantly improving CPU cache hit rates and computational efficiency. For typical batch processing tasks, it can achieve 10 times or even higher performance improvements, easily meeting or even shortening the business batch processing window, making quasi-real-time analysis possible.
Seamless and risk-free migration. OceanBase Database is highly compatible with Oracle, not only supporting common SQL syntax and data types but also PL/SQL stored procedures. This means that customers can almost migrate stored procedures with a large amount of business logic from Oracle to OceanBase Database without modifications. Application layer code does not need to be changed, significantly reducing migration risks, costs, and cycles, making system modernization, which was once considered impossible, feasible.
Integrated architecture for cost reduction and efficiency improvement. Traditional architectures separate OLTP databases from data warehouses or big data platforms, requiring maintenance of two systems and complex ETL/CDC synchronization, resulting in high costs and complexity. OceanBase Database's integrated architecture supports both online transaction processing and batch processing analysis with a single system. Its built-in resource isolation mechanism ensures that analytical tasks running on columnar storage do not conflict with transactional tasks on row-based storage. This simplifies the technology stack, reduces operational complexity, and avoids redundant data storage and cross-system synchronization, significantly lowering the total cost of ownership (TCO) for enterprises.

Technical architecture

For more information about the technical architecture of OceanBase Database, see OceanBase system architecture.
For more information about the technical principles of OceanBase Database, see the OceanBase system principles section.

OceanBase

Customer Stories

Documentation