What is OceanBase AP?
In database scenarios, transaction processing (TP) focuses on high-concurrency, strongly consistent online transactions, while analytical processing (AP) focuses on real-time analysis and complex queries over massive data. Together, they cover enterprise data management and analysis needs. OceanBase has long been a technical leader in the TP domain, serving numerous core business applications with its self-developed distributed architecture, financial-grade high availability, and extreme elasticity. As demand for real-time analysis surged, OceanBase extended its capabilities from TP to AP in V4.3. At the storage layer, it introduced native columnar storage alongside row-store and unified the two in a single storage engine. At the execution layer, it implemented a vectorized execution engine. In the optimizer, it added cost models and statistics for columnar storage, so that access paths are selected automatically based on cost. Working together, these storage, execution, and optimization capabilities let the same engine handle both transactional workloads and real-time analysis efficiently, giving enterprises integrated data management and real-time analysis.

Core features
Unified storage foundation supporting row-store, columnar, and hybrid storage: Users can specify row-store, columnar, or hybrid storage when creating a table to match the business type. In a columnar table, baseline data is stored in columnar format to speed up complex queries, while incremental data is stored in row format to sustain high-concurrency writes.
Strong transaction and high-concurrency guarantees for real-time analysis: OceanBase continues to support distributed ACID transactions and strong consistency across multiple replicas, ensuring data consistency in analytical scenarios. It supports smooth scaling and dynamic load balancing between nodes, and system performance scales linearly as resources are added. OceanBase also provides multi-dimensional resource isolation: isolation between tenants, and isolation within a tenant through resource groups at the user, SQL, and task levels. In deployments with read-only columnar replicas, TP and AP traffic can be routed to different replicas, achieving strong physical isolation at the node level.
Vectorized execution engine for large-scale data analysis acceleration: The vectorized execution engine processes data in batches using efficient columnar data formats. It optimizes operators and expressions for batch iteration based on these formats. The storage layer aligns with this format, accelerating projection, predicate, and aggregation paths using SIMD and other methods. It also supports adaptive batch size adjustment based on workload. Compared to traditional row-based processing in the volcano model, analytical query performance can improve by about an order of magnitude.
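
The difference between row-at-a-time and batch-at-a-time execution can be illustrated with a minimal Python sketch. It is not OceanBase's implementation: the function names, the sample query shape (a filtered sum), and the fixed batch size are assumptions chosen for illustration, and pure Python shows only the control flow, not the SIMD or cache benefits.

```python
from array import array
from typing import Iterator, List, Tuple

Row = Tuple[int, float]  # hypothetical schema: (qty, amount)


def sum_row_at_a_time(rows: List[Row], min_qty: int) -> float:
    """Volcano style: every operator passes one row at a time to its parent."""

    def scan() -> Iterator[Row]:
        for row in rows:
            yield row

    def filter_rows(child: Iterator[Row]) -> Iterator[Row]:
        for qty, amount in child:
            if qty > min_qty:
                yield (qty, amount)

    total = 0.0
    for _, amount in filter_rows(scan()):
        total += amount
    return total


def sum_vectorized(qty_col: array, amount_col: array, min_qty: int,
                   batch_size: int = 1024) -> float:
    """Batch style: operators work on slices of columnar arrays."""
    total = 0.0
    for start in range(0, len(qty_col), batch_size):
        qty_batch = qty_col[start:start + batch_size]
        amount_batch = amount_col[start:start + batch_size]
        # The filter emits a selection vector (positions that passed)
        # instead of copying the surviving rows.
        selected = [i for i, qty in enumerate(qty_batch) if qty > min_qty]
        # The aggregate touches only the selected positions of one column.
        total += sum(amount_batch[i] for i in selected)
    return total
```

Both functions return the same result for the same data; the batch version makes one pass per operator per batch, which is the property that lets a real engine amortize per-call overhead and apply SIMD to each column slice.
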
Enterprise-grade query optimizer for real-time analysis performance: The optimizer is designed for HTAP and real-time analysis scenarios. It performs query rewriting and strategy selection over a larger plan space. It generates distributed plans in a single phase, taking data distribution and parallelism into account while enumerating join orders and algorithms, which avoids plans that are optimal on a single machine but suboptimal when distributed. It automatically chooses between row-store and columnar access paths based on access characteristics, and it builds cost models and statistics for columnar scans so that it can evaluate mechanisms such as SkipIndex. For queries with high scan costs, Auto DOP (automatic degree of parallelism) can be enabled to shorten response times. SQL Plan Management (SPM) manages plan evolution when data volume, statistics, or versions change, and it verifies new plans on a gradual rollout of real traffic to prevent plan regression, keeping operation stable.
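
To show why a skip index matters for columnar scan cost, here is a minimal Python sketch. It assumes a skip index that keeps a minimum and maximum value per block (a zone-map-style structure); the `Block` class, the block size, and the function names are hypothetical and do not reflect OceanBase's internal layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]
    min_value: int  # assumed skip-index metadata kept per block
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`; return matches and the number of blocks read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # no value in this block can match, so skip it entirely
        blocks_read += 1
        if block.min_value > threshold:
            matches.extend(block.values)  # every value matches, no per-value test
        else:
            matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read
```

The fraction of blocks that can be skipped depends on how well the data is clustered on the filtered column, which is the kind of effect a cost model for columnar scans has to estimate.
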
Smart materialized views with multi-level precomputation and frequent real-time refreshes: Users define precomputation results using declarative SQL, and OceanBase automatically manages refresh mechanisms and table dependencies, eliminating the need for complex ETL or data pipelines. It supports real-time materialized views and allows scheduling refreshes based on target freshness (e.g., 30 seconds, 5 minutes). Automatic incremental view maintenance only performs incremental calculations for changes since the last refresh, reducing refresh costs while maintaining freshness. When a materialized view is available, the query optimizer automatically rewrites queries against the base table to read from the materialized view, accelerating complex queries without requiring SQL changes.
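
Incremental view maintenance can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OceanBase's mechanism: the view is a hypothetical `SUM(amount) GROUP BY region`, the change log is a simple in-memory list of inserts (`"+"`) and deletes (`"-"`), and the class and method names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Change = Tuple[str, str, int]  # (op, region, amount); op is "+" or "-"


class SumByRegionView:
    """Toy materialized view for: SELECT region, SUM(amount) ... GROUP BY region."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)
        self.counts: Dict[str, int] = defaultdict(int)
        self.applied = 0  # how much of the change log the last refresh covered

    def refresh(self, change_log: List[Change]) -> int:
        """Apply only the changes made since the last refresh."""
        pending = change_log[self.applied:]
        for op, region, amount in pending:
            sign = 1 if op == "+" else -1
            self.totals[region] += sign * amount
            self.counts[region] += sign
            if self.counts[region] == 0:
                # The group has no rows left, so it leaves the view.
                del self.totals[region]
                del self.counts[region]
        self.applied = len(change_log)
        return len(pending)  # number of changes processed by this refresh
```

The cost of each refresh is proportional to the number of changes since the previous one, not to the size of the base table, which is why short refresh intervals stay affordable.
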
Multi-modal data types for AI-integrated analysis: OceanBase natively supports complex types like Array, Roaring Bitmap, Map, JSON, and Vector, and provides capabilities like JSON multi-value indexing, vector indexing, and full-text indexing to narrow scan scopes and accelerate retrieval. It supports typical analysis scenarios like tag analysis and audience targeting and meets AI-related requirements like knowledge retrieval and semantic search through unified storage and hybrid retrieval capabilities.
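
Tag analysis and audience targeting reduce to set operations on bitmaps of user IDs. The Python sketch below uses plain integers as uncompressed bitmaps to show the idea; it is not the Roaring Bitmap format and not OceanBase's API, and the tag names and helper functions are hypothetical.

```python
from typing import Dict, Iterable, List


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit `uid` for every user ID (IDs are assumed to be small and non-negative)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def to_user_ids(bitmap: int) -> List[int]:
    """Return the user IDs whose bits are set, in ascending order."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


def target_audience(tags: Dict[str, int], include: List[str], exclude: List[str]) -> List[int]:
    """Users that carry every tag in `include` and no tag in `exclude`."""
    if not include:
        return []
    result = tags[include[0]]
    for name in include[1:]:
        result &= tags[name]   # intersection
    for name in exclude:
        result &= ~tags[name]  # difference
    return to_user_ids(result)
```

For example, with tags `vip = {1, 3, 5, 8}`, `active_30d = {3, 4, 5, 9}`, and `churn_risk = {5, 9}`, including the first two and excluding the third selects user 3 only. A compressed bitmap type performs the same operations while storing sparse or clustered ID ranges far more compactly.
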
Seamless integration with open ecosystems to empower business innovation: OceanBase can access data from various external systems and works with upstream and downstream tools. It ingests and processes real-time data from streaming systems such as Kafka and Flink, and it migrates and synchronizes data with existing databases or data warehouses through tools such as OMS. Through external tables, it reads various file formats and object storage, and it supports catalogs such as Hive Metastore and Iceberg for unified metadata access. Its SQL layer is highly compatible with MySQL and Oracle, which eases integration with BI, ETL, and other analysis tools. It also integrates with scheduling systems such as DolphinScheduler and Airflow and with monitoring, visualization, and analysis tools such as Prometheus, Grafana, Tableau, and QuickBI, supporting end-to-end data pipeline governance and business insights.
Scenarios
Scenario 1: HTAP hybrid workloads
- Unified architecture: The same engine and the same data serve both transactional and analytical workloads. Depending on business needs, you can choose hybrid row-column storage or columnar replicas.
- Low-cost storage for massive data: Based on LSM-Tree and advanced compression encoding technology, the storage cost is reduced by 70%-90% compared to traditional solutions.
- High-concurrency computing: OceanBase's peer-to-peer architecture natively supports parallel computing across multiple nodes and scales to PB-level capacity, providing a stable storage foundation for business data.
- Scenario and user isolation: Through underlying resource isolation technology and user resource groups, tasks from different scenarios and users are isolated.

Scenario 2: Real-time data analysis
- Real-time data updates: The LSM-Tree architecture supports efficient real-time writes. Incremental data is stored in row format, and baseline data is stored in columnar format. Scheduled or adaptive major compactions generate new columnar baseline data. Data can be queried as soon as it is written, so analysis always runs on current data.
- High accuracy and strong consistency: The Multi-Paxos protocol keeps data consistent across replicas. The MVCC model supports non-blocking reads and writes and gives read operations a transactionally consistent view. Strong-consistency reads are served by the primary replica, and weak-consistency reads can be served by other replicas. The write-ahead logging (WAL) mechanism ensures data durability and atomicity.
- High-performance computing: Columnar storage, query processing directly on compressed data, and pushdown to the storage layer enable high-performance queries. The optimizer's query rewriting and its rule-based and cost-based plan selection, combined with the parallel execution engine and the vectorized engine, deliver high computational performance. Materialized views further accelerate queries and analysis over massive data.
- High availability: It inherits the high availability of the TP system, with RPO = 0 and RTO < 8s. Deployments range from a single IDC to three regions with five IDCs, with automatic disaster recovery. The cloud database also supports a single-replica architecture based on shared storage and an independent log service, which reduces costs while preserving high availability.
- Smooth scaling: Supports smooth horizontal and vertical scaling without interrupting services. During horizontal scaling, the built-in dynamic data balancing mechanism distributes data and service load evenly across nodes.
- Multi-model integration: Supports multiple index types for multi-model data, including B-tree indexes, JSON multi-value indexes, full-text indexes, and vector indexes.
- Separation of storage and computing: Supports multiple computing nodes accessing the same storage data. Combined with local persistent caching and object storage, it achieves a high-performance storage and computing separation architecture.
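
The real-time update path in this scenario (row-format incremental data, columnar baseline data, and major compaction) can be modeled with a small Python sketch. It is a toy model under stated assumptions, not OceanBase's storage engine: the `HybridTable` class, its in-memory dictionaries, and the integer primary keys are hypothetical, and versioning, logging, and concurrency are ignored.

```python
from typing import Dict, List, Optional


class HybridTable:
    """Toy LSM-style table: columnar baseline plus row-format incremental data."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self.baseline_keys: List[int] = []  # sorted primary keys
        self.baseline: Dict[str, list] = {c: [] for c in self.columns}  # one list per column
        self.incremental: Dict[int, Optional[dict]] = {}  # key -> row, None marks a delete

    def upsert(self, key: int, row: dict) -> None:
        self.incremental[key] = dict(row)  # writes touch only the row store

    def delete(self, key: int) -> None:
        self.incremental[key] = None

    def scan_column(self, column: str) -> list:
        """Merge on read: baseline values overlaid with newer incremental rows."""
        merged = {key: self.baseline[column][pos]
                  for pos, key in enumerate(self.baseline_keys)}
        for key, row in self.incremental.items():
            if row is None:
                merged.pop(key, None)
            else:
                merged[key] = row[column]
        return [merged[key] for key in sorted(merged)]

    def major_compaction(self) -> None:
        """Fold incremental rows into a new columnar baseline."""
        rows = {key: {c: self.baseline[c][pos] for c in self.columns}
                for pos, key in enumerate(self.baseline_keys)}
        for key, row in self.incremental.items():
            if row is None:
                rows.pop(key, None)
            else:
                rows[key] = row
        self.baseline_keys = sorted(rows)
        self.baseline = {c: [rows[k][c] for k in self.baseline_keys]
                         for c in self.columns}
        self.incremental = {}
```

A scan sees new writes immediately because it overlays the incremental rows on the baseline, while compaction periodically restores the scan-friendly columnar layout.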


Scenario 3: PL/SQL batch processing
- Extreme performance improvement that breaks through processing bottlenecks: OceanBase's columnar storage engine is designed for analytical scenarios. Data is stored by column, which yields higher compression ratios and significantly reduces I/O. With the vectorized execution engine, the CPU computes on a batch of data at a time in memory rather than row by row, which greatly improves CPU cache hit rates and computational efficiency. For typical batch processing tasks, this can deliver a 10x or greater performance improvement, so jobs fit within or shorten the business batch processing window and near-real-time analysis becomes possible.
- Seamless migration with minimal business risk: OceanBase is highly compatible with Oracle. It supports common SQL syntax and data types as well as PL/SQL stored procedures, so customers can migrate stored procedures that carry substantial business logic from Oracle to OceanBase with minimal changes. Application-layer code does not need to be modified, which significantly reduces migration risk, cost, and time and makes system modernization practical.
- Unified architecture for cost reduction and efficiency improvement: The traditional "OLTP database + data warehouse/big data platform" architecture requires maintaining two systems connected by complex ETL/CDC synchronization, which is costly and hard to operate. OceanBase's unified architecture supports both online transactions and batch analysis in a single system. Its built-in resource isolation keeps analytical tasks running on columnar storage from interfering with transactional tasks on row storage. This simplifies the technology stack, reduces O&M complexity, and avoids redundant data storage and cross-system synchronization, significantly lowering the total cost of ownership (TCO).

Technical architecture
- For more information about the technical architecture of OceanBase Database, see OceanBase system architecture.
- For more information about the technical principles of OceanBase Database, see OceanBase system principles.
