Data subscription is a technology for continuously capturing incremental changes from a data source. At its core, it uses Change Data Capture (CDC) to transmit data changes (inserts, updates, and deletes) to a target system in real time or near real time. In the OceanBase ecosystem, data subscription is primarily used in incremental database migration scenarios to ensure data consistency between the source database and the target OceanBase database.
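The basic idea can be sketched in a few lines: a subscriber receives a stream of change events and replays them against a target store. The event shape below is hypothetical; real CDC tools define their own formats, but they typically carry the operation type, the row key, and the new row image.

```python
# A minimal sketch of how a subscriber applies CDC events to a target store.
# The event format here is hypothetical, for illustration only.

def apply_event(target: dict, event: dict) -> None:
    """Apply one change event (insert/update/delete) to an in-memory target."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["row"]          # upsert the new row image
    elif op == "delete":
        target.pop(key, None)               # remove the row if present
    else:
        raise ValueError(f"unknown operation: {op}")

# Replaying the change stream keeps the target consistent with the source.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "alice"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "alicia"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "bob"}},
    {"op": "delete", "key": 2},
]
target: dict = {}
for e in events:
    apply_event(target, e)
# target now holds only key 1, with the updated row image
```

Note that only the differential events travel over the wire; the full dataset never has to be re-copied, which is what makes incremental migration cheap relative to repeated full loads.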
Scenarios
Data subscription has the following characteristics:
- Low latency synchronization: ensures data consistency between the source and target systems, with latency typically in milliseconds to seconds
- Resource efficiency: only transmits differential data, significantly reducing network bandwidth and storage resource consumption
- Flexible scalability: supports parallel subscriptions to multiple target systems (such as data warehouses, analytics platforms, and caching systems)
- Strong fault tolerance: includes fault recovery and resume-from-breakpoint capabilities, ensuring data is not lost
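The resume-from-breakpoint capability above can be sketched as offset-based checkpointing: the subscriber durably records the position of the last event it applied, so after a failure it resumes from that checkpoint instead of replaying the whole stream. The code below is an illustrative simplification, not any particular tool's implementation.

```python
# Hypothetical sketch of resume-from-breakpoint via an offset checkpoint.

def consume(stream, start_offset, checkpoint):
    """Apply events from start_offset onward, advancing the checkpoint."""
    applied = []
    for offset, event in enumerate(stream):
        if offset < start_offset:
            continue                         # already applied before the crash
        applied.append(event)                # stand-in for writing to the target
        checkpoint["offset"] = offset + 1    # persist after each successful apply
    return applied

stream = ["e0", "e1", "e2", "e3"]
ckpt = {"offset": 0}
first_run = consume(stream, ckpt["offset"], ckpt)      # applies all four events
after_restart = consume(stream, ckpt["offset"], ckpt)  # nothing left to redo
```

In a real deployment the checkpoint would be persisted atomically with (or after) the target write; persisting it before the write risks losing events, persisting it without coordination risks re-applying them.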
Typical scenarios include:
- Real-time data synchronization
  - Business system integration: real-time data synchronization from order systems to BI platforms, supporting real-time reporting and decision analysis
  - Cache updates: real-time synchronization of database changes to caching systems such as Redis and Memcached
  - Search engine synchronization: real-time synchronization of database changes to search engines such as Elasticsearch and Solr
- Data architecture upgrades
  - Database migration: migration from traditional databases to OceanBase with zero-downtime switching
  - Architecture modernization: splitting monolithic databases into distributed architectures for read/write separation
  - Cloud-native transformation: migration of on-premises databases to OceanBase instances in the cloud
- Data governance and analysis
  - Data lake construction: real-time synchronization of business data to data lakes for offline analysis
  - Real-time data warehouse: construction of real-time data warehouses for streaming analysis and machine learning
  - Multi-active architecture: implementation of multi-active databases across regions to improve system availability
Core tools and capabilities comparison
Data subscription in the OceanBase ecosystem primarily involves three core tools: self-developed migration tools (such as OMS), external migration tools (such as Flink CDC and DataX), and message middleware (such as Kafka). These tools are suitable for different data subscription scenarios. Below, we will select representative tools and introduce their characteristics, applicable scenarios, and technical advantages.
Self-developed migration tool: OMS
OMS (OceanBase Migration Service) is an enterprise-level data migration and subscription service provided by OceanBase, specifically designed for the OceanBase database ecosystem. It offers a one-stop migration solution from traditional databases to OceanBase.
Core capabilities
- High-performance full and incremental migration: supports rapid migration of TB-level data, with full migration based on logical or physical backups and incremental migration through log parsing (such as MySQL Binlog and Oracle Archive Log)
- Zero-downtime migration: supports smooth switching without interrupting business operations, ensuring business continuity
- Multi-database compatibility: natively supports migration from mainstream databases such as MySQL, Oracle, PostgreSQL, and DB2 without additional adaptation work
- Visual monitoring: provides real-time monitoring of migration progress, latency, and error alerts, supporting end-to-end visual management of migration tasks
Technical advantages
- Native optimization for OceanBase: deeply optimized for OceanBase's distributed architecture and storage engine, resulting in significantly higher migration efficiency compared to general tools
- Low intrusion: only requires reading database logs, without modifying source database configurations, minimizing impact on the source system
- High availability: supports multi-node deployment with automatic failover capabilities, ensuring high availability of migration services
- Data consistency: supports distributed transactions, ensuring data consistency and integrity, meeting enterprise-level data quality requirements
Applicable scenarios
- Enterprise-level database migration projects, especially from traditional databases to OceanBase
- Business scenarios with strict requirements for data consistency and availability
- Large-scale data migration and real-time synchronization needs, such as upgrading core business systems
External migration tool: Flink CDC
Flink CDC is a set of change data capture source connectors for Apache Flink, a distributed stream processing engine, focused on real-time data subscription and streaming computation. It reads database logs directly through these connectors to achieve end-to-end real-time data processing.
Core capabilities
- End-to-end Exactly-Once consistency: ensures data accuracy during transmission and computation, avoiding data duplication or loss
- Flexible data transformation: supports complex business logic, including field mapping, data cleaning, and aggregation calculations
- Multi-source support: supports various data sources such as MySQL, Oracle, PostgreSQL, and MongoDB through CDC connectors
- Unified stream and batch processing: processes real-time stream data and batch data together, simplifying the data processing architecture
Technical advantages
- High-performance computing: supports large-scale parallel processing with throughput reaching millions of TPS, meeting high-concurrency data processing needs
- State management: built-in state storage for complex state calculations and window operations, such as session analysis and real-time aggregation
- Fault tolerance mechanism: based on Checkpoint for fault recovery, ensuring data consistency after failures
- Rich ecosystem: seamlessly integrates with big data components such as Kafka, Hive, and Elasticsearch, building a complete data processing ecosystem
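The window operations mentioned above (such as real-time aggregation) can be illustrated in plain Python: events are grouped into fixed time windows and aggregated per key, the kind of stateful computation Flink runs in parallel at scale. Timestamps and window size here are illustrative, not tied to any Flink API.

```python
# A plain-Python sketch of a tumbling-window aggregation, the kind of
# stateful computation a stream engine like Flink performs continuously.
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Count (timestamp, key) events per key within fixed-size windows."""
    counts = defaultdict(int)                # state: (window_start, key) -> count
    for ts, key in events:
        window_start = ts - ts % window_size # window the event falls into
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "a"), (3, "b"), (7, "a"), (12, "a")]
result = tumbling_window_counts(events, window_size=5)
# {(0, 'a'): 1, (0, 'b'): 1, (5, 'a'): 1, (10, 'a'): 1}
```

Flink's contribution is doing this with fault-tolerant state: the window counters are checkpointed so a recovered job resumes aggregation without double-counting.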
Applicable scenarios
- Real-time data analysis and streaming computation, such as real-time reporting and real-time risk control
- Complex data transformation and cleaning needs, such as multi-source data integration and data standardization
- Multi-source data integration and real-time data warehouse construction, such as real-time data lakes and streaming data warehouses
Message middleware: Kafka
Kafka is a distributed event streaming platform that serves as an intermediary and buffer layer in data subscription architectures, connecting data producers and consumers.
Core capabilities
- High-throughput message transmission: sustains throughput on the order of a million messages per second, meeting large-scale data stream processing needs
- Persistent storage: data is persisted to disk with configurable retention; delivery is at-least-once by default, with exactly-once semantics available through idempotent producers and transactions
- Multi-consumer subscription: supports multiple consumers subscribing to the same topic in parallel, enabling data reuse
- Partitioning and replication: supports horizontal scaling and high availability deployments, meeting large-scale cluster requirements
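Partitioning is also what gives Kafka per-key ordering: all changes for the same key hash to the same partition, so one consumer sees them in publish order. The sketch below illustrates the idea; `zlib.crc32` stands in for the partitioner a real Kafka client would use.

```python
# Conceptual sketch of key-based partitioning: events with the same key
# always land in the same partition, preserving their relative order.
# zlib.crc32 is a stand-in, not the actual Kafka client partitioner.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

partitions = {p: [] for p in range(3)}
for key, change in [("order-1", "created"), ("order-2", "created"),
                    ("order-1", "paid"), ("order-1", "shipped")]:
    partitions[partition_for(key, 3)].append((key, change))

# All three "order-1" events are in one partition, in publish order:
# created -> paid -> shipped. Ordering across different keys is not guaranteed.
```

This is why CDC pipelines typically partition by primary key: changes to the same row are consumed in order, while different rows can be processed in parallel.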
Technical advantages
- Decoupled architecture: decouples data producers and consumers, enhancing system flexibility and enabling independent scaling and maintenance
- Buffering capability: buffers data to absorb peak traffic and smooth load fluctuations, protecting downstream systems from being overwhelmed
- Multi-target distribution: a single piece of data can be subscribed to by multiple consumers, enabling data reuse and reducing data acquisition costs
- Horizontal scaling: supports cluster horizontal scaling, meeting large-scale data processing needs with good scalability
Applicable scenarios
- As an intermediate layer for CDC tools, temporarily storing change data, such as data buffering from OMS to the target system
- Building real-time data pipelines to connect different data processing components, such as data streams from databases to analytics systems
- Data buffering and traffic smoothing to handle business peak hours and system fluctuations
Tool selection recommendations
Select based on data source type
Relational databases to OceanBase
Recommended tool: OMS
Applicable scenarios: Migration from databases such as MySQL, Oracle, and PostgreSQL to OceanBase
Core advantages:
- Native optimization: deeply optimized for OceanBase's distributed architecture and storage engine, resulting in significantly higher migration efficiency compared to general tools
- Visual interface: provides real-time monitoring of migration progress, latency, and error alerts, reducing O&M complexity
- Enterprise-level guarantees: supports automated disaster recovery and rollback, providing enterprise-level SLA guarantees
- Low intrusion: only requires reading database logs, without modifying source database configurations
Select based on business scenarios
Real-time data synchronization scenarios
Recommended tool combination: OMS + Kafka
Typical applications:
- Real-time data synchronization between business systems
- Real-time updates to caching systems (Redis, Memcached)
- Real-time index updates in search engines (Elasticsearch, Solr)
- Real-time data pushing to message queues
Applicable scenarios:
- Financial and e-commerce enterprises with high requirements for data consistency
- Online business systems requiring real-time data synchronization
- Small and medium-sized enterprises sensitive to O&M costs
Real-time analysis scenarios
Recommended tool combination: Flink CDC + Kafka + analytics system
Typical applications:
- Real-time data analysis and report generation
- Streaming machine learning and AI applications
- Real-time risk control and monitoring systems
- Real-time recommendations and personalized services
Applicable scenarios:
- Internet enterprises requiring real-time data analysis
- Scenarios with extremely high requirements for data processing performance
- Enterprises with big data technology teams
Data lake construction scenarios
Recommended tool combination: Flink CDC + Kafka + data lake
Typical applications:
- Real-time data lake construction
- Multi-source data integration and unified management
- Real-time data warehouse construction
- Data governance and analysis platforms
Applicable scenarios:
- Large enterprises needing to build a data middle platform
- Multi-line-of-business data integration needs
- Enterprises with high requirements for data governance