Building a Real-Time Data Warehouse with Flink and OceanBase

OceanBase

2024-12-25

OceanBase is a leading distributed database for hybrid workloads, handling data volumes from hundreds of gigabytes to 1 PB. It has been developed by Ant Group since 2010 and has been extensively validated and benchmarked against local and international standards for 14 years, with best-in-class, reliable disaster recovery. OceanBase can be deployed on-premises, as cloud SaaS, or across multiple clouds such as AWS, Google Cloud, and Alibaba Cloud. It currently serves more than 2,000 companies around the world, including Alipay, Starbucks, Trip.com, GCash, DANA, Touch 'n Go, Haidilao, Taobao, ICBC, and VIVO.

In 2020 and 2021, OceanBase set the official TPC-C world record, achieving 20 times the tpmC of Oracle. In 2021, OceanBase set the official TPC-H world record at the 30 TB data size for analytics. OceanBase has also received recognition from Gartner, Forrester, and IDC, and has published over 20 papers at the top global database conferences ACM SIGMOD, VLDB, and IEEE ICDE.

At Alipay, a single OceanBase cluster contains over 1,000 nodes and stores 6 petabytes of data, with a single table exceeding 690 billion rows. At this scale, OceanBase's disaster recovery capabilities keep the RPO (recovery point objective) at 0 and the RTO (recovery time objective) under 8 seconds, while peak performance reaches 544,000 TPS (transactions per second) and 61,000,000 QPS (queries per second).

DANA, the #1 e-wallet and payment provider in Indonesia, runs mission-critical workloads on OceanBase. With OceanBase, DANA has scaled seamlessly at peak performance, with zero downtime and zero data loss, while its user base grew from 20 million to 200 million. OceanBase also simplified DANA's transition to 100% cloud and hybrid cloud environments.

GCash, the #1 finance super app in the Philippines, empowers innovation and growth with OceanBase. It used 10+ OceanBase clusters to consolidate hundreds of database instances, saving 40% of database resources and 70% of data storage space, while supporting 5 times the number of MySQL connections and providing high availability with RPO=0.

OceanBase has gradually evolved from a distributed OLTP database into an all-in-one converged database. Its storage engine, transaction engine, and SQL engine are now all unified, which means each of these components serves both OLTP and OLAP workloads. OceanBase also supports multi-model features (KV, AI, and document) and has an all-in-one multi-cloud infrastructure. Compared with running two separate clusters for OLTP and OLAP, using OceanBase as a single system for both offers lower costs, better performance (millisecond-level latency), and better data consistency.

Haidilao Hotpot embraces HTAP real-time analysis, using OceanBase to build its next-generation inventory management system for real-time intelligent recommendation. With a single OceanBase database, the system achieved a 45% boost in analytics performance and a 50% reduction in the total cost of database ownership.

In addition to table data (in both row storage and column storage), OceanBase supports semi-structured and other data types such as JSON, GIS, document, and vector data, so users can easily build AI applications on OceanBase.

Compared with analysis systems based on HBase, a real-time analysis system built on OceanBase can reduce hardware resource costs and achieve better performance. In addition, using Flink CDC removes the cost of developing and maintaining a separate HBase CDC tool.

Based on the capabilities of OceanBase and Flink, we can go a step further and remove Kafka from the architecture above, using OceanBase as the storage for the ODS, DWD, DWS, and ADS layers of the warehouse. This architecture has fewer components, which can further reduce costs and improve performance.
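
To make the layering concrete, here is a minimal sketch of how data might flow from ODS through DWD and DWS to ADS. It is plain Python rather than Flink, and the order records, field names, and transformation rules are all hypothetical; in the architecture described here, each layer would be an OceanBase table and each step a Flink job.

```python
from collections import defaultdict

# Hypothetical raw order events as they might land in the ODS layer.
ods_orders = [
    {"order_id": 1, "store": "A", "amount": "12.50", "status": "PAID"},
    {"order_id": 2, "store": "A", "amount": "7.50", "status": "CANCELLED"},
    {"order_id": 3, "store": "B", "amount": "20.00", "status": "PAID"},
    {"order_id": 4, "store": "A", "amount": "5.00", "status": "PAID"},
]


def to_dwd(rows):
    """DWD: clean and type the raw rows, keeping only paid orders."""
    return [
        {"order_id": r["order_id"], "store": r["store"], "amount": float(r["amount"])}
        for r in rows
        if r["status"] == "PAID"
    ]


def to_dws(rows):
    """DWS: aggregate detail rows into per-store summaries."""
    summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for r in rows:
        summary[r["store"]]["orders"] += 1
        summary[r["store"]]["revenue"] += r["amount"]
    return dict(summary)


def to_ads(summary):
    """ADS: shape the summaries for an application, here a revenue ranking."""
    return sorted(summary, key=lambda s: summary[s]["revenue"], reverse=True)


dwd = to_dwd(ods_orders)
dws = to_dws(dwd)
ads = to_ads(dws)
print(ads)
```

Because every layer lives in the same database in this design, each step reads from one table and writes to the next, with no message queue in between.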

We have added OceanBase dialect and catalog support to the Flink JDBC Connector. Users can now use the latest Flink JDBC Connector directly to read and write OceanBase data.
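
As a rough illustration, a Flink SQL table backed by the JDBC connector could be declared along the following lines. The helper `jdbc_table_ddl`, the table and column names, the endpoint, and the credentials are all hypothetical. The sketch assumes a MySQL-compatible OceanBase tenant reached through a MySQL-style JDBC URL; check the connector documentation for the URL scheme and any extra options that select the OceanBase dialect.

```python
def jdbc_table_ddl(name, columns, primary_key, url, table_name, username, password):
    """Render a Flink SQL CREATE TABLE statement that uses the JDBC connector."""
    cols = ",\n".join(f"  `{col}` {sql_type}" for col, sql_type in columns)
    # Standard Flink JDBC connector options; the values are placeholders.
    options = {
        "connector": "jdbc",
        "url": url,
        "table-name": table_name,
        "username": username,
        "password": password,
    }
    opts = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return (
        f"CREATE TABLE `{name}` (\n{cols},\n"
        f"  PRIMARY KEY (`{primary_key}`) NOT ENFORCED\n"
        f") WITH (\n{opts}\n);"
    )


ddl = jdbc_table_ddl(
    name="orders_sink",
    columns=[("order_id", "BIGINT"), ("store", "STRING"), ("amount", "DECIMAL(10, 2)")],
    primary_key="order_id",
    url="jdbc:mysql://127.0.0.1:2881/test",  # hypothetical endpoint and database
    table_name="orders",
    username="root@test",  # hypothetical user
    password="secret",     # hypothetical password
)
print(ddl)
```

The printed statement can be submitted through the Flink SQL client or a table environment, after which `INSERT INTO orders_sink SELECT ...` writes into the target table.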

To make migration to OceanBase easier, we developed a new Flink Sink Connector for OceanBase on top of the Flink JDBC Connector. Beyond the basic JDBC sink capabilities, it also implements direct load, multi-table sink, and DDL synchronization.
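
The idea behind a multi-table sink can be sketched as follows: incoming rows are grouped by target table and written in batches. This is a toy illustration in plain Python, not the connector's actual implementation; the class `MultiTableBuffer`, the `write_batch` callback, and the table names are hypothetical.

```python
from collections import defaultdict


class MultiTableBuffer:
    """Toy multi-table sink: group rows by target table and flush in batches."""

    def __init__(self, batch_size, write_batch):
        self.batch_size = batch_size
        self.write_batch = write_batch  # hypothetical writer: callable(table, rows)
        self.buffers = defaultdict(list)

    def add(self, table, row):
        """Buffer one row and flush its table once the batch is full."""
        buf = self.buffers[table]
        buf.append(row)
        if len(buf) >= self.batch_size:
            self.flush(table)

    def flush(self, table=None):
        """Write out one table's buffer, or every buffer when no table is given."""
        tables = [table] if table is not None else list(self.buffers)
        for t in tables:
            rows = self.buffers.pop(t, [])
            if rows:
                self.write_batch(t, rows)


written = []
sink = MultiTableBuffer(batch_size=2, write_batch=lambda t, rows: written.append((t, rows)))
sink.add("orders", {"id": 1})
sink.add("users", {"id": 10})
sink.add("orders", {"id": 2})  # the second "orders" row fills the batch and triggers a write
sink.flush()                   # flush what is left, here the single "users" row
print(written)
```

A real sink would also flush on checkpoints and on a timer so that slow-moving tables are not held back indefinitely.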

We are also actively contributing to the Flink CDC community and integrating this sink connector into Flink CDC 3.0, to provide a simple, easy-to-use, end-to-end data synchronization solution for migrating to OceanBase.

In addition to the features above, we are developing a Flink command-line tool that will support a wider range of data sources, such as Flink CDC Source, Flink JDBC Source, and Flink Kafka Source. With it, users will be able to start a data migration task easily.
