OceanBase at VLDB 2024: Exploring the Future of Distributed Database


The 50th International Conference on Very Large Databases (VLDB 2024), one of the leading international conferences in the database field, took place in Guangzhou from August 26th to August 30th. Drawing leading scholars from the global database community, VLDB offered a concentrated display of the cutting-edge research directions in databases and the latest industrial applications.

At VLDB 2024, two OceanBase papers were accepted: "PALF: Replicated Write-Ahead Logging for Distributed Databases" and "Native Distributed Databases: Problems, Challenges and Opportunities." Both garnered significant recognition from the international academic community.

During the workshop "Trends of Integration in Database Technology" at VLDB, OceanBase discussed the trends of integration in distributed databases in depth with globally renowned database scholars such as C. Mohan (HKBU & Tsinghua Distinguished Professor, US National Academy of Engineering member). Industry experts in the database field exchanged views on topics such as "integrated databases," "database privacy and security," and "AI4DB," exploring the future development of databases from both academic research and practical application perspectives.

During the Sponsor Talk, OceanBase CTO Charlie Yang presented the technical evolution of OceanBase from an OLTP database to an all-in-one database. Additionally, OceanBase Lab Researcher Quanqing Xu delivered a keynote address at the Quality in Databases (QDB) workshop and participated in a panel discussion alongside Dr. Divesh Srivastava (VLDB Endowment Chair) from AT&T and Dr. Fatma Ozcan from Google.



As distributed database technology matures, integration has emerged as a prominent focus in the future evolution of distributed databases. An integrated database brings together diverse workloads, data models, and storage engines to enable unified management and real-time processing of extensive and heterogeneous data. This shift not only streamlines the technology stack of modern data architecture but also ties data more closely to the business, significantly enhancing enterprises' digital transformation capabilities at the foundational data level.

At VLDB 2024, OceanBase hosted the workshop "Trends of Integration in Database Technology." Speakers included C. Mohan (HKBU & Tsinghua Distinguished Professor, US National Academy of Engineering member), Aoying Zhou (CCF Fellow, Professor at East China Normal University (ECNU)), Professor Gao Cong from Nanyang Technological University (NTU), Professor Ke Yi from Hong Kong University of Science and Technology (HKUST), HKT Group IT Enterprise Architect Ivan Law, OceanBase Founder and Chief Scientist Zhenkun Yang, and OceanBase CTO Charlie Yang, along with other expert scholars and pioneers in the field of databases. The workshop comprehensively discussed the future development direction of integration in distributed databases, covering topics from theory to practice and from academia to industry.


🎙️ C. Mohan: HKBU & Tsinghua Distinguished Professor, US National Academy of Engineering member

Professor C. Mohan offered global perspectives on the evolution of database trends and market development across hardware, software, and public policy. He reviewed the evolution of the global database market from OLTP to OLAP and the emergence of specialized database products. Additionally, he introduced the landscape of the global database market and provided forward-looking analysis of the trends in integrated data management systems such as HTAP, AI4DB, DB4AI, LLMs (GenAI) + Databases, Privacy + Security, and support for multiple modalities.

During his presentation, Professor C. Mohan emphasized that as market demands continue to evolve, data system architectures have become increasingly complex. This has led to overlapping functionality and incompatible APIs, producing islands of data and specialized APIs that can be very hard to integrate. Looking ahead, with trends such as HTAP, the integration of AI and database systems, and support for multiple modalities, databases will gradually move towards integration, enhancing user experience and empowering digital transformation.


From Database to All-in Data Power Platform

🎙️ Aoying Zhou: Professor, CCF Fellow, East China Normal University (ECNU)

Professor Aoying Zhou highlighted in his presentation that in the Internet era, data serves as a new force driving innovation and leading humanity toward digital civilization. The Internet era and the accompanying paradigm shift have elevated the value of data, making it a crucial catalyst for economic evolution and digital civilization. In this context, the database, as the key infrastructure of the information society, enables data empowerment, and the data power platform, as the infrastructure for digital transformation, represents the popularization and democratization of data technology. Professor Aoying Zhou provided profound insights into the development trends of data technology and opportunities for databases from the perspectives of distributed/parallel databases, blockchain, cloud native, and open source, emphasizing the importance of "rich applications + database philosophy → new data technology/theory".

Professor Aoying Zhou emphasized that in the digital economy, data serves as the fifth factor of production. He said that the current era presents an optimal opportunity for the development of databases, and that as technology advances and applications expand, databases are poised to evolve in a more integrated and intelligent direction.


AI for Databases: Foundations, Paradigms, and Open Problems

🎙️ Gao Cong: Professor, Nanyang Technological University (NTU)

Professor Gao Cong provided a comprehensive overview of the application of AI and machine learning technologies to databases, covering areas such as ML4DB foundations, data access methods, and database testing and administration. He shared cutting-edge research achievements in this area, such as query representation learning, ML-enhanced indexes, data partitioning, database generation, and query rewriting, and further analyzed potential opportunities and future research directions.

Professor Gao Cong emphasized that AI and machine learning have driven numerous innovative database technological research initiatives, while also highlighting the opportunities and challenges that lie ahead. These include integrating separate ML4DB components into DB systems, model efficiency, handling data & workload shifts, training data of high quality, foundation (pretrained) models for ML4DB tasks, and LLMs for ML4DB tasks.


Querying Private Data

🎙️ Ke Yi: Professor, Hong Kong University of Science and Technology (HKUST)

Professor Ke Yi underscored the significance of data security and privacy from two perspectives: the commercial interests tied to personal privacy, and the protection of private data by policies and regulations. He stated that privacy is increasingly becoming a must in data management, leading to numerous technological innovations and prototype implementations. However, there remains a gap between theoretical research and the practical implementation of querying private data in commercial databases.

During the presentation, Professor Ke Yi introduced cutting-edge research and achievements in the field of data security and privacy, such as Differential Privacy (DP), Computing on Encrypted Data, Trusted Execution Environments (TEE), and Secure Multi-Party Computation (MPC). He also analyzed future research directions in data privacy, pointing out that querying private data with SQL is important but under-studied. Additionally, integrating data security and privacy technologies into database systems to bridge the gap between theory and practice will become an important topic in the future.
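To make the idea of querying private data more concrete, here is a minimal sketch of one of the techniques named above, differential privacy, applied to a SQL-style `COUNT(*) ... WHERE` query. It uses the textbook Laplace mechanism and is not taken from the talk or from any database product; the function names `laplace_noise` and `private_count`, the `rows` layout, and the choice of `epsilon` are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of rows matching `predicate`.

    A counting query changes by at most 1 when one row is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical table: one row per person, with an age column.
    people = [{"age": a} for a in range(100)]
    noisy = private_count(people, lambda r: r["age"] >= 50, epsilon=1.0)
    print(f"noisy count of people aged 50+: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy. A real system would also need to track the privacy budget across queries, which this sketch leaves out.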


Rethinking Distributed Database: from the Stand-alone Model to the Integrated Design

🎙️ Zhenkun Yang: OceanBase Founder and Chief Scientist

Professor Zhenkun Yang highlighted that enhancing system performance and reducing overall costs for customers are the essential development directions of databases. He noted that cloud databases have become a key trend in the database management system market, offering a balance between cost and performance.

Based on the LSM-Tree architecture, OceanBase can optimize data compression and achieve efficient read and write operations, catering to both vertical and horizontal scalability requirements. With support for HTAP, OceanBase Dedicated enhances resource utilization through methods such as resource multiplexing, ensuring performance while reducing TCO, and meeting diverse customer needs.
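For readers unfamiliar with the LSM-Tree idea mentioned above, the following toy sketch shows the general pattern: writes go to an in-memory table, full tables are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs. It is a generic illustration only and does not reflect how OceanBase's storage engine is implemented; the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical.

```python
import bisect


class TinyLSM:
    """A toy log-structured merge store (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # recent writes, held in memory
        self.runs = []                # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory table, so they are cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in self.runs:         # oldest to newest, later runs overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


if __name__ == "__main__":
    db = TinyLSM()
    db.put("a", 1)
    db.put("b", 2)   # memtable is full, so it is flushed as a run
    db.put("a", 3)   # the newer value shadows the flushed one
    print(db.get("a"), db.get("b"))
```

Because flushed runs are sorted and never modified in place, they are a natural unit for compression, which is one reason LSM-based designs are often associated with good compression ratios.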


Integrated Data Management in the Real World

🎙️ Ivan Law: HKT Group IT Enterprise Architect, Hong Kong Telecommunications (HKT) Limited

HKT is a leading telecommunications company based in Hong Kong. It offers a wide range of services, including fixed-line, mobile, broadband, and enterprise solutions. In response to escalating demands for data management and ongoing technological advancements, HKT sought a secure, scalable, and highly SQL-compatible database system that supports real-time migration and can be deployed on multiple infrastructures.

Ivan Law highlighted that OceanBase, with its support for hybrid deployments across multiple infrastructures, offers high availability and scalability, as well as real-time HTAP. Additionally, OceanBase's compatibility with SQL ensures seamless integration and real-time migration, facilitating HKT's data migration processes. In terms of security, OceanBase’s integrated architecture employs industry-leading encryption standards and role-based access controls, thereby safeguarding HKT's sensitive data. Through this innovative database architecture, HKT has maximized cost-effectiveness and can better leverage advanced technologies within a multi-infrastructure environment. This, in turn, enables HKT to enhance its telecommunications services for customers.


As distributed technology matures to address the expanding and varied application requirements of database users, the trend toward integration has become a paramount focus for distributed databases, including the integration of TP and AP, the integration of AI and databases, and the integration of security and privacy within distributed architecture.

Professor C. Mohan led the panel discussion that concluded the workshop, alongside other experts. OceanBase CTO Charlie Yang shared OceanBase's insights on the trends of integrated databases and introduced the underlying technical implementation of integration technologies in the OceanBase database.

OceanBase, a native distributed database tailored for mission-critical workloads, has effectively tackled scalability, availability, and consistency challenges by leveraging the LSM-Tree architecture, the Paxos protocol, and an innovative two-phase commit protocol. Notably, OceanBase has introduced an integrated stand-alone and distributed architecture, which not only facilitates flexible scaling but also ensures exceptional system performance. Additionally, it supports HTAP, multi-tenancy, and multi-model data, and will be integrated with AI in the future, catering to diverse workloads and application scenarios. With a commitment to delivering an integrated database system that addresses 80% of users' requirements, OceanBase remains at the forefront of integrated database technology development.

During the panel, experts and scholars delved into an engaging discussion encompassing topics including the integration of distributed databases and AI/ML technologies, database language integration, future development directions for database security and privacy, cloud databases, database hybrid infrastructure, and OceanBase's selection and practical considerations in real-world application scenarios.



Integrated Databases: The Leading Solution for Enterprise-Level Data Processing Requirements

During the Sponsor Talk, OceanBase CTO Charlie Yang shared the technical evolution of OceanBase from OLTP to an all-in-one database. OceanBase is a native distributed database that began as an OLTP database supporting all critical business systems of Alipay. It now has over 1,000 external clients, an increasing number of which apply it to mixed workload scenarios such as OLAP, NoSQL, and multi-model applications. Charlie Yang analyzed the technical challenges faced in transitioning from distributed OLTP to an all-in-one database, focusing on aspects such as columnar storage, resource isolation, complex queries, and multi-model support in a distributed SQL database.


The OceanBase research and development team has been continuously dedicated to core database technologies, expanding technical boundaries, and collaborating with the international academic community to build a mutually beneficial ecosystem. With robust technical support and ongoing product innovation, OceanBase has emerged as a leading solution for addressing enterprise-level data processing needs.

Dr. Quanqing Xu, Researcher at the OceanBase Lab, was invited to give an academic presentation titled "Data Quality in OceanBase" at the International Workshop on Quality in Databases (QDB) 2024 and participated in a panel discussion with Dr. Divesh Srivastava from AT&T and Dr. Fatma Ozcan from Google.

Dr. Quanqing Xu provided an overview of the development of OceanBase and its key features, and presented OceanBase's research progress in data quality, covering data from multiple sources, data freshness, data cleansing, data assessment, and data quality improvement. Looking ahead, OceanBase will continue to focus on several aspects of data quality: first, how to repair data in large-scale databases efficiently and reasonably; second, how to ensure the consistency of data from multiple sources; and third, how to further leverage machine learning to improve data quality and assess data value.


Following this, Associate Professor Sourav S Bhowmick from Nanyang Technological University moderated a stimulating panel discussion in which Dr. Quanqing Xu engaged with Dr. Divesh Srivastava from AT&T and Dr. Fatma Ozcan from Google. During the discussion, they highlighted current primary data quality challenges, some of which have not received sufficient attention from the academic community. While academia has focused on certain aspects of data quality research, there remains a need to enhance the transfer of research achievements from academia to industry.



2 Papers Accepted, OceanBase's Innovations Recognized by Leading Database Conference

Two OceanBase papers were accepted at VLDB 2024. OceanBase Senior Engineer Bin Chen and OceanBase Lab Researcher Quanqing Xu presented them at the conference, an acknowledgment of OceanBase's innovative capabilities by a leading database conference.

The first paper, "PALF: Replicated Write-Ahead Logging for Distributed Databases," introduces PALF, a Paxos-backed Append-only Log File System, to address the challenges of designing a replicated logging system that serves as the foundation of a distributed database with ACID transactions. PALF has been successfully applied in OceanBase 4.0 and subsequent versions, effectively supporting the high availability, reliability, and performance of OceanBase, as well as critical functions such as physical standby databases and backup and recovery.
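As a rough illustration of what a replicated write-ahead log does, the sketch below shows a leader that treats a record as committed only after a majority of replicas have appended it. This is a deliberately simplified, single-process model and not PALF's actual design: it omits leader election, follower catch-up, and persistence, all of which a real Paxos-backed log needs. The class names `Replica` and `Leader` and their methods are hypothetical.

```python
class Replica:
    """A follower holding its own copy of the append-only log."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up          # False simulates a crashed or unreachable node
        self.log = []

    def append(self, index, record):
        # Accept only the next record in sequence, so the log has no gaps.
        if not self.up or index != len(self.log):
            return False
        self.log.append(record)
        return True


class Leader:
    """Appends records and commits them once a majority has acknowledged."""

    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.committed = 0    # number of log entries known to be durable

    def append(self, record):
        index = len(self.log)
        self.log.append(record)
        # The leader's own append counts as one acknowledgment.
        acks = 1 + sum(1 for f in self.followers if f.append(index, record))
        cluster_size = 1 + len(self.followers)
        if acks > cluster_size // 2:
            self.committed = index + 1
            return True
        return False


if __name__ == "__main__":
    followers = [Replica("f1"), Replica("f2", up=False)]
    leader = Leader(followers)
    # Two of three replicas acknowledge, so the record is committed.
    print(leader.append("BEGIN; UPDATE accounts ...; COMMIT"))
```

With three replicas, the log stays writable while any single replica is down, which is the basic availability property a majority-based protocol provides.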


The second paper, "Native Distributed Databases: Problems, Challenges and Opportunities," was co-authored by OceanBase and East China Normal University. Using OceanBase as an example, the paper combines academic and industrial perspectives and examines the main technical challenges that current distributed databases face, along with their solutions, in areas such as data replication and synchronization, consistency models, distributed transactions, and query processing.


Following papers such as "OceanBase Paetica: A Hybrid Shared-nothing/Shared-everything Database for Supporting Single Machine and Distributed Cluster" and "OceanBase: A 707 Million tpmC Distributed Relational Database System", the acceptance of these two papers represents a significant milestone in OceanBase's academic standing. Through these papers, OceanBase shares the fundamental technical principles underpinning a distributed database that supports mission-critical workloads, facilitating mutual inspiration between academia and industry and catalyzing the progress and innovation of database technology.


In recent years, OceanBase has published over 20 papers in leading international database conferences, including SIGMOD, VLDB, and ICDE, and in related journals. With its robust technical expertise, OceanBase has cultivated partnerships with top research teams around the world, collectively propelling innovation, practical applications, and academic advancements in distributed database technology. OceanBase's technical capabilities and innovative accomplishments have garnered recognition from the global academic community.

Moving forward, OceanBase is committed to amplifying its investment in fundamental research and engineering development, exploring the integration of distributed databases with broader domains such as big data and AI. Furthermore, OceanBase endeavors to disseminate the latest technological advancements and innovative pathways to global technology developers, thereby assisting clients in building modern data architectures and delivering value for the development of database technology and industry.
