A Record High—Six Papers from OceanBase Selected for the 2025 ACM SIGMOD/PODS Conference



Recently, the 2025 ACM SIGMOD/PODS Conference, a leading international conference in the database field, was held in Berlin, Germany. Six papers with OceanBase authors were accepted at the conference, the most OceanBase has ever had accepted there.

The ACM SIGMOD conference (ACM Special Interest Group on Management of Data) is one of the longest-running and most authoritative international academic conferences in the database field, and it showcases the highest level of database research worldwide. The acceptances reflect OceanBase's ongoing dedication to research in database technology, and they indicate that its work on core technologies and cutting-edge directions in distributed databases has received broad recognition from the international academic community.

The six selected papers span several categories, including Research Papers, Industry Papers, Tutorials, and Demo Papers. Their research themes focus on core challenges and frontier topics in database systems, such as large transaction processing, modern storage architectures, differential privacy, and query performance optimization. Two of the papers were authored solely by OceanBase, demonstrating the research capabilities of the team.

   

MaLT: A Framework for Managing Large Transactions in OceanBase (Independently Authored by OceanBase)

Authors: Chenguang Fang, Chen Qian, Qi Yang, Zeyu Wang, Zhenkun Yang, Fanyu Kong, Quanqing Xu, Hui Cao, Fusheng Han, and Chuanhui Yang.

Large transactions challenge the design of modern relational database systems. This paper presents MaLT, a framework designed to efficiently Manage Large Transactions within the OceanBase system. The team introduces a Transaction Context Table (TCT) and a Transaction Data Table (TDT) to manage transaction states in the LSM-tree based storage engine, and further devises an efficient recovery mechanism to keep the database highly available after unexpected system failures. Unlike existing LSM-tree based RDBMSs that abstract LSM-trees as key-value stores, MaLT implements transactions directly in the LSM-tree and leverages its unique features. The backfill (i.e., in-row version number update upon commit) and undo operations for transactions are integrated into the compaction stage of the LSM-tree. This enables efficient commit and abort, and at the same time helps avoid the immediate latency of recovering the system from uncommitted large transactions. Moreover, MaLT embeds transaction information directly within the LSM-tree, facilitating various optimizations that improve both read and write performance. Finally, the experimental results demonstrate the effectiveness and scalability of the approach implemented in OceanBase.

https://dl.acm.org/doi/10.1145/3722212.3724442
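
To make the idea of backfilling at compaction time more concrete, here is a minimal Python sketch. It is a toy illustration only and not OceanBase's implementation: the `Row` class, the `tx_table` dictionary (a hypothetical stand-in for a transaction context table), and the `commit`, `abort`, and `compact` functions are all assumed names. The point it tries to show is that commit and abort touch only a small per-transaction record, while the per-row work (writing the commit version into each row, or discarding rows of aborted transactions) is deferred to compaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    key: str
    value: str
    tx_id: int                            # transaction that wrote this row
    commit_version: Optional[int] = None  # unknown until backfilled


# Hypothetical transaction table: tx_id -> (state, commit_version).
TxTable = Dict[int, Tuple[str, Optional[int]]]


def commit(tx_table: TxTable, tx_id: int, version: int) -> None:
    """Commit records one entry, no matter how many rows the transaction wrote."""
    tx_table[tx_id] = ("committed", version)


def abort(tx_table: TxTable, tx_id: int) -> None:
    """Abort is also a single small update; rows are cleaned up later."""
    tx_table[tx_id] = ("aborted", None)


def compact(rows: List[Row], tx_table: TxTable) -> List[Row]:
    """Rewrite rows as a compaction would, resolving transaction state."""
    out = []
    for row in rows:
        state, version = tx_table.get(row.tx_id, ("running", None))
        if state == "committed":
            # Backfill: store the commit version in the row itself.
            out.append(Row(row.key, row.value, row.tx_id, version))
        elif state == "aborted":
            # Undo: drop the row instead of rewriting it.
            continue
        else:
            # Still running: keep the row untouched.
            out.append(row)
    return out


rows = [Row("a", "1", tx_id=7), Row("b", "2", tx_id=8), Row("c", "3", tx_id=9)]
tx_table: TxTable = {}
commit(tx_table, 7, version=100)
abort(tx_table, 8)
compacted = compact(rows, tx_table)
```

In this sketch, transaction 9 is still running, so its row survives compaction without a commit version, while the row from transaction 7 now carries version 100 and the row from transaction 8 is gone.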

     

OLTP Engines on Modern Storage Architectures (Independently Authored by OceanBase)

Authors: Daokun Hu, Quanqing Xu, and Chuanhui Yang.

Online transaction processing (OLTP) engines are crucial components of database systems and face significant challenges due to the rapid growth of data on the Internet. In recent years, advancements in storage architecture, such as persistent memory, NVMe SSDs, and CXL, have helped alleviate the resulting memory and I/O pressure by bridging the performance gap between DRAM and traditional block storage devices or by efficiently expanding memory pools. These technologies are used to enhance and accelerate OLTP engines, with emerging storage hardware and protocols offering improved scalability and remote access. This tutorial provides an overview of modern OLTP engines that leverage cutting-edge storage solutions, exploring storage hierarchies, protocols, and programming models that offer insights for researchers and industry professionals. Additionally, it highlights the challenges and opportunities presented by emerging storage architectures for OLTP engines.

https://dl.acm.org/doi/10.1145/3722212.3725633

     

RM^2: Answer Counting Queries Efficiently under Shuffle Differential Privacy

Authors: Qiyao Luo, Jianzhe Yu, Wei Dong, Quanqing Xu, Chuanhui Yang, and Ke Yi.

Differential privacy (DP) is a leading standard for protecting individual privacy in data collection and analysis. This paper explores the shuffle model of DP, which balances privacy and utility by having users send their messages through a trusted shuffler, so that the messages reach an untrusted analyzer anonymously. The team focuses on efficiently implementing the matrix mechanism in shuffle-DP, where efficiency is defined by the number of messages each user sends. The contributions include a baseline shuffle-DP mechanism that naively adapts the matrix mechanism, followed by an improved mechanism that reduces message complexity while maintaining error levels comparable to central-DP. The paper demonstrates the versatility of the approach across common query workloads, such as range queries and data cubes, achieving significant improvements in message efficiency. Experimental results confirm that this method outperforms the baseline solution while closely matching the accuracy of central-DP mechanisms.

https://dl.acm.org/doi/10.1145/3725415
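
For readers unfamiliar with the shuffle model, the pipeline of users, shuffler, and analyzer can be sketched in a few lines of Python. This is not the RM^2 mechanism or the matrix mechanism from the paper. It is a much simpler single-message illustration that swaps in binary randomized response for one counting query, and it leaves out the privacy analysis entirely. The function names and the parameter `p_truth` are assumptions made for this sketch.

```python
import random
from typing import List


def randomize(bit: int, p_truth: float, rng: random.Random) -> int:
    """User side: report the true bit with probability p_truth, else flip it."""
    return bit if rng.random() < p_truth else 1 - bit


def shuffle_messages(messages: List[int], rng: random.Random) -> List[int]:
    """Shuffler: permute the messages so the analyzer cannot link them to users."""
    shuffled = list(messages)
    rng.shuffle(shuffled)
    return shuffled


def estimate_count(messages: List[int], p_truth: float) -> float:
    """Analyzer: debias the observed number of 1s.

    If c users truly hold 1, then E[observed] = c*p + (n - c)*(1 - p),
    so c is estimated as (observed - n*(1 - p)) / (2p - 1).
    """
    n = len(messages)
    observed = sum(messages)
    return (observed - n * (1 - p_truth)) / (2 * p_truth - 1)


rng = random.Random(0)                      # fixed seed for repeatability
true_bits = [1] * 3000 + [0] * 7000         # 3000 of 10000 users hold a 1
reports = [randomize(b, 0.75, rng) for b in true_bits]
shuffled = shuffle_messages(reports, rng)
estimate = estimate_count(shuffled, 0.75)   # should land near 3000
```

The analyzer sees only the shuffled multiset of reports, which is enough to answer the counting query approximately. How many messages each user must send for a whole workload of queries is the cost the paper sets out to reduce.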

     

Efficient and Accurate Differentially Private Cardinality Continual Releases

Authors: Dongdong Xie, Pinghui Wang, Quanqing Xu, Chuanhui Yang, and Rundong Li.

Accurately estimating the number of unique elements that appear in data streams in real time is a fundamental problem. Traditional sketch-based algorithms such as FM Sketch and HyperLogLog offer memory-friendly solutions for cardinality estimation but fall short in scenarios where the stream elements are privacy-sensitive and require differential privacy. This paper presents a novel cardinality estimation framework, FC, which ensures differential privacy under continual releases while simultaneously achieving low memory usage, high accuracy, and efficient computation. This approach innovatively leverages an efficient cardinality estimator and privacy-preserving mechanisms to overcome the limitations of existing methods. Comprehensive experiments demonstrate that this method reduces memory usage by up to 504 times compared to the best previous method while maintaining nearly the same accuracy. Additionally, under identical memory constraints, this method improves the estimation accuracy by orders of magnitude.

https://dl.acm.org/doi/10.1145/3725288
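
As background for the sketch-based estimators mentioned in this abstract, the following is a minimal, non-private HyperLogLog in Python. It illustrates only the traditional baseline: it provides no differential privacy and is not the FC framework from the paper. The class name, the choice of SHA-256 as the hash function, and the default precision `p=10` are assumptions made for this sketch.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (intended for p >= 7)."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item) -> None:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=10)
for i in range(10000):
    hll.add(f"user-{i}")
first_estimate = hll.estimate()

# Re-inserting the same elements cannot change any register.
for i in range(10000):
    hll.add(f"user-{i}")
second_estimate = hll.estimate()
```

With 1024 registers the typical relative error is a few percent, and the memory footprint stays fixed regardless of stream length. Note that duplicates leave the sketch unchanged, which is the property that makes it a cardinality estimator rather than a counter.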

     

Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine

Authors: Chenyang Zhang, Junxiong Peng, Chen Xu, Quanqing Xu, and Chuanhui Yang.

Prediction queries, which apply machine learning (ML) models to analyze data stored in the database, have become prevalent as research advances. Current database systems introduce Python UDFs to express prediction queries and call ML frameworks for inference. However, the impedance mismatch between database engines and prediction query execution poses a challenge for query performance. To mitigate the mismatch, this paper proposes a prediction-aware operator for database engines, which uses an inference context reuse cache to achieve an automatic one-off inference context setup, and batch-aware function invocation to ensure desirable batching during inference. The paper implements a prototype system, called IMBridge, based on the open-source database OceanBase. The experiments show that IMBridge achieves a 71.4x speedup on average over OceanBase for prediction query execution and significantly outperforms other solutions.

https://dl.acm.org/doi/10.1145/3725326
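
The two ideas named in this abstract, setting up the inference context once and invoking the model on batches rather than single rows, can be illustrated with a small Python sketch. It is not IMBridge's code and does not use any OceanBase API. `DoublingModel`, `PredictionOperator`, and the `loader` callback are hypothetical names, and the "model" simply doubles its inputs so the example stays self-contained.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class DoublingModel:
    """Hypothetical stand-in for an ML model; 'inference' doubles each input."""

    def __init__(self) -> None:
        self.predict_calls = 0

    def predict(self, batch: List[float]) -> List[float]:
        self.predict_calls += 1
        return [2 * x for x in batch]


class PredictionOperator:
    """Toy operator that caches the inference context and batches rows."""

    def __init__(self, loader: Callable[[str], DoublingModel], batch_size: int):
        self.loader = loader
        self.batch_size = batch_size
        self.context_cache: Dict[str, DoublingModel] = {}
        self.load_count = 0

    def _context(self, model_name: str) -> DoublingModel:
        # One-off setup: load the model only on the first request for it.
        if model_name not in self.context_cache:
            self.context_cache[model_name] = self.loader(model_name)
            self.load_count += 1
        return self.context_cache[model_name]

    def run(self, model_name: str, rows: Iterable[float]) -> Iterator[float]:
        model = self._context(model_name)
        batch: List[float] = []
        for row in rows:
            batch.append(row)
            if len(batch) == self.batch_size:
                yield from model.predict(batch)   # one call per full batch
                batch = []
        if batch:
            yield from model.predict(batch)       # flush the final partial batch


op = PredictionOperator(loader=lambda name: DoublingModel(), batch_size=4)
first = list(op.run("demo_model", range(10)))    # 3 predict calls: 4 + 4 + 2 rows
second = list(op.run("demo_model", range(3)))    # reuses the cached model
```

A per-row UDF would instead pay the model setup and one inference call for every row. In this sketch, 13 rows cost one load and four `predict` calls.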

     

A Query-Aware Enormous Database Generator For System Performance Evaluation

Authors: Xuhua Huang, Zirui Hu, Siyang Weng, Rong Zhang, Chengcheng Yang, Xuan Zhou, Weining Qian, Chuanhui Yang, and Quanqing Xu.

In production, simulating a real application without exposing private data is essential for database benchmarking and performance debugging. The complex data dependencies hidden behind queries leave previous work with critical deficiencies in supporting complex operators at high simulation accuracy. To fill the gap between existing query-aware database generators (QAGs) and these urgent demands, this paper implements a data generator, Mirage, which can reproduce applications based on their queries, even with complex operators, and has a theoretical zero error. Specifically, Mirage leverages Query Rewriting and Set Transforming Rules to decouple the implicit dependencies from queries, which greatly simplifies the generation problem. It presents a uniform representation of various join types and formulates key population as a Constraint Programming (CP) problem, which can be solved well by an off-the-shelf CP solver. In this demonstration, users can explore the core features of Mirage in generating synthetic databases. Compared with related work, Mirage offers the broadest operator support and the best simulation fidelity.

https://dl.acm.org/doi/10.1145/3722212.3725076


OceanBase has continued to publish at top international academic conferences, reflecting its long-term investment and forward-looking vision in foundational research and core technology development in the database field. Building on a strong research and development foundation, OceanBase's research team is dedicated to tackling the core challenges in the database field and to driving technological innovation and practical applications in the industry. OceanBase focuses on turning technological breakthroughs into product capabilities, striving to meet global users' demands for high performance, high reliability, and high security in distributed databases.

In the future, OceanBase will continue to dedicate itself to core research in databases, deepen research in foundational technologies, and collaborate with partners in the global academic and industrial communities to jointly promote the innovation and development of database technologies.
