From May 19 to 23, 2025, the 41st IEEE International Conference on Data Engineering (ICDE 2025), one of the top three international conferences in the database field, was successfully held in Hong Kong SAR, China. The conference attracted over 850 global scholars, industry experts, and technical professionals, focusing on cutting-edge technological breakthroughs and practical applications in the field of data engineering.
At the conference, OceanBase had 6 papers accepted, setting a new record for the highest number of papers accepted at a single top-tier conference in its history. Notably, one paper received the Best Industry and Application Paper Runner Up award, marking OceanBase's first ICDE award and highlighting the recognition of its technical value by both academia and industry.
In addition, OceanBase hosted a technical symposium at ICDE 2025 themed "Databases in the AI Era." The session featured prominent scholars and industry pioneers from the database and AI fields, who shared with attendees the latest academic advances and innovative practices at the intersection of Data and AI.
Of the 6 papers accepted at ICDE 2025, 2 list OceanBase as the first author institution and 4 were co-authored in collaboration with universities.
Among them, a paper titled "OceanBase Unitization: Building the Next Generation of Online Map Applications", co-authored by OceanBase as the first author institution along with research teams from Amap and Cornell University, was awarded the Best Industry and Application Paper Runner Up at ICDE 2025. This marks the first time OceanBase has received an award at ICDE, representing a new academic breakthrough for the company. The paper creatively proposes a unitized distributed database system architecture, which, through its unique architectural design and dynamic optimization strategies, has been successfully implemented in Amap. It effectively supports the system's requirements for high reliability, high concurrency, and low latency, offering an innovative industry model for deploying modern distributed databases in large-scale, complex scenarios such as online map applications.
Papers accepted:
● OceanBase Unitization: Building the Next Generation of Online Map Applications (Best Industry and Application Paper Runner Up)
Authors: Quanqing Xu (OceanBase, Ant Group), Wei Sun (AMap, Alibaba), Chuanhui Yang (OceanBase), Jinlong Liu (AMap, Alibaba), Ziyun Wei (Cornell University), Fusheng Han (OceanBase), Liang Wang (AMap, Alibaba), Xiaowei Zhai (AMap, Alibaba)
Distributed database systems are extensively utilized to provide cloud services for online map platforms, offering consistency, disaster recovery, and high performance, whereas traditional systems relying on singly-homed architecture face challenges in scaling for large-scale services. This paper proposes the architectural design of OceanBase, a distributed database system that "unitizes" services and operations into individual machines. The unitization approach migrates from a singly-homed to a multi-homed design across multiple regions. By leveraging this feature, OceanBase ensures data replication and seamless service handover when a machine goes offline. To validate the design, OceanBase was deployed on AMap, an online map application platform supporting large-scale distributed services. Through a series of experiments, OceanBase exhibits enhanced disaster tolerance capabilities and achieves improved performance for both write-intensive and read-intensive benchmarks.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300e183/26FZCoh4LOU
● How to Answer Secure and Private SQL Queries?
Authors: Qiyao Luo (OceanBase), Quanqing Xu (OceanBase, Ant Group), Chuanhui Yang (OceanBase)
By incorporating advanced cryptographic and privacy techniques, secure and private query processing can defend against sophisticated cyber threats. This tutorial highlights the importance of integrating robust security and privacy measures into query processing to build trustworthy database systems. It reviews current systems and protocols that achieve these goals and discusses future directions for easy-to-use query processing under secure and private protection.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300e486/26FZCFh1nMs
● Hounding Data Diversity: Towards Participant Selection in Vertical Federated Learning
Authors: Xiaokai Zhou (Wuhan University); Xiao Yan (Centre for Perceptual and Interactive Intelligence); Fangcheng Fu (Peking University); Xinyan Li (Wuhan University); Hao Huang (Wuhan University); Quanqing Xu (OceanBase, Ant Group); Chuanhui Yang (OceanBase); Bo Du (Wuhan University); Tieyun Qian (Wuhan University); Jiawei Jiang (Wuhan University)
This paper studies the participant selection problem (PSP) for vertical federated learning (VFL) and formulates PSP as choosing a set of participants that maximizes the likelihood of the data samples. It then uses a K-Nearest Neighbors (KNN) classifier as the proxy model and adapts Fagin's algorithm, a well-known top-k query algorithm, to reduce the amount of encrypted communication. The solution, VFPS-SM, was deployed across five distributed nodes and evaluated on 10 datasets and 3 models. The results show that, compared with state-of-the-art baselines, VFPS-SM can reduce the end-to-end running time by up to 35× and the selection time by 365×, and improve model accuracy by 6.0%.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300c810/26FZBet83L2
● Efficient Structural Clustering over Hypergraphs
Authors: Dong Pan (Hunan University); Xu Zhou (Hunan University); Lingwei Li (Hunan University); Quanqing Xu (OceanBase, Ant Group); Chuanhui Yang (OceanBase); Chenhao Ma (The Chinese University of Hong Kong, Shenzhen); Kenli Li (Hunan University)
This paper proposes a new structural clustering model, HSCAN, specifically for hypergraphs, an Order-Index to accelerate fetching the key information of the HSCAN, a Lightweight Similarity Bucket Index to reduce the index cost, and an index-based sequential query algorithm with high performance and a parallel query algorithm to process large hypergraphs faster. Additionally, it provides the algorithms for constructing Order-Index and Lightweight Similarity Bucket Index. Extensive experiments on both real-world and synthetic datasets show that HSCAN performs better than existing models, and the two index-based query algorithms are up to three orders of magnitude faster than the existing algorithm.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300d480/26FZBLZdp1m
● Query Weak Equivalence and Its Verification in Analytical Databases
Authors: Jinguo You (Kunming University of Science and Technology); Wanting Fu (Kunming University of Science and Technology); Yuxuan Wang (Kunming University of Science and Technology); Peilei He (Kunming University of Science and Technology); Kaiqi Liu (Kunming University of Science and Technology); Quanqing Xu (OceanBase, Ant Group)
This paper proposes weak equivalence for identifying queries that are not semantically equivalent but produce the same results in read-mostly scenarios such as OLAP. Specifically, for posed queries, it extracts their filter condition expressions and transforms them into symbolic representations, namely first-order logic formulae. Based on the partial order among these formulae, i.e., their containment relationship, the paper introduces Query Lattice, a novel lattice structure partitioned into convex equivalence classes, each of which can answer a query that is determined to belong to it. The equivalence classes enable stored queries to respond to future unseen queries, so that redundant query plan generation and execution can be bypassed. An experimental evaluation of Query Lattice built on top of PostgreSQL, a prevailing open-source DBMS, shows that it achieves an improvement of up to 44.95% over the original PostgreSQL on both the TPC-H and TPC-H Skew benchmark datasets.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300b882/26FZArK5goo
● Artemis: A Customizable Workload Generation Toolkit for Benchmarking Cardinality Estimation
Authors: Zirui Hu (East China Normal University); Rong Zhang (East China Normal University); Chengcheng Yang (East China Normal University); Xuan Zhou (East China Normal University); Quanqing Xu (OceanBase, Ant Group); Chuanhui Yang (OceanBase)
Cardinality Estimation (CardEst) is crucial for query optimization. This paper introduces Artemis, a customizable workload generator that can produce a variety of scenarios with the features CardEst is sensitive to, including various data dependencies, complex SQL structures, and diverse cardinalities. Artemis combines three components: a PK-oriented deterministic data generation mechanism for plotting various data characteristics, a search-based workload generation method for composing queries of varying complexity, and a constraint-optimization-guided approach to cost-effective cardinality calculation. In the demonstration, users can explore the core features of Artemis in generating workloads.
https://www.computer.org/csdl/proceedings-article/icde/2025/360300e628/26FZD0CCVji
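For readers unfamiliar with Fagin's algorithm, which the vertical federated learning paper above adapts, the following is a minimal sketch of the classic textbook version in Python. It is not the paper's VFPS-SM implementation: the function name `fagin_top_k`, the use of a plain sum as the aggregate, and the in-memory lists are all assumptions made for illustration, and no encryption or KNN proxy model is involved. It also assumes that every object appears exactly once in every list.

```python
import heapq
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]  # (object id, score), sorted by score descending


def fagin_top_k(lists: List[ScoredList], k: int) -> List[Tuple[str, float]]:
    """Return the k objects with the highest summed score (hypothetical helper)."""
    m = len(lists)
    # Dictionaries stand in for "random access" to each source.
    lookup = [dict(lst) for lst in lists]

    seen_count: Dict[str, int] = {}
    fully_seen = 0  # number of objects already seen in all m lists
    depth = 0
    max_depth = max(len(lst) for lst in lists)

    # Phase 1: sorted access in parallel, one row per list per round,
    # until at least k objects have been seen in every list.
    while fully_seen < k and depth < max_depth:
        for lst in lists:
            if depth < len(lst):
                obj, _ = lst[depth]
                seen_count[obj] = seen_count.get(obj, 0) + 1
                if seen_count[obj] == m:
                    fully_seen += 1
        depth += 1

    # Phase 2: random access to fetch missing scores, but only for objects
    # seen in phase 1. With a monotone aggregate such as sum, no unseen
    # object can beat the k objects that were fully seen.
    totals = {obj: sum(lk.get(obj, 0.0) for lk in lookup) for obj in seen_count}
    return heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    source_a = [("a", 9.0), ("b", 8.0), ("c", 1.0), ("d", 0.5)]
    source_b = [("b", 7.0), ("c", 6.0), ("a", 5.0), ("d", 1.0)]
    print(fagin_top_k([source_a, source_b], k=1))
```

In this toy run, the sorted-access phase stops after two rounds, so object "d" is never scored. Skipping such low-ranked objects is the kind of saving that matters when each access is an expensive encrypted exchange between participants.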
On May 20, during the ICDE 2025 conference, OceanBase hosted a technical symposium titled "OceanBase Technical Symposium on Database in the AI Era". The event brought together leading experts and industry pioneers from the database and AI fields to share and discuss the latest advances at the intersection of Data and AI.
At the symposium, Jeffrey Xu Yu, Professor at the Chinese University of Hong Kong, explored the cutting edge of AI and relational databases, sharing insights into how graph and neural networks are enabling the evolution and optimization of traditional relational data management systems in the AI era. Lei Chen, Chair Professor at the Hong Kong University of Science and Technology (Guangzhou), examined the dynamic evolution of vector databases in the era of large models, focusing on aspects such as vector embedding, indexing, and retrieval. Xiaochun Yang, Professor and Dean of the Software College at Northeastern University, provided an in-depth analysis of the technologies and recent progress in cross-modal retrieval, bridging the domains of databases and AI. Daniel Ling, General Manager of OneConnect Financial Technology (Hong Kong), shared real-world insights into how Chat BI based on DeepSeek is driving the integration of AI and business intelligence. Charlie Yang, CTO of OceanBase, focused on the data foundation for AI, sharing OceanBase's thinking and practices in building a unified distributed database system for SQL + AI workloads.
In addition to the symposium, OceanBase also participated in paper presentations, panel discussions, and tutorial sessions throughout ICDE 2025. These academic engagements showcased the latest developments and practical applications in database technologies, offering fresh perspectives and innovative research directions for industry advancement.
In recent years, OceanBase has seen a steady rise in the number of papers published at top-tier international conferences. As of now, its publications have been cited 198 times. The success at ICDE 2025 not only marks a milestone in OceanBase's academic influence but also reflects its growing recognition in both the technological and industrial fields. Particularly in the field of Data and AI, OceanBase has been continuously exploring and innovating in the areas of AI4DB and DB4AI, driving the evolution of database technology into a core infrastructure for the intelligent era.
Looking ahead, OceanBase will continue to invest in collaborative innovation across academia and industry, promote the real-world application of academic achievements, and accelerate the digital transformation of enterprises in the AI era, delivering more efficient and reliable database services to users.