Maximize Efficiency and Savings with OceanBase Dedicated



In today's business environment, enterprises need to leverage digital transformation to reduce costs, improve efficiency, and innovate. Migrating databases to cloud infrastructure is a key step toward these goals.

OceanBase believes that as the core infrastructure of digitization, cloud databases should deliver sustainable cost reduction and efficiency enhancement to empower enterprise users in driving long-term business innovation and growth. In this article, we will share our thoughts on this subject and introduce OceanBase's solutions in the context of database cloudization.


How to achieve up to 30% TCO savings?

Before getting started, it's important to define the concepts of "cost" and "efficiency". While discussions on reducing computing and storage costs are common, the enhancement of work efficiency for developers and DBAs may not always be explicitly addressed. OceanBase values both aspects equally.


Reducing computing costs: Achieving stronger performance with fewer computing resources

Native Distributed Architecture: All computing nodes capable of providing read and write services, with no standby nodes

Database products can be categorized by deployment mode into centralized and distributed databases. Centralized cloud databases typically use a primary/standby deployment architecture, where the primary node provides read and write services and the standby node sits idle, which wastes resources. If standby nodes are required to provide services, a read/write separation scheme is generally applied, which leads to a consistency dilemma: weak consistency may be intolerable for some businesses, while strong consistency can degrade query performance.

Centralized databases are usually scaled vertically, so the performance of an instance is capped by the hardware of its physical server. If horizontal scaling is required, only the read capacity can be expanded, and the number of read nodes is also limited by the scale of the cluster, generally staying in the single digits.

In contrast to centralized databases, OceanBase, as a distributed database, lets users write across multiple nodes. Business systems can fully leverage the computing resources of all nodes in the OceanBase cluster and scale horizontally as the cluster grows. In TPC-C performance testing, OceanBase's cluster size reached 1,554 nodes.


Stand-alone and distributed integrated architecture: Achieving stronger computing performance with fewer computing resources

In August 2022, OceanBase introduced the stand-alone and distributed integrated architecture. This architecture uses dynamic log streams to ensure that a single server has a fixed number of log streams, reducing the overhead of the distributed architecture. It also supports the dynamic migration of log streams and enables dynamic horizontal scaling.

The stand-alone and distributed integrated architecture has enabled the OceanBase engine to achieve two breakthroughs. First, it supports smaller deployment specifications to meet the needs of small and medium-sized enterprises: the minimum deployment specification of OceanBase 4.0 is 4C16G, and it is set to be further reduced in the future. Second, it delivers stronger performance. In the same hardware environment, the OLTP performance of OceanBase Community Edition 4.0 is 1.9 times that of MySQL Enterprise Edition 8.0 (based on Sysbench benchmark results), and the OLAP performance is 5 to 6 times that of Greenplum 6.22.1 (based on the TPC-H 100GB benchmark test).


HTAP: Supporting high-performance TP and AP hybrid workloads with a single set of data

OceanBase believes that true Hybrid Transactional/Analytical Processing (HTAP) demands exceptional OLTP performance, followed by real-time analytics support on top of OLTP.

In a traditional OLAP architecture, TP data is stored in a centralized or distributed database, periodically or continuously synchronized to heterogeneous storage using ETL, and then aggregated based on business analysis dimensions. This approach raises three problems: cost, latency, and complexity. It adds AP computing and storage costs, introduces synchronization delays, and creates room for synchronization errors that can cause online failures in production systems. An alternative approach is to handle TP and real-time AP concurrently on the same set of data, which removes the physical cost of a separate AP database, eliminates the architectural complexity caused by data synchronization, and streamlines applications.

To tackle this, OceanBase implements a hybrid row-column storage solution based on LSM-Tree, striking a balance between OLTP and OLAP performance.


Furthermore, OceanBase also supports resource isolation for OLTP and OLAP, involving both physical isolation between multiple replicas and logical isolation of CPU, network, disk, and memory within the replicas. OceanBase provides intelligent load engines and resource control engines. The former predicts the execution cost of SQL and prioritizes TP businesses, while the latter allows businesses to flexibly choose and isolate CPU and other resources based on different dimensions, achieving a more reliable load control.

OceanBase 3.x already implemented an optimization engine, a stand-alone execution engine, a parallel execution engine, and a vectorized execution engine. In May 2021, OceanBase ranked first in the TPC-H benchmark test with 15.26 million QphH@30,000GB.

This benchmark test fully demonstrated OceanBase's distributed query performance and its linear scalability. In OceanBase 4.x, the distributed query optimizer has been restructured to a one-stage architecture, alongside the implementation of an adaptive and highly parallel execution engine, reducing the execution time of 99 queries for TPC-DS 100GB from 918s to 270s.


Built-in multi-tenant mechanism: Improving database resource utilization and management efficiency

Consider the operating status of MySQL instances in a typical business scenario. Multiple instances with different specifications are running, each with low CPU utilization and reserved disk space, which gives rise to several challenges:

●  Inefficient resource reuse: Each instance occupies resources exclusively, resulting in low resource utilization and density.

●  High storage costs: Each instance consumes fixed storage space, resulting in non-shareable storage.

●  Limited burst capacity: Scaling instances rapidly to handle burst traffic often requires time-consuming vertical scaling.

●  Complex management of multiple instances: As the number of instances grows, operational costs increase, encompassing the management of primary and standby databases, backup and recovery, and issue diagnosis.


OceanBase has a built-in multi-tenant mechanism that allows multiple MySQL instances to be merged into a single cluster. Drawing from Ant Group's practices, merging business operations with peak and off-peak periods into a single cluster, such as conducting business operations during the day and batch processing and analysis at night, enables comprehensive utilization of the cluster's computing and storage resources.

●  Shared resources: Tenants can share resources and smooth out peak loads. Starting from OceanBase 3.2.x, CPU can be isolated using cgroups, with worker threads from different tenants placed in different cgroup directories. When one tenant has a high load and the other tenants on the OBServer are idle, the high-load tenant's CPU usage can exceed its limit, but its memory usage is still strictly limited.

●  Burst traffic resilience: Tenants can preempt resources through overselling mechanisms and support rapid scaling in seconds.

●  Storage sharing and isolation: IOPS and storage limits of the tenants' SSTables can be controlled, allowing multiple tenants to share the same storage space.

●  Single-cluster management: Transitioning from managing numerous individual database instances to a single cluster substantially reduces operational and management costs.
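The benefit of merging workloads with complementary peaks can be illustrated with simple arithmetic. The sketch below is not OceanBase code: the hourly CPU figures are hypothetical, and the two helper functions are illustrative names. It compares sizing each workload for its own peak against sizing one shared cluster for the peak of the combined load.

```python
# Hypothetical hourly CPU demand (cores) over 24 hours; numbers are illustrative only.
daytime_oltp = [2, 2, 2, 2, 2, 2, 4, 8, 12, 14, 14, 14,
                12, 12, 14, 14, 12, 10, 8, 6, 4, 2, 2, 2]
nightly_batch = [12, 12, 12, 12, 10, 6, 2, 1, 1, 1, 1, 1,
                 1, 1, 1, 1, 1, 1, 1, 2, 4, 8, 12, 12]


def dedicated_capacity(workloads):
    """Separate instances: each one must be sized for its own peak."""
    return sum(max(w) for w in workloads)


def shared_capacity(workloads):
    """One shared cluster: size for the peak of the combined hourly load."""
    return max(sum(hour) for hour in zip(*workloads))


workloads = [daytime_oltp, nightly_batch]
separate = dedicated_capacity(workloads)  # 14 + 12 = 26 cores
shared = shared_capacity(workloads)       # 15 cores at the busiest combined hour
print(f"separate={separate} cores, shared={shared} cores")
```

With these made-up curves, the shared cluster needs 15 cores instead of 26. The actual saving depends entirely on how well the real workloads' peaks and troughs interleave.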



Reducing storage costs: Maximizing compression efficiency while maintaining high performance

Adaptive Data Compression: Ultra-high data compression mechanism, saving 70-90% of customer storage costs

The storage engine of OceanBase is founded on the LSM-tree architecture. Data is divided into static baseline data (stored in an SSTable) and dynamic incremental data (stored in a MemTable). The MemTable, which can be read from and written to, is stored in memory. Upon reaching a specified threshold or during daily compaction, the MemTable's data is compacted with the baseline data and stored in the SSTable on the disk.
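The baseline/incremental split described above can be sketched as a toy LSM structure. This is a minimal illustration, not OceanBase's implementation: the class name `ToyLSM`, the `memtable_limit` threshold, and the in-memory lists standing in for an on-disk SSTable are all assumptions made for the example.

```python
import bisect


class ToyLSM:
    """Toy LSM store: a mutable MemTable plus one immutable, sorted baseline."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}     # dynamic incremental data, readable and writable
        self.sst_keys = []     # sorted keys of the static baseline ("SSTable")
        self.sst_vals = []     # values aligned with sst_keys
        self.memtable_limit = memtable_limit  # hypothetical compaction threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.compact()

    def get(self, key):
        # Newest data wins: check the MemTable before the baseline.
        if key in self.memtable:
            return self.memtable[key]
        i = bisect.bisect_left(self.sst_keys, key)
        if i < len(self.sst_keys) and self.sst_keys[i] == key:
            return self.sst_vals[i]
        return None

    def compact(self):
        # Merge incremental data into the baseline and write a new sorted run.
        merged = dict(zip(self.sst_keys, self.sst_vals))
        merged.update(self.memtable)
        self.sst_keys = sorted(merged)
        self.sst_vals = [merged[k] for k in self.sst_keys]
        self.memtable = {}


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # reaches the threshold and triggers a compaction
db.put("a", 10)   # newer version stays in the MemTable until the next compaction
print(db.get("a"), db.get("b"), db.get("z"))
```

Because the baseline is only rewritten during compaction, it can be stored in a read-optimized, heavily compressed form, which is the property the following paragraphs build on.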


This architecture allows for the utilization of high compression ratio methods, demonstrated in OceanBase's exceptional compression capabilities in diverse application scenarios. Leveraging OceanBase's hybrid row-column storage and efficient encoding method enables substantial storage reduction. To increase the compression ratio, OceanBase adaptively detects a more suitable encoding method during compactions to encode data. OceanBase supports a variety of encoding formats for compression by column, such as Dictionary Encoding, Run-Length encoding (RLE), Delta Encoding, Constant Encoding, Prefix Encoding, Hex Encoding, as well as Column Equal Encoding and Column Prefix Encoding.
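To make the idea of adaptive per-column encoding concrete, here is a small sketch of three of the encodings named above (run-length, dictionary, and delta), plus a selector that keeps whichever candidate is smallest. It is an illustration under stated assumptions, not OceanBase's encoder: the function names are hypothetical, and the length of the JSON serialization is used as a crude stand-in for on-disk size.

```python
import json


def rle_encode(values):
    """Run-length encoding: store [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]


def dict_encode(values):
    """Dictionary encoding: store distinct values once, then small integer codes."""
    dictionary = list(dict.fromkeys(values))  # distinct values, first-seen order
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def delta_encode(values):
    """Delta encoding: store the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def pick_encoding(values):
    """Try each applicable encoding and keep the one with the smallest proxy size."""
    candidates = {
        "plain": list(values),
        "rle": rle_encode(values),
        "dict": dict_encode(values),
    }
    if values and all(isinstance(v, int) for v in values):
        candidates["delta"] = delta_encode(values)
    # Assumption: JSON length is only a rough proxy for encoded size.
    sizes = {name: len(json.dumps(enc)) for name, enc in candidates.items()}
    return min(sizes, key=sizes.get), sizes


country = ["CN"] * 1000                      # one repeated value: long runs
order_ids = list(range(1000000, 1001000))    # steadily increasing integers
print(pick_encoding(country)[0])
print(pick_encoding(order_ids)[0])
```

A column of repeated values favors run-length encoding, while a monotonically increasing ID column favors delta encoding, which is why choosing the encoding per column based on the actual data pays off.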

In practice, data compression across various business scenarios has reduced storage space to about one-third of its original size, and in certain cases by up to 90%. More importantly, OceanBase achieves these compression ratios without compromising query performance, and also demonstrates improved write (compaction) performance.


With optimized Paxos multi-replica mechanism, computing/storage/network costs decrease by 67%

The classic three-replica deployment structure in databases requires data to be stored in three replicas, each of which contains full log and data files. OceanBase refers to this type of replica as a "full-featured" replica, and if all replicas on a node are full-featured replicas, that node is called a "full-featured" node. In most scenarios, businesses deploy a solution with three full-featured nodes.

Before OceanBase 4.0, OceanBase supported log replicas, which only participate in Paxos voting and log replication and do not store baseline data. If all replicas on a node are log replicas, that node is called a "log node". Because a log node stores only log files and no data files, it can cut storage costs by about 33%. In addition, because log nodes have lower performance demands, computing costs drop by 20%. This configuration is known as OceanBase's "F-F-L" (2F1L) structure.

Starting from OceanBase 4.0, log replicas have been transformed into arbitration replicas, which only participate in Paxos voting without the need for log replication. These streamlined arbitration nodes do not store log files or data files, and they demand lower performance, thereby leading to a 33% reduction in storage and computing costs. Moreover, the absence of log replication also results in decreased network bandwidth costs, while enabling lightweight cross-city deployments.
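The three deployment structures can be compared by counting what each replica type stores. The sketch below is only a model of the description above, with hypothetical names (`Replica`, `summarize`); it covers the number of voters, data copies, and log copies, not actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    kind: str
    votes: bool        # participates in Paxos voting
    stores_log: bool   # receives and stores replicated logs
    stores_data: bool  # stores baseline data files


FULL = Replica("full", votes=True, stores_log=True, stores_data=True)
LOG = Replica("log", votes=True, stores_log=True, stores_data=False)
ARBITRATION = Replica("arbitration", votes=True, stores_log=False, stores_data=False)


def summarize(layout):
    """Count voters, the majority needed to commit, and stored copies."""
    voters = sum(r.votes for r in layout)
    return {
        "voters": voters,
        "majority": voters // 2 + 1,
        "data_copies": sum(r.stores_data for r in layout),
        "log_copies": sum(r.stores_log for r in layout),
    }


def data_saving_vs_three_full(layout):
    """Fraction of data-file storage saved relative to three full replicas."""
    return 1 - summarize(layout)["data_copies"] / 3


for name, layout in {
    "3F": [FULL, FULL, FULL],
    "2F1L": [FULL, FULL, LOG],
    "2F1A": [FULL, FULL, ARBITRATION],
}.items():
    print(name, summarize(layout), f"{data_saving_vs_three_full(layout):.0%}")
```

In all three layouts a majority of two out of three voters is still required, so the lighter replicas reduce stored copies without changing the quorum size. Going from three data copies to two is where the roughly 33% data-storage saving comes from.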


Flexible Cloud Storage: Selecting the most appropriate cloud storage at a reasonable cost

The initial step following the deployment of OceanBase Dedicated in a cloud environment is the selection of cloud storage. Various cloud providers offer a range of cloud storage options to cater to diverse business scenarios, yet they also come with inherent limitations. Taking Alibaba Cloud as an illustration, options include local SSDs, local HDDs, ultra disks, standard SSDs, and ESSDs (PL0, PL1, PL2, PL3). The prime choice for databases in terms of performance (latency, IOPS, jitter, etc.) is local disks. However, the availability of local disk options is tied to server specifications. For instance, on ecs.i2.2xlarge, users are restricted to 2 * 1788GiB local disks, while ecs.i2.4xlarge offers 4 * 1788GiB local disks. Cloud disks provide the advantage of on-demand usage, but also present notable issues. For instance, cloud disk access involves network usage, resulting in higher latency compared to local disks, and potential significant network jitter, with IOPS being restricted by ECS specifications.

OceanBase Dedicated (OceanBase Cloud) has implemented numerous optimizations for cloud disks, including:

●  Latency: Thanks to OceanBase's storage engine, write operations are primarily written to the MemTable, while read operations support multi-level caches. In addition to the Block Cache (similar to Oracle and MySQL's Buffer Cache) used to cache SSTable data, there are also Row Cache/Fuse Row Cache (caching data rows), Bloom Filter Cache (caching static data's bloom filter to speed up filtering of empty queries), Clog Cache, and Schema Cache (caching table schema information). With all these caches, the cloud disk P99 RT is only 5% higher than that of local disks.

●  Network jitter: The implementation of distributed error detection technology enables OBServer to promptly identify disk jitter and IO failures, providing feedback to OBProxy to facilitate rapid node switching, thus minimizing business impact.

●  IOPS: OceanBase's utilization of disk IOPS is chiefly concentrated during compactions, and it employs an IO calibration mechanism for precise control over cloud disk IOPS.

●  Disk selection: OceanBase Dedicated supports the most efficient and cost-effective cloud disk types available in the cloud, including options like Alibaba Cloud's ultra disks, standard SSDs, and ESSDs (PL0, PL1, PL2, PL3). It also extends support to various other cloud disk specifications offered by different cloud providers such as AWS, ensuring performance and stability are not compromised.
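The latency point above rests on most reads never reaching the disk. The following sketch shows how a row cache and a Bloom filter can shield a slow disk from repeated and empty lookups. It is a simplified illustration, not OceanBase's read path: the class names, the lookup order, and the dictionary standing in for SSTable data on a cloud disk are assumptions, and the block cache is omitted for brevity.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class ToyReadPath:
    def __init__(self, memtable, disk_rows):
        self.memtable = memtable   # in-memory incremental data
        self.disk = disk_rows      # stands in for SSTable data on a cloud disk
        self.row_cache = {}        # caches rows that were read from disk
        self.bloom = BloomFilter()
        for key in disk_rows:
            self.bloom.add(key)
        self.disk_reads = 0        # counts the expensive accesses

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if key in self.row_cache:
            return self.row_cache[key]
        if not self.bloom.might_contain(key):
            return None            # empty query filtered without touching disk
        self.disk_reads += 1
        value = self.disk.get(key)
        if value is not None:
            self.row_cache[key] = value
        return value


reader = ToyReadPath(memtable={"k2": "v2"}, disk_rows={"k1": "v1"})
reader.get("k1")   # first read goes to disk and fills the row cache
reader.get("k1")   # second read is served from the row cache
reader.get("k2")   # served from the MemTable
print(reader.disk_reads)
```

Only the first lookup of `k1` pays the disk cost. The more reads are absorbed by memory structures like these, the less the extra network latency of a cloud disk shows up in overall response time.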


Furthermore, OceanBase Dedicated has undergone specific optimizations for historical data archiving scenarios, facilitating the archiving of cold data to cost-effective storage options within OceanBase (such as HDD and ESSD PL0). Users are required to configure settings only once, after which the system will autonomously segregate hot and cold data, and archive cold data to more economical object storage solutions (such as Alibaba Cloud's OSS, AWS's S3, etc.), resulting in significant reductions in storage expenses.


Reduced management costs: Enhanced compatibility to improve development and operations efficiency

Fully compatible with MySQL, minimal data migration costs

Complete compatibility includes application compatibility, database syntax compatibility, and migration tools.

OceanBase is compatible with most features and statements of MySQL 5.7 and 8.0, enabling businesses using MySQL to migrate to OceanBase at a minimal cost. OceanBase also provides customers with the capability to be compatible with MySQL BinLog. The Binlog service of OceanBase Dedicated is designed to collect transaction logs from OceanBase and convert them into the MySQL binary log format. The Binlog service enables users to use existing change data capture (CDC) tools to synchronize incremental data from OceanBase MySQL instances, eliminating the need for secondary development or setting up new environments. Users can seamlessly switch to OceanBase Dedicated while continuing to use their original MySQL engine-compatible incremental data subscription solutions.


Full lifecycle management of business data

As an enterprise-level database solution for all scenarios and forms, OceanBase provides complete enterprise-level products for the full lifecycle management of business data, including development, assessment, migration, operation, diagnostic, and more.

●  Streamlined Migration to OceanBase Dedicated: Database migration involves compatibility assessment and actual migration. For compatibility assessment, OceanBase Dedicated provides the OMA (OceanBase Migration Assessment) platform, which analyzes and pre-assesses SQL, database objects, and database performance to identify and prevent compatibility and performance issues after migration. For the migration itself, the OMS (OceanBase Migration Service) supports data migration, real-time data synchronization, and incremental data subscription, enabling low-risk, low-cost, and efficient data flow into and out of OceanBase and establishing a secure, stable data replication architecture.

●  Assured Operations in OceanBase Dedicated: OceanBase Dedicated provides the ODC (OceanBase Developer Center) platform for developers and the OCP (OceanBase Control Platform) for operators, covering the entire lifecycle of business systems from design and development to online operation.

●  Comprehensive Database Ecosystem: OceanBase Dedicated is integrated with both its own ecosystem products and the broader upstream and downstream database ecosystem, including ecosystem products from Alibaba Cloud, multi-cloud ecosystems, open-source ecosystems, as well as general database tools and industry software support.


How to achieve sustainable cost reduction and efficiency enhancement?

In the previous sections, we introduced the core technologies of cost reduction and efficiency enhancement in OceanBase Dedicated. Taking this as a starting point, how can we further drive sustainable cost savings and efficiency gains?


NoSQL capabilities based on Table and HBase models

OceanBase, being a relational database, has implemented a fully scalable distributed LSM-Tree storage engine with built-in support for multiple models.


OceanBase's LSM-Tree storage engine effectively supports both TP and AP, providing robust capabilities for high availability, scalability, and distribution. The goal is to reuse this architecture to accommodate different types of business workloads. By building on the underlying KV capabilities, OceanBase can offer KV and HBase product capabilities for key-value and wide-table scenarios, respectively. OceanBase's HBase model uses a storage schema consisting of a table with four columns, K, Q, T, and V, corresponding to HBase's Rowkey, Family Qualifier, Timestamp, and Value.
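As a rough illustration of the four-column schema, the sketch below stores each HBase-style cell as one (K, Q, T, V) row and reads back the latest version of every qualifier for a row key. The helper names and sample data are hypothetical, and a Python list stands in for the table; it is not the OceanBase client API.

```python
from collections import namedtuple

# One row per cell: Rowkey, Family Qualifier, Timestamp, Value.
Cell = namedtuple("Cell", ["K", "Q", "T", "V"])


def put(table, rowkey, qualifier, timestamp, value):
    """Write one cell version as a new row in the four-column table."""
    table.append(Cell(rowkey, qualifier, timestamp, value))


def get_latest(table, rowkey):
    """Return {qualifier: value} using the newest timestamp per qualifier."""
    latest = {}
    for cell in table:
        if cell.K != rowkey:
            continue
        current = latest.get(cell.Q)
        if current is None or cell.T > current.T:
            latest[cell.Q] = cell
    return {q: c.V for q, c in latest.items()}


table = []
put(table, "user1", "cf:name", 1, "Ann")
put(table, "user1", "cf:name", 2, "Anna")   # newer version of the same cell
put(table, "user1", "cf:age", 1, "30")
put(table, "user2", "cf:name", 1, "Bob")
print(get_latest(table, "user1"))
```

Flattening wide-table cells into ordinary rows like this is what lets a wide-table workload reuse the same storage engine, and therefore the same compression and replication machinery, as relational tables.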

The multi-model architecture of OceanBase allows for the following features:

●  Enabling NoSQL workloads to benefit from OceanBase's enterprise data security and high availability services.

●  Robust elasticity to meet the rapid online scaling requirements of major promotional events such as "Double 11 Festivals".

●  Support for global deployment, such as the "five IDCs across three regions" multi-site deployment architecture.

●  Support for database ACID transaction capabilities and consistency models, while achieving double the data compression rate of HBase.


Accommodating the deployment requirements of users with varying specifications

The evolution of instance specifications demonstrates OceanBase Dedicated's continuous breakthroughs in supporting small-scale deployments:

●  2020: Cluster instances 14C70G, 30C180G, 62C400G

●  2021: Cluster instances 8C32G, 24C120G

●  2022: Cluster instances 4C16G

●  2023: Tenant instances 1C4G


OceanBase is dedicated to supporting more users across their journey from inception to maturity, facilitating the full development process of businesses.
