For any enterprise seeking continuous development and profitability, cost reduction and efficiency enhancement are unavoidable imperatives. However, a blanket approach to cost reduction often yields uncontrollable impacts, such as inefficient operations and slow growth. This is why we emphasize that efficiency should not be compromised, but rather enhanced, while reducing costs.
This article examines the comparison between OceanBase Dedicated and MySQL and provides insights on how OceanBase Dedicated empowers enterprises to achieve cost reduction and efficiency enhancement.
In recent years, there has been a heightened emphasis on cost reduction and efficiency enhancement across various enterprises. This often necessitates cutting back on resources to achieve these objectives. However, when it comes to the database, which plays a fundamental role in software architecture, reducing costs through resource downsizing poses significant risks.
Ensuring cost reduction while upholding exceptional throughput, stability, and performance is crucial. This is precisely the focus of OceanBase – achieving cost reduction through technological advancements.
Let's start with a comparison between MySQL and OceanBase.
● The first pain point of MySQL is having multiple and complex instances. Different MySQL instances have unequal resource utilization rates, with some being high and others low, leading to increased operational complexity when dealing with multiple instances. OceanBase, on the other hand, is a native distributed database that can integrate instances into a single cluster through multi-tenant capabilities, reducing operational complexity while improving overall cluster utilization.
● The second drawback of MySQL is its inadequate data compression. Due to MySQL's B+ tree storage architecture, certain empty spaces are unavoidable on each page. In contrast, OceanBase employs an LSM-Tree storage architecture along with unique compression algorithms. This allows the compression of data to 1/4 ~ 1/5 of MySQL's size, or even higher compression rates, effectively reducing storage expenses.
● The third issue with MySQL is its limited scalability. With MySQL's primary/standby structure, scaling requires replacing machines, causing interruptions in the application during the primary/standby switch. To mitigate such issues, it is often necessary to maintain a consistently high level of traffic capacity. For instance, while an 8C setup may suffice most of the time, it is essential to maintain a 16C capacity for an extended period to meet peak business demands. In contrast, OceanBase offers elastic and transparent scalability, enabling you to adjust specifications as required without the need for prolonged resource maintenance, ultimately leading to cost savings.
● MySQL's weak analytical capabilities present the fourth challenge. As a pure OLTP database, MySQL requires data migration and separate analytical instances to handle analytical workloads, which doubles the costs. In contrast, OceanBase, functioning as an HTAP database, integrates most business processes in one database, without the need for separate analytical instances, effectively reducing costs.
OceanBase aims to address the inherent issues in MySQL's architecture, including inefficient resource allocation and excessive redundancy, ultimately achieving significant cost reductions. Leveraging the technological capabilities of OceanBase Dedicated, the overall total cost of ownership (TCO) can be reduced by 30%. For more details, please refer to the previous article: "Maximize Efficiency and Savings with OceanBase Dedicated".
Before delving into OceanBase's features, let's begin with an overview of the system architecture of OceanBase. There are five key terms: OBProxy, OBServer, Partition, Zone, and Paxos.
● OBProxy: When applications connect to OceanBase, they go through a component called OBProxy. It is a lightweight node that doesn't handle SQL aggregation or other computation, but rather only handles SQL forwarding, which differs from other conventional middleware solutions. OBProxy on OceanBase Dedicated does not incur additional charges.
● Zone: In OceanBase Dedicated, a Zone corresponds to a data center/availability zone. OceanBase clusters are typically deployed across three IDCs to guarantee high availability at the data center level. Within each Zone, there can be one or multiple OBServer, with a Zone having multiple Partitions.
● OBServer: OBServer is the service node in OceanBase, which serves as the node for SQL or storage. Each Zone can accommodate one or more OBServer, allowing Oceanbase to achieve powerful horizontal scaling by adding OBServer within each Zone.
● Partition: Partition is the smallest unit of load balancing in OceanBase. Each partition stores a tablet of data. A user-defined non-partitioned table can also be considered a tablet.
● Paxos Group: OceanBase uses the Paxos protocol to synchronize logs between replicas to ensure data consistency between the replicas and thereby implement high availability of data. The primary replica is elected among multiple replicas by using the election protocol. If the primary replica fails, the system automatically switches over to a standby replica by using the election protocol. The primary replica is called a leader, and the standby replicas are called followers. For example, in the diagram, the three replicas of partition P7 are distributed across three Zones. These three replicas automatically elect a leader, which is now located in Zone1 as shown in the figure.
In comparison to MySQL's primary/standby architecture, where MySQL transforms the data that needs to be synchronized into binlogs and then synchronizes them to the physical logs on the standby machine. Paxos does not involve this complex conversion, resulting in lower latency and higher reliability. In a three-replica cluster, a consensus of at least two out of the three Paxos replicas is required to finalize the leader election and log submission, enabling OceanBase to not lose any data in the event of a failure at the data center level. OceanBase can recover from any level of failure within 8 seconds, significantly improving availability compared to MySQL.
The following diagram represents a common deployment architecture where each application corresponds to a separate database instance. However, the presence of multiple database instances leads to a common issue of uneven resource utilization. For instance:
● Database instance 1 is a system with consistently low traffic, leading to utilization rates generally in the single digits, around 5%-10%.
● Database instance 2 is a highly fluctuating transactional system, such as flash sales or e-commerce systems, resulting in utilization rates fluctuating from 3% to 80%.
● Database instance 3 is a relatively vulnerable system, constantly operating at a critical level.
Simultaneously monitoring multiple MySQL instances leads to increased operational complexity and higher risks.
In OceanBase, multiple machines and resources can be utilized to create a resource pool, with each tenant being allocated a dedicated portion of resources. This system allows a tenant to host what was previously an instance, thus streamlining management by operating only one OceanBase cluster instead of managing multiple RDS or MySQL instances.
This approach enables flexible adjustment of tenant specifications without impacting the business, improving resource allocation efficiency, reducing costs, and significantly simplifying operational complexity by eliminating the need to inspect all the instances.
OceanBase utilizes an advanced compression storage engine based on the LSM-Tree. Unlike the B+ tree, data is divided into static baseline data (stored in an SSTable) and dynamic incremental data (stored in a MemTable) in the LSM-Tree architecture. The MemTable, which can be read from and written to, is stored in memory. Upon reaching a specified threshold or during daily compaction, the MemTable's data is compacted with the baseline data and stored in the SSTable on the disk.
Through this compaction process, the baseline data of the LSM-Tree architecture is compact and gapless, whereas, in B+ tree, each page contains empty spaces. Therefore, OceanBase's compression performance is naturally superior to that of MySQL.
Furthermore, on top of the inherent capabilities of LSM-Tree, OceanBase has implemented two additional levels of compression.
The first level of compression involves storing data in a hybrid row-column format, allowing the microblocks (equivalent to MySQL page level) to transition from row storage to columnar storage. This approach offers numerous benefits, such as enabling encoding for the column-stored data.
It is common to find fields with high duplication rates. For example, in the diagram, the "RATE_ID" field has 4 recurring values. In this scenario, a dictionary can be assigned, where the ID "1901321" corresponds to "0". With this dictionary, each column only needs to store values like "0/1/2", resulting in significant storage compression. As the second level of compression, OceanBase adaptively detects a more suitable encoding method during compactions to encode data.
At the first level of compression, data of 100GB can potentially be compressed to 30GB. Building on this, OceanBase applies an additional common compression to the 30GB data, further reducing its size. This involves using algorithms like LZ4 or other common compression methods.
At the second level of compression, the data size may shrink to 15GB. In reality, many of our clients have experienced a decrease to less than 15% of the original data size, leading to substantial cost savings.
OceanBase's high compression performance will not affect read and write operations. Data compression within the LSM-Tree takes place during the compaction process, which is typically scheduled during off-peak business hours, such as around 2 AM. Users can conveniently choose low-traffic periods for this process. Therefore, during peak hours, the system remains unaffected by performance loss due to compression and can instead benefit from reduced storage costs.
OceanBase offers multi-level elastic scalability, enabling dynamic and real-time adjustments of cluster resources to align with business development and allowing for flexible cost adjustments based on user requirements. This grants significant operational flexibility. The scaling capabilities of OceanBase are categorized into three levels: tenant level, machine specifications level, and machine quantity level.
OceanBase, as a distributed database, organizes multiple machines into a resource pool that can be subdivided into isolated resource groups known as tenants. As tenants are an internal partition of OceanBase's resources that is entirely managed by OceanBase, adjusting their specifications does not require physical resource modifications. This allows users to easily adjust tenant specifications within seconds, with the entire process being completely imperceptible to applications.
The operations team can adjust the CPU and memory resources for a tenant at any time, even during business time, allowing for a seamless increase in the maximum TPS for the tenant.
In the presence of high business traffic, simply adjusting tenant specifications may not be sufficient, requiring an upgrade in machine specifications. For instance, scaling the cluster from 30C to 62C to manage high traffic. Scaling in MySQL involves a primary/standby switch, which can disrupt business operations. In contrast, OceanBase synchronizes data between nodes using the Paxos protocol, with the key feature being self-election, where a consensus of at least two out of the three Paxos replicas is required to finalize the leader election and log submission.
This approach brings two major advantages over MySQL's primary/standby structure:
● OceanBase offers smaller data synchronization units, leading to higher performance and flexibility. OceanBase's Paxos groups operate at the partition level, providing a finer granularity compared to MySQL's node-level log synchronization, thus avoiding the performance impact caused by the need to ensure global order. Additionally, OceanBase supports distributed transaction capability and allows leaders of different partitions to be located on different nodes.
● OceanBase's synchronization logs are lighter and impose a smaller cost. Unlike MySQL's process, which involves producing binlogs and transforming them into relay logs for execution on standby machines, OceanBase's clog is more lightweight and efficient, alleviating the primary/standby latency issues encountered in MySQL.
To upgrade the machine specification in OceanBase, a new machine with higher specifications will be added to the cluster to synchronize the data. Then, a Paxos leader election will be conducted, in which the current leader completes its final log submission and votes for another node to become the new leader. Compared to the disruptive primary/standby switch in MySQL, the entire process of upgrading in OceanBase is imperceptible to applications.
OceanBase's native distributed database has a unique capability that sets it apart from MySQL's primary/standby architecture - seamless horizontal scalability. This means that as new machines are added to the OceanBase cluster, business traffic is automatically migrated to the new machines without any disruption to applications. This capability has been extensively tested and proven to be a superior alternative to sharding strategies.
MySQL, as a standard OLTP (Online Transaction Processing) database, has limitations in terms of optimization capabilities. It struggles with supporting queries with large results and complex table joins, often resulting in performance issues. There is also a lack of flexible resource isolation in MySQL, large queries often impact TP business performance. Consequently, MySQL is generally not suitable for analytical business needs.
When there is a demand for AP (Analytical Processing) in the business, asynchronous data transfer (ETL) is typically utilized to create a dedicated analytical database for OLAP (Online Analytical Processing) workloads. However, this approach incurs greater costs due to the data transfer process and the need for additional analytical instances.
OceanBase is an HTAP (Hybrid Transactional/Analytical Processing) database with the ability to handle both OLTP and OLAP workloads in the same cluster, supported by enhanced AP features in four key aspects:
OceanBase aims to provide a comprehensive HTAP solution for all demands using a single system and dataset, eliminating the need for additional expenses or dedicated analytical nodes. By integrating OLTP and OLAP workloads on one storage engine and structure, OceanBase provides users with reduced costs and increased efficiency.
Furthermore, to ensure that OLTP is not affected by OLAP workloads, OceanBase supports resource isolation in scenarios of mixed workloads:
● Setting up dedicated analytical tenants for physical isolation.
● Within the same tenant, utilizing multiple availability zones/replicas for physical isolation.
● Within the same tenant, utilizing resource groups for resource isolation.
How to leverage OceanBase's capabilities to truly achieve cost reduction?
In general, you can estimate the size of the OceanBase cluster to be approximately 80% to 95% of the total instance specifications of the original MySQL setup and 15% to 20% of the total storage capacity.
Here are two specific instances to exemplify the cost reduction strategies with OceanBase:
Consider the following scenario for a small to medium-sized company: the company has one 16C RDS instance, two 4C RDS instances, and four 2C RDS instances, which is a common setup.
● The 16C RDS is used for the company's core external business and experiences relatively high resource utilization.
● The two 4C RDS instances are utilized for other businesses, such as inventory and order systems, with relatively lower resource utilization.
● The four 2C RDS instances are used for internal services.
The fluctuating resource utilization of these databases, based on their business attributes, poses challenges for operational management. The total capacity is 16C + 2*4C + 4*2C = 32C; 3TB + 2TB + 1TB = 6TB. If migrated to OceanBase, a 30C OceanBase cluster with 1.5TB of storage could effectively meet all business needs, consolidating the scattered resources into a more balanced configuration. Specifically, it is estimated that in a sysbench scenario with 1000 concurrent read-write, OceanBase demonstrates an approximate 30% reduction in total cost.
In a large company scenario, there is a 32C RDS supporting core business operations, a 16C RDS supporting secondary business operations, and five 4C RDS instances supporting internal small-scale services. The overall resource utilization is similar to the previous example, with a total of 32C + 16C + 5*4C = 68C; 10TB + 5TB + 5TB = 20TB. In this case, if migrated to OceanBase, a 62C OceanBase cluster coupled with 4TB of storage could support this entire setup.
In terms of cost planning, in a sysbench scenario with 1000 concurrent read-write, a computing resource of 68C in MySQL requires additional 136C for backup, resulting in significant resource waste. However, in OceanBase, with 62C of computing resources and a scheme that disaggregates leaders and followers, all resources can be fully utilized, leading to approximately a 40% reduction in total cost.
In summary, OceanBase is committed to achieving cost reduction through technological advancements, without compromising database performance in areas such as data throughput, high availability, online DDL capabilities, and user-friendliness.
As OceanBase Dedicated continues to evolve, our goal is to deliver tangible value to enterprises. This entails enhancing efficiency, ultimately achieving true cost reduction and creating greater value for the enterprise.