4 Questions to Ask When Considering a Distributed Database


Photo by Jason Strull on Unsplash

At the Database Talk Show held by InfoQ and OceanBase in August 2022, database experts from OceanBase and other organizations discussed the distributed database industry. They included Li Wei from a research center, Wang Nan from OceanBase, Yang Jianrong from the DBAPlus community, and Liu Bo from Trip.com. This article is a transcript of their conversation.

What are the key issues solved by distributed databases?

First, performance bottlenecks.

Distributed databases overcome the performance bottlenecks of centralized standalone databases in the face of massive data, such as insufficient processing and storage capacities.

Second, data consistency.

Distributed databases must ensure data consistency, because the risk of data inconsistency rises as the number of data nodes increases.

To address this issue, distributed databases support built-in high availability. For example, Trip.com used to combine a commercial database with a storage system and relied on high-end hardware to ensure high availability. That architecture was later replaced by a compromise high-availability solution based on MySQL, under which services could be affected during a database failover. Distributed databases support flexible deployment modes across multiple data centers or regions. They not only provide higher system availability and data security but also make the failover process transparent to application systems. Trip.com has deployed OceanBase Database in its production environment in the “Three Data Centers in One City” mode, which can withstand the failure of an entire data center. The three data centers are peers, and business traffic is routed to the nearest nodes. In addition, failover is performed automatically, without complex switchover logic or manual intervention.
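Deployments like “Three Data Centers in One City” typically rely on majority-based consensus replication: writes commit once a majority of replicas acknowledge them, so the loss of one data center leaves a functioning majority. A toy sketch of the majority-quorum rule (a general illustration of the technique, not OceanBase's implementation):

```python
# Toy illustration of the majority-quorum rule used by consensus-based
# replication (Paxos/Raft-style protocols). Not OceanBase's actual code.

def has_quorum(total_replicas: int, live_replicas: int) -> bool:
    """A write can commit only if a strict majority of replicas is alive."""
    return live_replicas >= total_replicas // 2 + 1

# Three data centers, one replica each:
print(has_quorum(3, 3))  # True  - all healthy
print(has_quorum(3, 2))  # True  - one data center down, service continues
print(has_quorum(3, 1))  # False - two data centers down, writes must stop
```

This is why an odd replica count across three data centers tolerates the failure of any single one without manual switchover.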

Third, high availability.

The high availability of a distributed database ensures that overall availability is not affected by a single point of failure. The availability of a centralized database depends largely on its hardware. In a distributed architecture, high availability is achieved in software, which ensures service continuity in industries with ultra-high availability requirements, such as finance and telecommunications.

Fourth, flexible scalability.

When application traffic fluctuates, the flexible scalability of a distributed database allows users to reduce costs. For example, the traffic of Trip.com soars in winter and summer, when the system receives a large number of online ticket and hotel booking orders. Technologies like Kubernetes support the scaling of applications well. At the database layer, however, flexible scaling is possible only with distributed databases that provide dynamic scaling capabilities.
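The elastic-scaling decision itself reduces to a simple capacity calculation. The function and numbers below are hypothetical, only to illustrate scaling out for peak season and shrinking back during off-peak hours:

```python
# Hypothetical autoscaling policy sketch: choose a database node count from
# observed traffic (queries per second). Real autoscalers (e.g. Kubernetes'
# HPA, a distributed database's resource manager) use richer signals.

import math

def desired_nodes(current_qps: float, qps_per_node: float,
                  min_nodes: int = 3, headroom: float = 1.2) -> int:
    """Return the node count needed to serve current_qps with 20% headroom."""
    needed = math.ceil(current_qps * headroom / qps_per_node)
    return max(min_nodes, needed)   # never shrink below the HA floor

print(desired_nodes(50_000, 10_000))  # 6 - peak booking season, scale out
print(desired_nodes(8_000, 10_000))   # 3 - off-peak, shrink to the floor
```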

Have distributed databases become the mainstream?

This is a controversial topic. In terms of either market share or user acceptance, it is hard to say that distributed databases are the mainstream. However, demand for them is growing in more and more scenarios, and we believe that distributed databases will become part of the mainstream amid the global trend of cloud migration.

From the perspective of market demand, companies require more computing and storage resources as their business grows. For example, the database systems of Trip.com have developed through the following stages: standalone databases, standalone databases with high-end hardware, and sharded databases. Database sharding does not seem to involve any new technology. However, operating and maintaining a sharded system is far more complex, because sharding results in a swelling number of databases and tables. The introduction of distributed databases relieves the workload on the business side. For example, Trip.com used to maintain or upgrade its systems at 3:00 AM or 4:00 AM each day, an off-peak window during which traffic is low. Nowadays, however, it is hard to find such a maintenance window, as the company has extended its business worldwide, so it must maintain or upgrade its systems with zero business downtime. This is why Trip.com migrated its business to distributed databases in 2018. At present, instead of performing maintenance during a fixed window at 3:00 AM or 4:00 AM, Trip.com can do it during any off-peak hours of the day. This is just one of the benefits of using distributed databases.
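The operational complexity of sharding can be seen in a toy example. With application-level hash sharding, the application must route every query itself, and changing the shard count forces a large fraction of keys to move (a hypothetical sketch with naive modulo routing, not Trip.com's actual scheme):

```python
# Minimal illustration of application-level hash sharding, the approach that
# distributed databases replace. The database names are made up.

def shard_for(user_id: int, num_shards: int) -> str:
    """Route a user's data to a shard by naive modulo hashing."""
    return f"orders_db_{user_id % num_shards}"

print(shard_for(1234567, 8))   # orders_db_7

# Doubling the shard count from 8 to 16 relocates half of all keys:
moved = sum(1 for uid in range(10_000)
            if shard_for(uid, 8) != shard_for(uid, 16))
print(moved)  # 5000 - every second key must be physically migrated
```

A distributed database hides this routing and rebalancing inside the system, which is exactly the workload relief described above.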

What do I need to consider when choosing a distributed database?

Customers have the most say in how to choose a database. Based on the feedback of many customers who have hosted their core business systems on OceanBase Database, they care more about the following requirements than about other product features:

First, data is the core asset of an enterprise. Keeping data correct and well organized without loss is the basic requirement for all databases.

Second, if they need a distributed database, they want to make sure that the database supports chain verification to guarantee data consistency between clusters, replicas, partitions, indexes, and even macroblocks. In this way, data consistency can be protected against silent data corruption and hardware failure.
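The idea behind checksum-based consistency verification can be sketched generically: each replica computes a checksum over its data, and a mismatch reveals silent corruption. This is a simplified, hypothetical illustration of the comparison step only, not OceanBase's actual multi-level chain verification:

```python
# Simplified sketch of replica checksum verification: compute a checksum per
# partition on each replica and compare them to catch silent data corruption.
# Real systems verify at many layers (cluster, replica, partition, index,
# macroblock); this shows only the basic comparison.

import zlib

def partition_checksum(rows: list[bytes]) -> int:
    """Fold all rows of a partition into one CRC32 checksum."""
    crc = 0
    for row in rows:
        crc = zlib.crc32(row, crc)
    return crc

replica_a = [b"row1", b"row2", b"row3"]
replica_b = [b"row1", b"row2", b"row3"]
replica_c = [b"row1", b"rowX", b"row3"]   # one silently corrupted row

checksums = {name: partition_checksum(rows)
             for name, rows in [("a", replica_a), ("b", replica_b), ("c", replica_c)]}
print(checksums["a"] == checksums["b"])  # True  - replicas consistent
print(checksums["a"] == checksums["c"])  # False - corruption detected
```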

Third, the database must support hybrid deployment and grey release based on heterogeneous chips and operating systems, so that users can build a recovery mechanism and ensure business availability during database transformation and upgrade.

Fourth, the database must support DB-Mesh deployment, which allows users to cope with business fluctuations and build a cloud-native, fully encrypted database system based on logical data centers (LDCs).

Finally, the database must have a professional technical team for technical support.

On top of the above fundamental requirements, customers look at database features from the following perspectives:

First, basic capabilities and metrics. This perspective involves the basic requirements on performance, functionality, security, reliability, certification, and compatibility with other systems.

Second, product maturity. A database is favored by more customers if it has been deployed in different industries, has been verified in scenarios with different numbers of users and business loads, and has a complete ecosystem that integrates both upstream and downstream systems instead of merely a database kernel.

Third, database vendor. The commitment and strategic stability of the technical team behind the product have a huge influence on the stability and availability of business systems, which is crucial, especially for medium and large enterprises.

Fourth, cost efficiency. A database costs more than the upfront investment. The costs for stability maintenance, data migration, technical support, and database O&M must also be considered. Some users even consider the costs of disaster recovery solutions in the case of emergencies.

What are the key factors to consider when migrating to a distributed database?

Let’s talk about the key factors to consider before, during, and after migration by taking OceanBase Database as an example.

Before you migrate your system to OceanBase Database, you can use OceanBase Migration Assessment (OMA) to analyze the business load and get a visual report on compatibility, a recommended migration solution, and possible risks.

To ensure a smooth migration, pay attention to the following two aspects:

First, the migration method for the source data. Data migration is easy if the source and destination databases are highly compatible and advanced features can be used directly after the migration. Otherwise, you must manually convert large numbers of SQL statements and perhaps even rewrite your application code, which entails huge costs. Migration tools like OceanBase Migration Service (OMS) can be of great help because they are designed to address issues such as the migration of static and incremental data, customized filters, and operator transformation.

Dual write is a typical migration method that keeps the migration process smooth and under control. Data synchronization is required in some cases. For example, when you migrate data from system A to system B, you must continuously synchronize data from A to B and perform basic checks such as SQL replay and performance verification.
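The dual-write pattern can be sketched as follows. The class and the dict-backed "databases" are hypothetical stand-ins, assuming simple key-value writes; the point is that the old database stays authoritative until the new one is verified:

```python
# Hypothetical dual-write sketch for a database migration: every write goes
# to both the old and the new database, while reads stay on the old one
# until the new system passes verification. Dicts stand in for real DBs.

class DualWriteStore:
    def __init__(self, old_db: dict, new_db: dict):
        self.old_db = old_db          # source of truth during migration
        self.new_db = new_db          # destination being validated

    def write(self, key, value):
        self.old_db[key] = value      # must succeed; old DB stays authoritative
        try:
            self.new_db[key] = value  # best-effort; divergence is reconciled
        except Exception:             # later by the sync/verification pipeline
            pass

    def read(self, key):
        return self.old_db[key]       # reads cut over only after verification

store = DualWriteStore({}, {})
store.write("order:1", {"status": "paid"})
print(store.read("order:1"))          # {'status': 'paid'}
print(store.old_db == store.new_db)   # True - both sides received the write
```

Once verification (e.g. SQL replay and checksum comparison) passes, reads are switched to the new database and the old one is retired.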

Second, the types of supported source and destination data sources. Data migration is more than migrating data from a centralized database to a distributed database. Some customers may also want to migrate something like streaming data and cached data, which requires the cloud environment to support enough types of destination data sources.

After the migration, you must check whether the migration process has completed and whether the data is consistent. You must also prepare for unexpected situations before you put your new database into operation.
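A post-migration consistency check can be sketched by comparing per-table row counts and order-independent checksums between source and destination. Tools such as OMS automate this kind of verification; the snippet below only illustrates the principle, with made-up table data:

```python
# Simplified post-migration verification: compare (row count, checksum) per
# table between source and destination. The digest is order-independent, so
# rows returned in a different order still verify as equal.

import hashlib

def table_fingerprint(rows: list[tuple]) -> tuple[int, str]:
    """Return (row_count, order-independent digest) for one table."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    combined = hashlib.sha256("".join(digests).encode()).hexdigest()
    return len(rows), combined

source = {"users": [(1, "ann"), (2, "bob")], "orders": [(10, "paid")]}
dest   = {"users": [(2, "bob"), (1, "ann")], "orders": [(10, "paid")]}

ok = all(table_fingerprint(source[t]) == table_fingerprint(dest[t]) for t in source)
print(ok)  # True - counts and checksums match despite differing row order
```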

If you are considering a distributed database for your application, feel free to leave a message below!
