The Future of Relational Databases

Yang Zhenkun, OceanBase founder and chief scientist, delivered a keynote speech at the HICOOL Global Entrepreneurs Summit. Dr. Yang reviewed the milestones of the world’s mainstream relational databases and shared his vision for the future of distributed databases.

When Dr. E. F. Codd, an IBM researcher, first described the relational model in 1969, the database industry entered the era of relational databases. After decades of development, relational databases have become a fundamental part of the information infrastructure that supports every aspect of our lives today, from transportation and communication to business and beyond.

Mainstream Relational Databases

Over the past half-century, the world has witnessed the birth of several great relational databases. Oracle, born in the 1970s, and DB2 and SQL Server, which emerged in the 1980s, remain the three most popular commercial databases. PostgreSQL and MySQL, born in the 1990s, remain the two most popular open source databases. No other mainstream relational database has been introduced since.


Relational databases form the foundation of today’s information society because they support bookkeeping, transfers, and settlement, and because they guarantee the ACID properties of the transactions behind these operations: atomicity, consistency, isolation, and durability. These capabilities greatly facilitate activities in fields such as commerce, government affairs, and the Internet.
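
As a minimal sketch of what atomicity means for a transfer, the following uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the account names, and the helper functions are hypothetical and chosen only for illustration. SQLite is used because it ships with Python, not because it is one of the databases discussed in the talk.

```python
import sqlite3

def open_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` in one transaction.

    Returns True if the transfer committed, False if it was rolled back.
    """
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            # Credit first, so that a failed debit has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = open_demo_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The first call succeeds and moves the money. The second call would overdraw the source account, so the whole transaction is rolled back and both balances stay as they were: either both updates happen or neither does.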


It is extremely hard to develop a new relational database system for real-world production, for two reasons. First, as key infrastructure of the information society, relational databases must meet demanding requirements for system stability, data consistency, and service reliability. Second, the databases themselves are highly complex and pose many technical challenges.

Challenges to Monolithic Relational Databases

All mainstream relational databases in use today feature a centralized architecture. Although they underpin the information society, centralized databases face significant challenges.

First, they are difficult to scale out. A centralized database is essentially a standalone system that relies on a single shared storage system.

Second, they incur huge costs. A centralized database carries a high cost of hardware ownership because the reliability, stability, and performance of the entire system depend on the hardware of individual servers.

These problems become worse in the Internet age.

In the good old days, a traditional mall, bank, or hotel, for example, was built with a fixed number of service counters, no matter how many customers were waiting in line. As long as you designed and tested a database system on the assumption that each service counter had one operator at work, the system would run well, because the number of customers being served at any moment could never exceed the number of operators.

In the Internet age, however, every person with Internet access becomes an operator, so the traffic to your business systems may increase a hundredfold, a thousandfold, or even more. The traffic may also surge or plunge within a very short period of time. If you still size a database system for the maximum possible number of operators, the system may need to support hundreds of millions of operators or even more.

Moreover, the surge in visits and concurrency also brings enormous growth in data volume, far beyond the storage capacity of a single centralized database, which makes it almost impossible for enterprises to perform data analysis.

For transaction processing, we can split the business into smaller parts and run each part on a standalone database, much like dividing an army and transporting soldiers and light equipment in batches on multiple light aircraft. For analytical processing of such massive data, however, we must build a database that is large enough, much like airlifting tanks, cannons, and other heavy equipment on large cargo planes.

Distributed databases play the role of those large cargo planes.
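
To make the contrast concrete, here is a minimal sketch of splitting transactional work across standalone databases. Everything in it is an assumption made for illustration: the ShardedOrders class, the customer IDs, and the use of plain Python lists to stand in for independent database instances. It is not how any particular product routes data.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedOrders:
    """Hypothetical order store split across several standalone 'databases'."""

    def __init__(self, num_shards):
        # Each list stands in for one independent database instance.
        self.shards = [[] for _ in range(num_shards)]

    def record_order(self, customer_id, amount):
        """Transactional write: touches exactly one shard."""
        index = shard_for(customer_id, len(self.shards))
        self.shards[index].append((customer_id, amount))
        return index

    def customer_total(self, customer_id):
        """Per-customer read: also served by a single shard."""
        index = shard_for(customer_id, len(self.shards))
        return sum(a for c, a in self.shards[index] if c == customer_id)

    def grand_total(self):
        """Analytical query: has to visit every shard and merge the results."""
        return sum(a for shard in self.shards for _, a in shard)

store = ShardedOrders(num_shards=4)
for customer, amount in [("c1", 10), ("c2", 25), ("c1", 5), ("c3", 40)]:
    store.record_order(customer, amount)
print(store.customer_total("c1"), store.grand_total())
```

Writes and per-customer reads stay on one small database, which is the "light aircraft" case. The grand total has to gather data from every shard, which is the kind of work that pushes analytical processing toward a single large system.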

Distributed Databases Set the Trend

For decades, academics have preached that one size doesn’t fit all. Entrepreneurs, in contrast, look forward to a database system that solves their problems once and for all, so that their companies can concentrate on business growth.

Some may argue that not everyone needs a large-plane kind of database, and that distributed databases make sense only for medium-sized and large enterprises. While that may be true, micro and small enterprises can also benefit from distributed databases.

[Figure: typical enterprise business traffic over time]

The preceding figure shows the typical traffic curve of almost any enterprise. Business traffic remains low most of the time and rises to higher levels only at certain points in time.

Given the stunning computing power and storage capacity of modern servers, it is technically feasible for a company to configure a powerful high-performance database server to accommodate its highest traffic.

The problem is that this solution is very wasteful, because the super server sits largely idle most of the time.

Distributed databases offer another solution. You can start with a distributed database running on a single low-specification VM server to host regular business data. When business traffic increases or reaches its peak, you can add more VM servers to the distributed system as required. After peak hours, idle VM servers can be gradually released. As a result, your company’s costs are reduced significantly.
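
The saving can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not from the talk, and every number in it is invented: a hypothetical day with 22 quiet hours and 2 peak hours, VM servers that each handle 1,000 requests per second, and a price of one cost unit per server-hour. It also assumes that peak-sized capacity costs the same whether it is bought as one large server or as several small VMs, and that servers can be added or released at hourly granularity.

```python
def servers_needed(load, capacity_per_server):
    """Smallest number of servers that can carry `load`, never fewer than one."""
    return max(1, -(-load // capacity_per_server))  # integer ceiling division

def fixed_cost(hourly_loads, capacity_per_server, price_per_server_hour):
    """Provision for the peak hour and keep that capacity running all day."""
    peak = max(servers_needed(load, capacity_per_server) for load in hourly_loads)
    return peak * len(hourly_loads) * price_per_server_hour

def elastic_cost(hourly_loads, capacity_per_server, price_per_server_hour):
    """Pay each hour only for the servers that hour actually needs."""
    server_hours = sum(
        servers_needed(load, capacity_per_server) for load in hourly_loads
    )
    return server_hours * price_per_server_hour

# Hypothetical day: 22 quiet hours and 2 peak hours, in requests per second.
hourly_loads = [800] * 22 + [8000] * 2
capacity = 1000   # requests per second one VM server can handle (assumed)
price = 1         # cost units per server-hour (assumed)

fixed = fixed_cost(hourly_loads, capacity, price)
elastic = elastic_cost(hourly_loads, capacity, price)
print(f"fixed: {fixed}, elastic: {elastic}, saved: {1 - elastic / fixed:.0%}")
```

Under these assumptions, peak provisioning pays for 8 servers over 24 hours (192 server-hours), while elastic scaling pays for 22 + 16 = 38 server-hours. Real savings depend on how spiky the traffic is and on how quickly servers can be added and released.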


It is hard to develop a relational database and even harder to develop a distributed relational database. How can we make it possible?

Let’s look at two examples in the industry.

One is Google Spanner. As you may know, it is a distributed database that grew directly out of a distributed storage system. The other is OceanBase Database, which evolved from a semi-distributed database into a high-availability database and then into a distributed relational database.

The two simply took different paths to the same destination.
