
Every relational database in widespread production use today — Oracle, DB2, SQL Server, PostgreSQL, MySQL — was conceived between the 1970s and the 1990s. The design assumptions were reasonable for that era: traffic was predictable, hardware scaled vertically, and a single well-provisioned server could be sized for the worst case the business plausibly faced. Concurrency was bounded by the number of physical operators behind a counter, an ATM, or a back-office terminal.
The internet broke every one of those assumptions. When every customer with a phone is a potential operator, traffic can rise by two or three orders of magnitude in seconds and collapse just as quickly. E-commerce flash sales, payment peaks, multi-tenant SaaS bursts, real-time risk control — these workloads share a common profile: high concurrency, strong consistency, continuous availability, and unpredictable scale. They define what "mission-critical" means today.
A workload like that places three non-negotiable demands on the database underneath it:
The architectural question is whether the database underneath was ever designed for this.
For decades, the answer to "we need more capacity" was "buy a bigger box." That answer has structural problems that have only become more visible with time.
First, vertical scaling is bounded by physics. Moore's Law no longer delivers reliable single-thread performance gains, and high-end servers and shared storage arrays sit on a steeply non-linear price curve — doubling capacity often costs five to ten times more, not twice as much. Above a certain point, no amount of money buys a faster single machine.
Second, vertical scaling is operationally fragile. The reliability of the entire database depends on the reliability of one server, one storage array, one power domain. High availability is bolted on through primary-standby replication, where asynchronous modes risk data loss and synchronous modes typically cut throughput by 30% or more. Failover is rarely automatic, and recovery times are measured in minutes — sometimes longer.
Third, the cost curve diverges from the data curve. Workloads at 10 TB are still comfortable on a centralized stack. At 100 TB, the high-end storage and licensing costs grow faster than the data. At 1 PB and beyond, the centralized option simply runs out — and teams are forced into a re-architecture under deadline pressure, which is the worst possible time to do it.
The pattern is clear enough: centralized architecture is not failing because it is poorly engineered. It is failing because the workloads it now has to serve are categorically different from the ones it was designed for.
The industry has produced three broad responses to this problem. They are worth comparing directly, because architects evaluating database platforms today are usually choosing between them, even when the choice isn't framed that way.
| Approach | How it works | Where it breaks |
| Classic single-node (MySQL, PostgreSQL, Oracle) | One database process, vertical scaling, sharding handled outside the database | No elastic scale-out; HA bolted on; one server is one blast radius |
| Middleware-based sharding (MyCAT, ShardingSphere, application-level shards) | Multiple independent databases behind a routing layer | No global ACID; cross-shard queries and transactions degrade or fail; application carries the complexity |
| Native distributed SQL (OceanBase, Spanner, CockroachDB) | Distribution is part of the database engine itself | Higher engineering complexity inside the database; ecosystem still maturing relative to monoliths |
Sharding solves capacity. It does not automatically preserve global correctness invariants. Once you split a logical database into multiple physical databases, several invariants quietly disappear: there is no longer a single transaction manager, so cross-shard ACID in middleware-based sharding systems is typically implemented via XA or an external transaction coordinator. While technically feasible, the added latency, prolonged lock holding, and operational complexity around failure recovery often make it unsuitable for high-throughput OLTP at scale. As a result, many large-scale deployments either constrain transactions to a single shard or adopt eventual consistency patterns in practice; there is no global secondary index, so any query that doesn't include the shard key becomes a fan-out; uniqueness across the dataset has to be reinvented at the application layer.
The deeper issue is that sharding picks a single dimension — typically user ID or account ID — and optimizes for it. Any query along a different dimension (a seller's view of orders, a regional aggregate, a fraud pattern across customers) becomes structurally hard. The database is no longer one database; it is a collection of databases held together by application code.
Middleware-based "distributed" systems push the routing logic into a layer between the application and the underlying single-node databases. This is a meaningful improvement over pure application-level sharding, and it has carried many internet workloads through their growth phase. But the limits are the same in kind: queries that don't include the shard key are restricted, secondary indexes that aren't aligned with the shard key are difficult to maintain, and cross-shard transactions are typically delegated to compensating logic or eventual consistency. Industry analyses, including white papers from telecom and financial standards bodies, have converged on the same conclusion: middleware-based distribution is a transitional architecture, not a destination.
Even systems that are distributed at the database layer don't all behave the same way. A common architectural pattern separates the SQL layer from the storage layer entirely: the SQL layer parses, plans, and coordinates; the storage layer handles distribution and replication. This is conceptually clean, but it imposes a cost on every operation. Even a single-row read against data that happens to live on the same physical machine has to travel through the storage abstraction, which means a network hop. Public sysbench results for the better-known examples in this category show single-node throughput at roughly one-fifth to one-tenth of MySQL.
For OLAP, that overhead is irrelevant — analytical queries are long enough that one extra hop disappears in the noise. For OLTP, where workloads are dominated by short, latency-sensitive operations, it shows up directly in p99.
A native distributed SQL database is one where distribution is implemented inside the database engine — not above it as middleware, not below it as a remote storage tier. Done well, this approach earns three properties at once: horizontal scalability, full SQL with global ACID, and high availability that doesn't depend on external orchestration. Done poorly, it inherits the "distributed tax" described above.
The architectural question is therefore not "should the database be distributed?" but "how is distribution implemented, and what does it cost on the operations that matter most?"
Production OLTP traffic, examined carefully across diverse industries, shows a consistent shape: more than 80% of operations touch a single partition, and fewer than 20% genuinely cross node boundaries. Most online transactions partition naturally by user, account, or tenant. A balance check, an order lookup, a profile update — these are local. Cross-partition operations exist (transfers, reconciliations, aggregates) but they are the minority.
This observation reframes the optimization target. Instead of paying coordination overhead on every operation, a native distributed engine can be designed so that the dominant 80% runs with no distributed overhead at all — competing directly with a single-node database on the same hardware — and only the 20% pays the cost of cross-node coordination. This is the assumption underlying OceanBase's unified standalone-distributed architecture.
The assumption only pays off if four implementation choices line up. The architecture below is how OceanBase realizes them:
The combined result is an engine where single-node operations approach native single-node performance, cross-node operations invoke two-phase commit only when a transaction actually spans nodes, and horizontal scaling has been demonstrated end-to-end: OceanBase's published TPC-C result reached 707 million tpmC across more than 1,500 nodes — with throughput scaling near-linearly with cluster size and a workload that includes 10–15% distributed transactions, close to real production ratios.
Three things change for teams running mission-critical workloads on this architecture.
The "grow or re-architect" dilemma goes away. Teams that start with a single-node deployment can scale to dozens or hundreds of nodes without rewriting applications, switching SDKs, or re-modeling data. Sharding is no longer a project; it is a configuration change.
Operational complexity moves out of the application. Application code stops carrying shard keys, routing tables, fan-out logic, and hand-written compensating transactions. Cross-shard joins become regular SQL. Global secondary indexes work the way developers expect. The database absorbs the complexity that middleware previously externalized.
Transactional and analytical workloads can converge. Once the engine handles distribution natively, HTAP becomes a configuration choice rather than a separate system. OceanBase, for example, exposes row, column, and hybrid storage at the table level, isolates tenants by resource group inside a shared cluster, and serves analytical queries against fresh transactional data — replacing what used to be ETL pipelines into a separate warehouse linked by best-effort sync.
For DBAs, the day-to-day implications are concrete: fewer instances to manage, automated failover with sub-8-second RTO, online scaling without maintenance windows, multi-tenant isolation that prevents noisy-neighbor incidents, and a single set of backup, monitoring, and security tooling across the whole platform.
A balanced post requires honesty about the boundaries of this approach.
A distributed system has more moving parts than a single-node system. Even when the database engine absorbs most of that complexity, operating a multi-node cluster still demands network awareness, capacity planning across replicas, and a more sophisticated failure model than a primary-standby pair.
If a workload has no natural partition key — every transaction crosses every node — the scale-out advantage is reduced. The single-node deployment of a unified architecture still removes distributed overhead entirely, and the multi-node deployment is no worse than any other shared-nothing system, but the linear scaling story doesn't apply at the same level.
Pure analytical workloads with extreme scan or join requirements can still benefit from a dedicated columnar engine. A native distributed SQL database with HTAP capabilities covers a wide range of mixed workloads, but it is not a replacement for a specialized data warehouse at the largest scales.
And the surrounding ecosystem — drivers, ORMs, observability tools, talent — has decades less accumulated maturity than the monoliths it is starting to replace. That gap is closing quickly, but it is real today.
The rest of this series moves from architectural context to engineering substance. We'll look at how high availability is actually delivered in production failure scenarios, how the transaction engine maintains correctness under high concurrency, how concurrency control trades pessimistic and optimistic execution, how global consistency is enforced across nodes, how partitioning and replica placement are decided, why Paxos is the consensus choice for critical workloads, how multi-tenancy holds up at SaaS scale, and how operational resilience — scaling, upgrades, recovery — is maintained without business disruption.
Each post stands on its own. Together they describe what it takes to run a mission-critical workload on an architecture that was designed for one — rather than retrofitted from one that wasn't — and how OceanBase's engine has approached each of those problems in production.

At the OceanBase DevCon 2024, we introduced the OceanBase 4.3.0 Beta, unveiling a brand new columnar engine. This release achieves near petabyte-scale, real-time analytics in seconds, and enhances the integration of TP and AP capabilities.


OpenClaw's memory degrades over time—an architectural limitation, not a configuration issue. seekdb M0 solves this with cloud-based memory that persists across sessions and shares learned experience across agents.


Learn how to connect Claude to OceanBase using the Model Context Protocol (MCP) and enable dynamic schema discovery and natural-language SQL without brittle prompt engineering.
