OceanBase Database implements the shared storage (SS) mode on top of general-purpose object storage. Unlike the shared-nothing (SN) mode, the SS mode completely separates storage resources from compute nodes: storage can be purchased on demand, and compute nodes can be flexibly scaled out or in, delivering better resource elasticity and lower costs.
What is a compute-storage separation architecture?
Shared storage is the foundation of a cloud-native database with separated compute and storage. The storage and compute resources of a cluster are decoupled through shared storage: all data is stored in the object storage service, and each compute node caches hot data locally. The cache hit rate is usually below 100%; for workloads with a clear separation between hot and cold data, it can reach 99% to 99.9%. A request that hits the cache has the same latency as in an integrated compute-storage architecture, whereas a request that must access the object storage service takes more than 80 ms. Workloads therefore need to tolerate latency jitter, and this architecture is not recommended for businesses that are highly sensitive to it. In return, it reduces storage costs and allows compute and storage to be scaled independently.
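To make the read path concrete, the following Python sketch models a compute node's read-through cache over object storage. The class names (LocalCache, ObjectStore), the 1 ms cache latency, and the 80 ms object-storage latency are illustrative assumptions rather than OceanBase internals; the sketch only shows why the hit rates quoted above keep the average read latency close to the cached case.

```python
# Illustrative sketch only: a simplified read path for a compute node in a
# shared-storage architecture. LocalCache and ObjectStore are hypothetical
# stand-ins, not OceanBase APIs; the latency figures follow the text above
# (cache hit ~ local latency, object storage access > 80 ms).

class LocalCache:
    """Hot-data cache held on the compute node."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)          # hit: served at local latency

    def put(self, key, value):
        self._data[key] = value


class ObjectStore:
    """Stand-in for the shared object storage service (the primary storage)."""
    def __init__(self, blocks):
        self._blocks = blocks

    def get(self, key):
        return self._blocks[key]             # miss path: > 80 ms in practice


def read_block(key, cache, object_store):
    """Read-through: serve from the local cache, fall back to object storage."""
    value = cache.get(key)
    if value is None:
        value = object_store.get(key)         # slow path, subject to latency jitter
        cache.put(key, value)                 # warm the cache for later reads
    return value


def expected_read_latency(hit_rate, cache_ms=1.0, object_store_ms=80.0):
    """Expected per-read latency for a given cache hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * object_store_ms


if __name__ == "__main__":
    for hit_rate in (0.99, 0.999):
        print(f"hit rate {hit_rate:.1%}: ~{expected_read_latency(hit_rate):.2f} ms per read")
```

Under these assumed latencies, a 99% hit rate gives an expected latency of about 1.8 ms per read and a 99.9% hit rate about 1.1 ms, which is why a clear separation between hot and cold data is the key prerequisite for this architecture.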
Architecture advantages
- Low cost:
  - Storage costs are reduced by using low-cost object storage as the primary storage medium.
  - Because data storage is separated from compute, high availability can be provided with a single data replica.
- Flexible resource management:
  - Storage and compute resources can be scaled up and down independently, making more efficient use of hardware.
  - Rapid scale-out: data does not need to be physically migrated when compute resources are scaled horizontally (see the sketch after this list).
- High availability: RPO = 0; RTO is in minutes with a single replica and no more than 8 seconds with multiple replicas.
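The "no physical data migration" point can be illustrated with a small conceptual model (not OceanBase code): because all partition data already lives in shared object storage, scaling out only changes the partition-to-node assignment, which is a metadata operation.

```python
# Conceptual sketch (not OceanBase code): why scale-out needs no physical data
# migration in the shared-storage mode. Data stays in shared object storage;
# adding a compute node only reassigns partition ownership, whereas a
# shared-nothing cluster would have to copy the partition data itself.

from itertools import cycle

def rebalance(partitions, nodes):
    """Round-robin partitions over the current set of compute nodes."""
    assignment = {}
    node_iter = cycle(nodes)
    for partition in partitions:
        assignment[partition] = next(node_iter)
    return assignment

partitions = [f"p{i}" for i in range(8)]

# Two compute nodes: each serves 4 partitions; all data stays in object storage.
print(rebalance(partitions, ["node1", "node2"]))

# Scale out to four nodes: only the partition-to-node mapping changes; no
# partition data is copied, so new nodes can serve reads immediately while
# their local caches warm up over time.
print(rebalance(partitions, ["node1", "node2", "node3", "node4"]))
```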
Technical features
- Independent scaling: Compute and storage resources can be independently scaled to meet business needs.
- High availability: The storage layer typically uses a distributed storage system and has higher data reliability and disaster recovery capabilities.
- Resource isolation: Compute and storage tasks are independent, reducing resource contention and improving overall system stability.
- Elastic scalability: Compared with an integrated architecture, the separated compute and storage layers can each be scaled on their own, providing greater flexibility and lower storage costs.
Use cases
The SS mode mainly targets workloads with large data volumes and a clear separation between hot and cold data, which are therefore relatively insensitive to latency.
TP
In the TP scenario, OceanBase Database serves databases with complex queries and analytics, such as historical databases, backup databases, order databases, billing databases, message databases, access footprint databases, log databases, and IoT monitoring databases. It significantly reduces storage costs while meeting regulatory compliance requirements. For example, financial institutions often need to retain transaction records for many years, and OceanBase Database lets them store and manage these records efficiently.
This scenario requires high availability so that no data is lost in a disaster: if an IDC or availability zone (AZ) fails, the Recovery Point Objective (RPO) must be 0 and the Recovery Time Objective (RTO) must be under 8 seconds. Requirements on data read latency are relaxed, and most of the storage capacity holds cold data. Hot data is cached locally, with support for both automatic hot-cold separation and manually configured hot-data rules.
For example, as shown in the following figure, a typical deployment is multi-zone within a single region, with LogService configured.
AP
The AP scenario targets low-cost storage and analysis of massive data volumes. For example, internet companies and social media platforms generate huge amounts of user behavior data every day, from which rankings are generated and displayed in real time.
This scenario has relaxed high-availability requirements: RPO > 0 is acceptable when an IDC or AZ fails, and the RTO is several minutes. It demands high throughput but is insensitive to query latency, and it supports automatic hot-cold separation and manually configured hot-data rules.
In a single-replica deployment, the cluster runs in a single zone, and horizontal scale-out and scale-in are performed by adding or removing tenant units within the zone. The object storage service can use either same-city replication or local replication. If a single OBServer node fails, the RPO is 0 and the RTO is within minutes; if the zone fails, the RPO is 0 and the RTO is within several minutes.
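As a sketch of how such scaling might be driven in practice, the following Python snippet adjusts a tenant's unit count over OceanBase's MySQL-compatible protocol. The host, port, credentials, and tenant name are placeholders, and the ALTER RESOURCE TENANT ... UNIT_NUM statement should be verified against the documentation for your OceanBase version.

```python
# Hedged sketch: horizontally scaling a tenant within a single zone by changing
# its unit count over OceanBase's MySQL-compatible protocol. Connection
# parameters and the tenant name are placeholders; confirm the exact
# ALTER RESOURCE syntax against your OceanBase version's documentation.

import pymysql

conn = pymysql.connect(
    host="obproxy.example.com",   # placeholder entry point of the cluster
    port=2883,
    user="root@sys",              # scaling is performed from the sys tenant
    password="******",
)

try:
    with conn.cursor() as cur:
        # Scale the tenant to 4 units in the zone. Units are added (or removed,
        # to scale in) without moving data out of object storage, which is what
        # keeps the operation fast.
        cur.execute("ALTER RESOURCE TENANT tenant_ap UNIT_NUM = 4")
    conn.commit()
finally:
    conn.close()
```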
KV
The KV scenario supports massive data storage and querying, such as storing and analyzing user behavior logs for clicks, page views, and searches.
In this case, the high-availability requirement is relaxed: when an IDC or AZ fails, the RPO is greater than 0 and the RTO is in minutes. The workload needs high query throughput but is less sensitive to query latency. Hot data is cached locally, with support for automatic hot-cold separation and manually configured hot-data rules. This makes storage of wide-table and KV-style data more cost-effective.
In this deployment, the OceanBase cluster runs in a single zone, and tenants are scaled out or in by adding or removing tenant units. The object storage service uses either same-city replication or local replication. If an OBServer node in the zone fails, the RPO is 0 and the RTO is at the minute level; the same holds if the entire zone fails.
