FAQs about the architecture and features of the product
Do I need one or multiple instances?
If you deploy multiple subsystems and the subsystems do not interact at the database level, we recommend that you use a separate instance for each subsystem.
How do users use OceanBase Database?
OceanBase Database provides an SQL interface, enabling users to access and operate the database through SQL statements.
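For example, after you connect through any MySQL-compatible client or driver, you work with OceanBase Database by using ordinary SQL. The following minimal session is purely illustrative; the table and data are hypothetical:

```sql
-- Hypothetical example: standard SQL works much as it does in MySQL.
CREATE TABLE orders (
  id     BIGINT PRIMARY KEY,
  amount DECIMAL(10, 2)
);
INSERT INTO orders (id, amount) VALUES (1, 99.50);
SELECT id, amount FROM orders WHERE id = 1;
```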
Does OceanBase Database support Java Persistence API (JPA)?
Java Persistence API (JPA) is a set of object-relational mapping (ORM) specifications in the Java standards. With JPA, you can describe the mappings between objects and relational tables through annotations or XML files and persist entity objects to a database, that is, complete the mapping between the object model and the data model. OceanBase Database is a native distributed relational database independently developed by Alibaba and Ant Group; it is not built on any open-source product. There is no conflict between OceanBase Database and JPA, so the two can be used together.
What level of database management do data files belong to?
At present, OceanBase Database has two types of data files, both of which belong to the cluster level:
- Data files: These files save data of all partitions, including the checkpoint data of all partitions.
- clog-related files: These files include clogs (also known as redo logs or write-ahead logs (WALs)) and their index files.
How does OceanBase Database support HTAP?
OceanBase Database has a distributed computing engine that allows multiple computing nodes to run transactional applications and complex analytical applications simultaneously. This enables OceanBase Database to use one computing engine to support mixed workloads. With OceanBase Database, you can use one system to resolve 80% of your issues and make full use of your computing resources without the need to purchase additional hardware resources or software licenses.
What are instances and tenants, and what is the relationship between them?
OceanBase Database is a multi-tenant system. An instance is a tenant in OceanBase Database. Data cannot be accessed between tenants.
What is the relationship between the number of servers and the performance of OceanBase Database?
The TPC-C report of OceanBase Database shows that the system performance generally scales linearly in most scenarios.
What should I pay attention to when I use OceanBase Database for development?
Here are some points to consider during development:
- Pay special attention to memory usage when importing a large amount of data.
- If you want an index to take effect quickly, we recommend that you include the index definition in the CREATE TABLE statement (see the sketch after this list).
- We recommend that you use mysql-connector-java 5.1.30 or later.
- When you modify a column type, note that you can increase the length of a VARCHAR column, but you cannot decrease it.
- If a connection is idle for more than 15 minutes, the server proactively disconnects it. If you use a connection pool, set the maximum idle time of a connection accordingly. For example, set the minEvictableIdleTimeMillis parameter of Druid to less than 15 minutes.
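As mentioned in the list above, defining an index together with the table avoids waiting for a separate index build later. A minimal sketch in MySQL mode; the table and index names are hypothetical:

```sql
-- Hypothetical example: the index is defined in the CREATE TABLE statement,
-- so it is available as soon as the table exists.
CREATE TABLE payments (
  id          BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  created_at  DATETIME,
  INDEX idx_payments_customer (customer_id)
);
```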
How does OceanBase Database achieve higher compression ratios compared with traditional databases?
OceanBase Database uses data encoding, a compression technique that is tailored for relational database data. Data encoding generates a series of encoding methods based on the value ranges and types of different fields in a relational database. It understands data better than general compression algorithms, thereby achieving higher compression efficiency.
How much data is required to use the AP capabilities of OceanBase Database?
The amount of data is not limited. It can range from hundreds of GB to several PB. AP is a capability of OceanBase Database. You can configure the degree of parallelism based on the size of data. With more data, you can use more computing resources; with less data, you can use fewer computing resources. Essentially, there is no upper limit on the amount of data. The AP capability of OceanBase Database lies in its ability to fully mobilize hardware resources and generate efficient parallel execution plans.
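For illustration, one common way to control the degree of parallelism for a single analytical query is a PARALLEL hint. The table and columns below are hypothetical:

```sql
-- Hypothetical example: request a degree of parallelism of 8 for this query,
-- so the scan and aggregation run on up to 8 worker threads.
SELECT /*+ PARALLEL(8) */ region, SUM(amount) AS total_amount
FROM orders
GROUP BY region;
```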
What level of standard SQL support does OceanBase Database have in its latest version?
Most business applications in MySQL mode can be smoothly migrated with no modifications or only slight modifications. Oracle mode of OceanBase Database supports most Oracle features. Applications can be smoothly migrated from Oracle databases to OceanBase Database with slight modifications.
What is the cost of migrating a business application from a MySQL database to OceanBase Database?
OceanBase Database is compatible with general MySQL features and MySQL frontend and backend protocols. A business application can be migrated from a MySQL database to OceanBase Database after being modified slightly or not at all.
What are the AP computing capabilities of OceanBase Database?
For AP workloads, OceanBase Database uses hybrid row-column storage, compilation-based execution, vectorized processing, and cost-based query rewriting and optimization. In addition, OceanBase Database offers excellent scalability. These strengths give OceanBase Database outstanding real-time analytical capabilities. For offline big data processing, distributed big data solutions such as Spark are a better choice.
When your business application connects to OceanBase Database as a distributed database, is it different from connecting to a traditional database?
As a distributed database, OceanBase Database can deploy multiple replicas across servers. To minimize cross-server data access, OceanBase Database provides obproxy. As a dedicated reverse proxy server for OceanBase Database, obproxy offers high-performance and high-accuracy routing and forwarding services to frontend user requests and provides high availability and scalability to backend servers. Compared with proxy servers in other databases, obproxy, which is designed based on the asynchronous framework and stream-based forwarding, occupies limited resources even under heavy load and can support millions of QPS. Additionally, obproxy adopts a memory solution based on FastParse and LockFree, ensuring efficient O&M even with massive deployment.
What are the technical features of OceanBase Database's architecture?
As a native distributed database, OceanBase Database has the following technical features:
Elastic scaling
OceanBase Database supports online elastic scaling. When the storage capacity or processing capability of a cluster is insufficient, you can add new OBServer nodes to the cluster. The system automatically migrates data and distributes the appropriate data partitions to the new nodes based on the processing capability of each node. Similarly, when the system has more capacity and processing capability than it needs, you can remove nodes to reduce costs. OceanBase Database has demonstrated this elastic scaling capability at scale, for example, during the annual Double 11 Grand Sale.
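Operationally, scaling out and back in looks like the following sketch. Run these statements in the sys tenant; the IP address and zone name are placeholders:

```sql
-- Hypothetical example: add a new OBServer node to zone1; the cluster
-- then rebalances partitions onto the new node automatically.
ALTER SYSTEM ADD SERVER '10.0.0.4:2882' ZONE 'zone1';

-- Remove the node later to scale back in; data is migrated off first.
ALTER SYSTEM DELETE SERVER '10.0.0.4:2882' ZONE 'zone1';
```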
Load balancing
OceanBase Database manages multiple OBServer nodes to form an OBServer cluster that serves multiple tenants. All OBServer nodes managed by OceanBase Database can be regarded as one large "resource cake", from which the system allocates resources based on the resource requests of tenants. The load balancing feature of OceanBase Database ensures that resources are occupied relatively evenly among the tenants in the OBServer cluster. Moreover, the load balancing algorithm can still balance resources on existing nodes in dynamic scenarios, such as when OBServer nodes or tenants are added or removed, or when the amount of data becomes significantly skewed as partitions are added or deleted.
OceanBase Database uses the Root Service to manage the load balancing among nodes. Different types of replicas have different resource requirements. When the Root Service performs partition management operations, it considers the CPU, disk, memory, and IOPS usage of each OBServer node. It also avoids having all partitions of the same table fall on a few OBServer nodes. In addition, the Root Service places memory-consuming replicas and memory-efficient replicas on the same OBServer node and places disk-space-consuming replicas and disk-space-efficient replicas on the same OBServer node. After load balancing, the usage of all types of resources on all nodes tends to be balanced, making full use of all resources on each node.
Distributed transaction ACID feature
The ACID feature of transactions in OceanBase Database is implemented as follows:
- Atomicity: The atomicity of transactions, including transactions whose modifications span multiple partitions, is ensured by the two-phase commit protocol.
- Consistency: Transaction consistency is ensured by the atomicity, isolation, and durability guarantees working together.
- Isolation: The multiversion mechanism is used for concurrency control.
- Durability: The Paxos protocol is used to synchronize transaction logs to multiple replicas.
High availability
Each partition in OceanBase Database maintains multiple replicas. Log synchronization between these replicas is achieved through the Paxos protocol. Each partition and its replicas form an independent Paxos group, with one of the replicas serving as the leader and the others as followers. Some partitions on an OBServer node serve as the leader and some as the follower. If the OBServer node fails, the partitions on the node that serve as followers are not affected. The write service of a partition that serves as the leader is interrupted for a short time. Then, a follower replica is elected as the new leader through the Paxos protocol. The entire process takes no more than 30s. The Paxos protocol ensures high availability and performance in a scenario with strong data consistency.
In addition, OceanBase Database supports the primary/standby database architecture. The multi-replica mechanism of an OceanBase cluster provides rich disaster recovery capabilities. It can automatically switch to a standby IDC or region in the case of failures at the IDC or region level, without data loss (RPO = 0). If the primary cluster becomes unavailable due to planned or unplanned outages (the majority of replicas are down), the standby cluster can take over the services. It offers lossless switchover (RPO = 0) and lossy switchover (RPO > 0) to minimize the service interruption time.
OceanBase Database allows you to create, maintain, manage, and monitor one or more standby clusters. A standby cluster serves as a hot backup for the production database. You can allocate resource-intensive reporting operations to standby clusters to improve system performance and resource efficiency.
High-efficiency storage engine
The storage engine of OceanBase Database is built on the LSM-tree architecture. Data in the storage engine is divided into MemTables (also known as MemStores) and SSTables. MemTables are read/write, whereas SSTables are read-only. When you insert, delete, or update data, the data is written to a MemTable first. The transactional property is ensured by redo logs. Paxos is used to synchronize the redo logs to three replicas. When a single server is down, the data integrity is ensured by Paxos, and the high availability of data is ensured by a short recovery time.
When the size of a MemTable exceeds the specified threshold, the system flushes the data in the MemTable to a Mini SSTable to release memory. This process is called a mini compaction. As user data is written, the number of Mini SSTables grows. When the number of Mini SSTables exceeds the specified threshold, the system automatically triggers a minor compaction in the background, which merges multiple Mini SSTables into one Minor SSTable. A major compaction is initiated when the number of minor compactions exceeds the specified threshold or during off-peak hours of the day. In a major compaction, the baseline SSTables (Major SSTables) and the incremental SSTables (Mini/Minor SSTables) are merged into a new Major SSTable. Through these compactions, OceanBase Database optimizes data storage space while providing efficient read and write services and ensuring transactional properties and data integrity.
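For reference, both kinds of compaction can also be triggered manually from the sys tenant, which is sometimes useful before maintenance windows. A minimal sketch:

```sql
-- Hypothetical example: trigger a minor compaction, flushing MemTables
-- to Mini SSTables and releasing MemStore memory.
ALTER SYSTEM MINOR FREEZE;

-- Trigger a major compaction, merging baseline and incremental SSTables.
ALTER SYSTEM MAJOR FREEZE;
```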
Multi-tenant
OceanBase Database is a distributed database that supports the multi-tenant feature. A cluster supports multiple business systems, which means that multiple tenants can share the same cluster.
The multi-tenant architecture makes it possible to make full use of system resources, so the same resources can serve more business systems. For example, you can deploy business systems whose peak periods do not overlap in the same cluster. The multi-tenant architecture also ensures tenant isolation in applications and protects tenant data against breaches. In addition, it ensures stable response times and TPS/QPS for front-end applications regardless of the load on other tenants.
Oracle and MySQL compatibility
OceanBase Database supports Oracle compatibility mode and MySQL compatibility mode. You can choose a different mode based on your business needs.
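The compatibility mode is chosen per tenant when the tenant is created. A minimal sketch, assuming a resource pool named pool_1 already exists; the tenant and pool names are illustrative:

```sql
-- Hypothetical example: create a tenant that runs in MySQL mode.
CREATE TENANT mysql_tenant
  RESOURCE_POOL_LIST = ('pool_1')
  SET ob_compatibility_mode = 'mysql';
```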
Memory FAQs
What memory areas are included in OceanBase Database?
OceanBase Database includes the following memory areas:
- kv cache: the cache for SSTables and database tables in the LSM-tree.
- memory store: the memory of MemStores in the LSM-tree.
- sql work area: the memory used by operator work areas during SQL execution. When this memory is insufficient, data is written to disk.
- system memory: memory reserved for network I/O, disk I/O, elections, load balancing, and other purposes.
Are resources in a tenant shared or dedicated?
In a tenant, the sql work area, memory store, and kv cache are dedicated resources; the system memory is shared among tenants. The sql worker threads are isolated between tenants, while net I/O, disk I/O, and clog writer threads are not isolated.
What are the memory usage characteristics of OceanBase Database?
When OceanBase Database starts, it reserves several GB of memory. During operation, it gradually requests more memory as needed until memory_limit is reached. After an OBServer node obtains memory from the OS, the memory is typically not released back to the OS; instead, it is kept in the usage list and the free list for memory management. This is the memory management mechanism designed into OceanBase Database.
Is it normal for the memory usage of OceanBase Database to approach memory_limit after running for a while?
If the memory usage of an OceanBase cluster approaches or reaches the memory_limit after running for a while, this behavior is as expected.
However, if the parameter memory_chunk_cache_size is set, the OBServer node will attempt to return memory blocks larger than memory_chunk_cache_size in the Free List to the OS to increase the reuse rate of memory in the OS and reduce the risk of timeout in memory operations caused by slow RPCs. Typically, the memory_chunk_cache_size does not need to be configured. In specific scenarios, you can perform a scenario analysis with OceanBase Database to determine whether to dynamically adjust the value of memory_chunk_cache_size.
Can the memory upper limit of an OBServer node be dynamically adjusted?
Yes. You can dynamically adjust memory_limit or memory_limit_percentage to change the memory upper limit of an OBServer node. However, before you adjust these parameters, make sure that the available memory resources are sufficient and that the target memory upper limit is not lower than the total memory allocated to all tenants at tenant creation, plus the memory reserved for the internal tenant (tenant ID 500). The unit of memory_limit is MB. For example, to set the memory upper limit of an OBServer node to 64 GB, you can set memory_limit = '64G' or memory_limit = 65536.
If you set memory_limit to 0, you can use memory_limit_percentage to limit the memory usage of the OBServer node as a percentage of physical memory.
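Putting the two options together, a minimal sketch; run these statements in the sys tenant, and treat the values as illustrative:

```sql
-- Hypothetical example: set an absolute upper limit of 64 GB.
ALTER SYSTEM SET memory_limit = '64G';

-- Or let the limit track physical memory as a percentage.
ALTER SYSTEM SET memory_limit = 0;
ALTER SYSTEM SET memory_limit_percentage = 80;
```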
Is memory overallocation allowed when you define resource units and resource pools in OceanBase Database?
In OceanBase Database, the Root Service is responsible for allocating resource units. When the Root Service allocates a unit, it determines whether memory can be overallocated based on the value of resource_hard_limit, which indicates the percentage of memory that can be overallocated; a value greater than 100 means that memory can be overallocated. The Root Service then allocates resources to the unit based on the defined resource units.
However, if resources are relatively tight, the system allows controlled resource contention among tenants. When you configure a tenant, CPU resources can be overallocated. If CPU overallocation takes effect, the load on a tenant can exceed its configured specifications as the load on the OceanBase cluster increases, causing tenants to compete for CPU resources and, in turn, slowing down their workloads. If memory is overallocated, the total memory specification of all tenants can exceed the value of memory_limit at tenant creation, but the actual memory usage of OceanBase Database is still constrained by memory_limit. As a result, if the total memory usage of running tenants exceeds the value of memory_limit, the tenants will encounter memory overrun errors or even OOM errors.
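For context, resource units and pools are defined with SQL such as the following sketch. Run it in the sys tenant; the names and sizes are illustrative, and the syntax follows recent 4.x versions (earlier versions use MIN_MEMORY/MAX_MEMORY instead of MEMORY_SIZE):

```sql
-- Hypothetical example: define a unit specification and a resource pool.
CREATE RESOURCE UNIT unit_4c16g
  MAX_CPU = 4, MIN_CPU = 4,
  MEMORY_SIZE = '16G';

CREATE RESOURCE POOL pool_1
  UNIT = 'unit_4c16g',
  UNIT_NUM = 1,
  ZONE_LIST = ('zone1', 'zone2', 'zone3');
```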
What checks does OceanBase Database perform during a memory allocation?
The OceanBase Database kernel limits the size of a single memory allocation to 4 GB, which is an internal restriction to avoid unreasonable memory allocation. Apart from this, each time the kernel allocates memory, it also performs the following checks. If any of the corresponding errors occur, analyze the issue based on the relevant information:
| Memory limit | Error keyword in observer.log | Troubleshooting approach |
|---|---|---|
| Upper limit on the memory of a specific context in a tenant | ctx memory has reached upper limit | The memory of a specific context in the current tenant has reached the upper limit. Check which modules occupy an abnormal amount of memory in this context. Note that only some contexts have this restriction. For example, WORK_AREA has this restriction, while other contexts, together with general memory, are constrained by the memory limit of the tenant. |
| Upper limit on the memory of a tenant | tenant memory has reached the upper limit | The memory usage of the current tenant has reached the upper limit. Check the memory usage of the contexts in the current tenant to identify unexpected memory usage. Then focus on the contexts with unexpected memory usage for further troubleshooting. |
| Upper limit on the memory of OceanBase Database | server memory has reached the upper limit | The total memory usage of OceanBase Database has reached the upper limit. Check the memory usage of the tenants to identify unexpected memory usage. Then focus on the tenants with unexpected memory usage for further troubleshooting. |
| Upper limit on the physical memory | physical memory exhausted | Generally, this error occurs when a request for physical memory cannot be satisfied. It can be related to the deployment mode and parameters. Check the size of physical memory, the memory_limit configuration of the observer process, the number of OBServer nodes running on the physical machine, and other memory-consuming processes to analyze the breakdown of physical memory usage. |
What are the different types of KVCache in OceanBase Database, and what are their purposes?
You can query the GV$OB_KVCACHE view in OceanBase Database to view the different types of KVCache. Generally, they include the following types:
- BloomFilter Cache: OceanBase Database's BloomFilter is built on macroblocks. It is automatically built for a macroblock when the number of missed queries on that macroblock exceeds a specified threshold, and the BloomFilter is then cached.
- Row Cache: Caches specific data rows. During a GET or MULTI_GET query, the corresponding data rows are often cached in the Row Cache, which greatly enhances the performance of hot-row queries.
- Block Index Cache: Caches the indexes of microblocks. Similar to the intermediate layer of a B-tree, the Block Index Cache has a high hit ratio because the intermediate layer is usually small.
- Block Cache: The Buffer Cache of OceanBase Database. It caches specific data blocks; the data blocks in the Block Cache are decompressed microblocks.
- Partition Location Cache: Caches the location information of partitions to help route queries.
- Schema Cache: Caches the metadata of data tables for execution plan generation and subsequent queries.
- clog Cache: Caches clog data to accelerate the pulling of Paxos logs in some scenarios.
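A quick way to inspect these caches is to query the GV$OB_KVCACHE view mentioned above. The exact column set varies by version, so treat the columns below as an assumption:

```sql
-- Hypothetical example: list caches by size per tenant and server,
-- assuming the view exposes CACHE_NAME and CACHE_SIZE columns.
SELECT tenant_id, svr_ip, cache_name, cache_size
FROM oceanbase.GV$OB_KVCACHE
ORDER BY cache_size DESC;
```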
How does KV Cache achieve dynamic scaling, and what are the eviction rules?
The main part of dynamically scalable memory is the KVCache. OceanBase Database manages most cache memories in the KVCache.
Generally, you do not need to configure the KVCache. In special scenarios, you can set the priorities of different types of KV. KVs with higher priorities are more likely to be retained in the cache. If you want to change the default priorities, contact OceanBase Database technical support engineers.
What common memory issues may occur in OceanBase Database, and what are the possible causes?
OceanBase Database may encounter the following common memory issues:
Out of memory in the work area

The memory limit of the work area is calculated as limit = tenant memory × ob_sql_work_area_percentage (default value: 5%). If a large number of concurrent requests each occupy a substantial amount of work area memory, an "out of memory" error may occur in the work area. Scenarios where this error frequently occurs include UNION, SORT, and GROUP BY operations. You can increase the work area size as follows: set global ob_sql_work_area_percentage = 10.

Insufficient tenant memory

An error message "Over tenant memory limits" (error code 4013) is returned to the client, indicating that the tenant memory is insufficient. Generally, you need to analyze the current memory usage to identify the cause. You can use the GV$OB_MEMORY view or check the logs in the observer.log file to identify the module that consumes a large proportion of memory.

MemStore memory exhaustion

If data is written to the database faster than it can be dumped from the MemStore, the MemStore will be exhausted. For example, during high-concurrency data import, the MemStore fills up quickly due to rapid write operations, and the system is unable to dump the data in time. As a result, the MemStore is exhausted and error 4030 is returned to the user. You can resolve this issue by analyzing the bottleneck that slows down dumping, increasing the dump speed, or decreasing the write speed.

Overall memory usage exceeds the limit on the OBServer node

When the OBServer node process starts, it calculates the maximum physical memory that the process can use based on the specified parameters. If the memory_limit parameter is set, the memory limit of the OBServer node process is set directly to the value of memory_limit. If memory_limit is set to 0, the memory limit is calculated as physical memory × memory_limit_percentage. If you find that the memory of the OBServer node exceeds the limit, query the __all_virtual_server_stat view to obtain the actual memory limit on the OBServer node, and then check whether the memory_limit and memory_limit_percentage parameters are correctly configured. If the memory still exceeds the limit and the overall trend is increasing, you may have encountered a memory leak. In this case, contact OceanBase Database technical support engineers for troubleshooting.
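As a starting point for the tenant memory analysis described above, the following sketch queries GV$OB_MEMORY; the column names are an assumption and may differ across versions:

```sql
-- Hypothetical example: find the contexts and modules holding the most memory.
SELECT tenant_id, ctx_name, mod_name, hold, used
FROM oceanbase.GV$OB_MEMORY
ORDER BY used DESC
LIMIT 20;
```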
Frequently asked questions about multi-tenant threads
Is OceanBase Database a single-process or multi-process database?
OceanBase Database is a single-process database. The main threads in the runtime of OceanBase Database are as follows:
- election worker: the election thread.
- net io: the network I/O thread.
- disk io: the disk I/O thread.
- clog writer: the clog writing thread.
- misc timer: multiple background timer threads that are mainly responsible for resource cleanup.
- compaction worker: the minor merge and major merge threads.
- sql worker and transaction worker: the SQL and transaction request processing threads.
What are the background threads in an OBServer node, and what are they used for?
Generally, you do not need to concern yourself with the implementation details of background threads. These details may change in OBServer node version updates.
The following table describes some common background threads in an OBServer node.
| Thread name | Level | Module | Number of threads | Description |
|---|---|---|---|---|
| FrzInfoDet | Tenant | Transaction | 2 | Periodically checks for new freeze_info. |
| LockWaitMgr | Tenant | Transaction | 1 | Periodically checks the timeout period and wakes up transactions waiting for locks. |
| TenantWeakRe | Tenant | Transaction | 1 | Generates timestamps for reads from standby tenants at the tenant level. |
| TransService | Tenant | Transaction | 1 | Processes asynchronous tasks within the transaction module, such as pushing checkpoints to leaders. |
| TransTimeWhe | Tenant | Transaction | max(cpu_num/24, 2) | Handles scheduled tasks for the two-phase commit process of transactions. |
| TsMgr | Process | Transaction | 1 | The background thread for GTS: deletes unnecessary tenants and refreshes GTS for each tenant. |
| TSWorker | Process | Transaction | 1 | Processes results returned from remote GTS accesses and callbacks for transactions. |
| TxLoopWorker | Tenant | Transaction | 1 | Handles scheduled tasks in the transaction module. |
| ArbSer | Process | System | 1 | The arbitration server loads configuration parameters from the configuration file at scheduled intervals. |
| Blacklist | Process | System | 2 | Detects whether the network is connected between the source and destination servers. |
| ConfigMgr | Process | System | 1 | Refreshes configuration parameters. |
| L0_G0 | Tenant | System | 2+min_cpu * cpu_quota_concurrency | Processes most requests of the tenant. |
| L2_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 2. |
| L3_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 3. |
| L4_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 4. |
| L5_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 5. |
| L6_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 6. |
| L7_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 7. |
| L8_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 8. |
| L9_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 9. |
| LuaHandler | Process | System | 1 | Processes Lua requests in emergency scenarios to read the internal state of the observer process. |
| MemDumpTimer | Process | System | 1 | Periodically prints MEMORY logs. |
| MemoryDump | Process | System | 1 | Periodically collects memory statistics. |
| MultiTenant | Process | System | 1 | Manages CPU allocations among tenants for resource scheduling. |
| OB_PLOG | Process | System | 1 | Asynchronously prints diagnostic logs of the observer process. |
| pnio | Process | System | The value of the net_thread_count parameter in the new network framework's pkt-nio mode. | Network I/O threads in the new network framework's pkt-nio mode. |
| pnlisten | Process | System | 1 | Listens to the RPC port and forwards RPC connections to network I/O threads. |
| SignalHandle | Process | System | 1 | Signal handling thread. |
| SignalWorker | Process | System | 1 | Asynchronous signal processing thread. |
| L0_G2 | Tenant | Election | min_cpu, at least 8 | Threads dedicated to handling election requests. |
How does an OBServer node isolate CPU resources among tenants?
In earlier versions of the OBServer node, CPU resources were controlled by limiting the number of active threads (the threads actually consuming CPU). When an OBServer node creates a tenant, it specifies the tenant resource pool; the max_cpu parameter in the resource pool definition specifies the maximum number of active threads allowed for the tenant, achieving CPU resource isolation among tenants. In the latest version of the OBServer node, the kernel uses cgroups to effectively control and limit CPU and memory resource usage, which requires operating-system-level configuration. OceanBase Database is integrating this capability into its tool platform so that cgroup-based resource isolation and control can be achieved directly on future OBServer nodes.
How can I read the number of worker threads of an OBServer node? Does an OBServer node dynamically start new threads to handle loads when it is under high load?
A log fragment with the keyword dump tenant info in observer.log shows the current and maximum numbers of worker threads. Specifically:
token_count = allocated_cpu_count × cpu_quota_concurrency, where min_cpu ≤ allocated_cpu_count ≤ max_cpu.
For example, on a server running OceanBase Database, tenant T1 is configured with unit_min_cpu = 2 and unit_max_cpu = 8. If the CPU resources on the server are oversold and T1 is actually allocated 5 CPU cores, then token_count = 5 × cpu_quota_concurrency.
What is the principle behind CPU resource allocation for large queries? Does OceanBase Database preempt CPU resources from OLTP tasks when both OLAP and OLTP tasks are running?
You can configure the large_query_threshold parameter to define any query whose execution time exceeds the threshold as a large query. OceanBase Database allocates some CPU resources to large queries and uses the large_query_worker_percentage parameter (default value: 30%) to limit the maximum percentage of a tenant's active threads that can be allocated to large queries.
To ensure that sufficient CPU resources remain for OLTP loads (for example, small transactional tasks), OceanBase Database limits the number of active threads that large queries can occupy. This way, a large query still obtains enough CPU resources to finish within a reasonable response time, while latency-sensitive OLTP loads also obtain enough CPU resources to execute quickly. Note that because OceanBase Database divides CPU resources between large queries and OLTP tasks, you need to set the large_query_threshold parameter to a reasonable value rather than an excessively large one. Otherwise, large queries can easily preempt CPU resources from the system, causing slow responses or even queuing delays for OLTP tasks.
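A minimal sketch of tuning the two parameters discussed above; run these statements in the sys tenant, and note that the values shown match the commonly documented defaults:

```sql
-- Hypothetical example: queries running longer than 5 seconds are
-- classified as large queries.
ALTER SYSTEM SET large_query_threshold = '5s';

-- Large queries may occupy at most 30% of a tenant's active threads.
ALTER SYSTEM SET large_query_worker_percentage = 30;
```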