FAQs about the architecture and features of the product
Do I need one or multiple instances?
If multiple subsystems are deployed and each subsystem does not interact with others at the database level, it is recommended to deploy each subsystem in a separate instance.
How do users use OceanBase Database?
OceanBase Database provides an SQL interface, and users can access and operate the database by using the SQL language.
Does OceanBase Database support Java Persistence API (JPA)?
Java Persistence API (JPA) is a set of object-relational mapping (ORM) specifications in the Java standards. With JPA, you can describe the mappings between objects and relational tables by using annotations or XML, and persist entity objects to a database (that is, map the object model to the data model). OceanBase Database is a native distributed relational database independently developed by Alibaba and Ant Group; it is not built on any open-source product. There is no conflict between OceanBase Database and JPA.
What level of database management do data files belong to?
At present, OceanBase Database has two types of data files, both of which belong to the cluster level:
- Data files: These files store data of all partitions, including the checkpoint (CP) data of all partitions.
- Clog-related files: These files include clogs (also known as redo logs or write-ahead logs (WALs)) and their index files.
How does OceanBase Database support HTAP?
OceanBase Database has a distributed computing engine that allows multiple computing nodes to run OLTP applications and complex OLAP applications simultaneously. This enables OceanBase Database to use one computing engine to handle mixed workloads and to allow users to resolve 80% of their issues with one system. This helps users make full use of their computing resources and reduce the costs of purchasing additional hardware resources and software licenses.
What are instances and tenants, and what is the relationship between them?
OceanBase Database is a multi-tenant system. An instance is a tenant in OceanBase Database. Data cannot be accessed between tenants.
What is the relationship between the number of servers and the performance of OceanBase Database?
The TPC-C benchmark test report of OceanBase Database shows that the system can achieve linear scaling in most scenarios.
What are the considerations when I use OceanBase Database for development?
Here are some considerations for your reference:
- Pay special attention to memory usage when importing a large amount of data.
- If you want an index to take effect quickly, we recommend that you define it in the CREATE TABLE statement.
- We recommend that you use mysql-connector-java 5.1.30 or later.
- Column type modification is restricted. For example, the length of a VARCHAR column can only be increased, not shortened.
- If a connection is idle for more than 15 minutes, the server proactively disconnects it. If you use a connection pool, set the maximum idle time of a connection accordingly. For example, set the minEvictableIdleTimeMillis parameter of the Druid connection pool to a value less than 15 minutes.
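The idle-time constraint above can be checked programmatically before deployment. Below is a minimal sketch in Python; the function name and the check itself are illustrative, not part of any OceanBase client library (minEvictableIdleTimeMillis itself belongs to the Java Druid pool):

```python
# The server drops connections idle for more than 15 minutes, so a
# connection pool must evict idle connections before that cutoff.
SERVER_IDLE_DISCONNECT_MS = 15 * 60 * 1000  # 15 minutes in milliseconds

def idle_setting_is_safe(min_evictable_idle_time_ms: int) -> bool:
    """Return True if the pool evicts idle connections strictly before
    the server-side 15-minute disconnect would occur."""
    return 0 < min_evictable_idle_time_ms < SERVER_IDLE_DISCONNECT_MS

print(idle_setting_is_safe(10 * 60 * 1000))  # True: evicted in time
print(idle_setting_is_safe(30 * 60 * 1000))  # False: server drops it first
```

A Druid default of 30 minutes would fail this check, which is exactly the situation the recommendation above guards against.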
How does OceanBase Database achieve higher compression and lower space usage compared with traditional databases?
OceanBase Database uses data encoding, a compression technique that is tailored for data. Data encoding generates a series of encoding schemes based on the value ranges and types of fields in a relational table within the database. Because it understands the data, data encoding can achieve higher compression efficiency compared with general compression algorithms.
What data volume is suitable for the AP capability of OceanBase Database?
The AP capability of OceanBase Database has no strict data size limit: depending on business needs, it handles anywhere from 100 GB to the PB level. When the data volume is small, you can set a lower degree of parallelism to use fewer computing resources; as the data scale grows, you can allocate more computing resources. The strength of the AP capability lies in fully utilizing the hardware resources of the machines and generating efficient parallel execution plans.
What is the level of support for standard SQL in the latest version of OceanBase Database?
Most business applications in MySQL mode can be smoothly migrated with no modifications or only slight modifications. Oracle mode supports most Oracle features. Therefore, you can smoothly migrate your Oracle database to OceanBase Database with slight modifications.
What is the cost of migrating a business application from a MySQL database to OceanBase Database?
OceanBase Database is compatible with general MySQL features and frontend and backend protocols. Therefore, you can migrate your business application from a MySQL database to OceanBase Database with zero or slight modifications.
How does OceanBase Database, which is a database renowned for TP, perform in AP?
OceanBase Database uses technologies such as hybrid row-column storage, compilation for execution, vectorized processing, cost-based query rewriting, and optimization. It leverages its high scalability to deliver cutting-edge real-time analysis capabilities in AP. For offline big data processing, distributed big data solutions such as Spark are a better choice.
When you use OceanBase Database as a distributed database, is there any difference from connecting to a traditional database?
As a distributed database, OceanBase Database can deploy multiple replicas across servers. To minimize cross-server data access, OceanBase Database provides obproxy. As a dedicated reverse proxy server for OceanBase Database, obproxy offers high-performance and high-accuracy routing and forwarding services to frontend user requests and provides high availability and scalability for backend servers. Compared with proxy servers in other databases, obproxy, which is designed based on the asynchronous framework and stream-based forwarding, and adopts memory solutions based on FastParse and LockFree, can process millions of QPS with limited resources and offer efficient O&M support for a large number of deployed proxy servers.
What are the technical features of OceanBase Database's architecture?
As a native distributed database, OceanBase Database has the following technical features:
Elastic scaling
OceanBase Database supports online elastic scaling. When the storage capacity or processing capability of a cluster is insufficient, you can add new OBServer nodes to the cluster. The system automatically migrates data and distributes appropriate partitions to the new nodes based on the processing capability of each node. Similarly, when the system capacity and processing capability are abundant, you can remove nodes to reduce costs. For example, during the Double 11 Grand Sale, the database can provide excellent elastic scaling capabilities.
Load balancing
OceanBase Database manages multiple OBServer nodes in an OBServer cluster to provide data services to multiple tenants. All OBServer nodes managed by the OceanBase cluster can be regarded as a large "resource cake". When allocating resources, the system allocates resources based on the resource requirements of tenants. OceanBase Database ensures that the resource usage of multiple tenants in the OBServer cluster is balanced. In dynamic scenarios such as adding or removing OBServer nodes or tenants or data partition imbalances during partition addition or deletion, the load balancing algorithm can still balance the resources on existing nodes.
OceanBase Database uses Root Service to manage the load balancing among nodes. Different types of replicas have different resource requirements. When Root Service performs partition management operations, it considers the CPU, disk, memory, and IOPS usage of each OBServer node. It avoids having all partitions of a table distributed to a few OBServer nodes. It places a memory-consuming replica and a memory-efficient replica on the same OBServer node. It places a disk-space-consuming replica and a disk-space-efficient replica on the same OBServer node. After load balancing, the various types of resources on all nodes are relatively balanced, making full use of all resources on each node.
Distributed transaction ACID feature
The ACID feature of transactions in the OceanBase database architecture is implemented as follows:
- Atomicity: Atomicity of transactions that span multiple nodes is ensured by the two-phase commit protocol.
- Consistency: The atomicity, isolation, and durability mechanisms together keep the data consistent before and after each transaction.
- Isolation: A multiversion concurrency control (MVCC) mechanism isolates concurrent transactions.
- Durability: Transaction logs are synchronized to multiple replicas by using the Paxos protocol.
High availability
Each partition in OceanBase Database maintains multiple replicas, and log synchronization between these replicas is achieved through the Paxos protocol. Each partition and its replicas form an independent Paxos group, with one replica serving as the leader and the others as followers. On any OBServer node, some partitions serve as leaders and others as followers. If an OBServer node fails, the partitions that serve as followers are not affected, while the write service of the partitions that serve as leaders is interrupted for a short time until a follower is elected as the new leader through the Paxos protocol. The entire process takes no more than 30 seconds. The Paxos protocol thus ensures data consistency while maintaining high availability and performance.
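The failover behavior above follows from the Paxos majority rule: a partition remains serviceable as long as a majority of its replicas survive. A minimal sketch of that rule (Python, illustrative only):

```python
# Paxos majority rule: a partition can keep serving (and can elect a
# new leader) as long as strictly more than half of its replicas
# are alive.
def partition_available(total_replicas: int, alive_replicas: int) -> bool:
    return alive_replicas > total_replicas // 2

# With 3 replicas, losing one node keeps the partition available;
# losing two does not.
print(partition_available(3, 2))  # True
print(partition_available(3, 1))  # False
```

This is why a single OBServer node failure interrupts only leader partitions, and only until the surviving majority elects a new leader.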
In addition, OceanBase Database supports the primary/standby database architecture. The multi-replica mechanism of an OceanBase cluster provides rich disaster recovery capabilities. It can automatically switch to a standby IDC or region in the case of failures at the server, IDC, or region level, without data loss (RPO=0). If the primary cluster becomes unavailable due to planned or unplanned conditions (such as the failure of the majority of replicas), the standby cluster can take over the services. It offers lossless switchover (RPO=0) and lossy switchover (RPO>0) disaster recovery capabilities to minimize the service interruption time.
You can create, manage, maintain, and monitor one or more standby clusters in OceanBase Database. A standby cluster serves as the hot backup of the production database. You can allocate resource-intensive reporting operations to a standby cluster to improve system performance and resource efficiency.
High-efficiency storage engine
The storage engine of OceanBase Database is built on the LSM-tree architecture. Data in the storage engine is divided into MemTables (also known as MemStores) and SSTables: MemTables are readable and writable, while SSTables are read-only. When you insert, delete, or update data, the data is written to a MemTable first, and transactionality is ensured by redo logs, which are synchronized among the (typically three) replicas by using the Paxos protocol. When a server node fails, the Paxos protocol ensures data integrity and high availability with a short recovery time.
When the size of a MemTable exceeds the specified threshold, the data in the MemTable is flushed to a Mini SSTable to release memory. This process is called a mini compaction. As user data is written, the number of Mini SSTables grows; when it exceeds the specified threshold, the system automatically triggers a minor compaction in the background, which merges multiple Mini SSTables into a new Minor SSTable. The major SSTable serves as the baseline: during a major compaction, the baseline SSTable and the incremental Minor SSTables are merged into a new major SSTable. Through mini, minor, and major compactions, OceanBase Database optimizes data storage space while providing efficient read and write services and ensuring transactionality and data integrity.
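The write path described above can be sketched as a toy LSM tree. Everything here (class name, thresholds, single-level merge) is illustrative and greatly simplified compared with OceanBase's actual storage engine:

```python
# Toy LSM write path: writes land in a MemTable; a full MemTable is
# frozen into a Mini SSTable (mini compaction); piled-up Mini SSTables
# are merged into a lower level (minor compaction). Thresholds are toy
# values for illustration.
MEMTABLE_LIMIT = 3      # rows per MemTable before a flush
MINI_SSTABLE_LIMIT = 2  # Mini SSTables before a minor compaction

class ToyLSM:
    def __init__(self):
        self.memtable = {}
        self.mini_sstables = []   # frozen MemTables, newest last
        self.minor_sstable = {}   # result of minor compactions

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            # Mini compaction: freeze the MemTable into a Mini SSTable.
            self.mini_sstables.append(self.memtable)
            self.memtable = {}
        if len(self.mini_sstables) >= MINI_SSTABLE_LIMIT:
            # Minor compaction: merge Mini SSTables (newer data wins).
            for table in self.mini_sstables:
                self.minor_sstable.update(table)
            self.mini_sstables = []

    def get(self, key):
        # Read path checks newest data first: MemTable, then Mini
        # SSTables (newest first), then the compacted level.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.mini_sstables):
            if key in table:
                return table[key]
        return self.minor_sstable.get(key)
```

A real engine additionally maintains the read-only baseline (major) SSTable and merges it with the incremental data during major compactions; the sketch stops at the minor level.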
Multi-tenant
OceanBase Database is a distributed database that supports the multi-tenant feature. One cluster can serve multiple business systems. The multi-tenant architecture makes it possible to make full use of system resources. This allows the same resources to serve more business. You can deploy different business systems in one cluster to make full use of system resources. The multi-tenant architecture also ensures the isolation between tenants in applications and protects tenant data to prevent data breaches.
Oracle and MySQL compatibility
OceanBase Database supports Oracle compatibility mode and MySQL compatibility mode. You can choose the appropriate mode based on your business needs.
Memory FAQs
What memory areas are included in OceanBase Database?
OceanBase Database includes the following memory areas:
- KV cache: the cache for SSTables and database tables in the LSM-tree.
- Memory store: the memory of MemStores in the LSM-tree.
- SQL work area: the memory used by operator work areas during SQL execution. When this memory is insufficient, data is written to disk.
- System memory: memory reserved for features such as network I/O, disk I/O, election, and load balancing.
Which resources in a tenant are shared and which are not?
In a tenant, the SQL work area, memory store, and KV cache are dedicated resources, whereas system memory is shared among tenants. SQL worker threads are isolated between tenants, but net I/O, disk I/O, and clog writer threads are not.
What are the memory usage characteristics of OceanBase Database?
When OceanBase Database starts, it loads several GB of memory. During operation, it gradually requests more memory as needed, up to memory_limit. After an OBServer node requests memory from the OS, the memory is typically not released back to the OS but kept in the usage list and the Free List for internal memory management. This behavior is by design in OceanBase Database.
Is it normal for the memory usage of OceanBase Database to approach memory_limit after running for a while?
If the memory usage of an OceanBase cluster approaches or reaches the memory_limit after running for a while, this behavior is as expected.
However, if the memory_chunk_cache_size parameter is set, the OBServer node attempts to return memory blocks in the Free List that are larger than memory_chunk_cache_size to the OS. This increases the reuse rate of the Free List within OceanBase Database, speeds up memory operations, and reduces the risk of timeouts in memory operations that trigger RPCs. Typically, you do not need to configure memory_chunk_cache_size. In specific scenarios, analyze the actual workload to determine whether to dynamically adjust the value of this parameter.
Can the memory upper limit of an OBServer node be dynamically adjusted?
Yes. You can dynamically adjust memory_limit or memory_limit_percentage to change the memory upper limit of an OBServer node. Before you adjust these parameters, make sure that the available memory resources are sufficient and that the target upper limit is not lower than the total memory allocated (at tenant creation) to all tenants, including the internal tenant with ID 500. The unit of memory_limit is MB. For example, to set the memory upper limit of an OBServer node to 64 GB, you can use memory_limit = '64G' or memory_limit = 65536.
The memory_limit parameter constrains the memory available to the OBServer node in absolute terms. If memory_limit is set to 0, the memory_limit_percentage parameter instead constrains the memory usage of the OBServer node as a percentage of the physical memory.
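The unit conversion and the 0-means-percentage rule are easy to get wrong, so they can be made explicit in code. The helper names below are illustrative, not OceanBase APIs:

```python
# memory_limit is expressed in MB, so a 64 GB cap corresponds to 65536.
def memory_limit_mb(gigabytes: int) -> int:
    """Convert a GB target into the MB value memory_limit expects."""
    return gigabytes * 1024

# memory_limit = 0 means "fall back to memory_limit_percentage of
# the physical memory" (integer MB, as a sketch).
def effective_memory_limit_mb(memory_limit: int,
                              physical_mb: int,
                              memory_limit_percentage: int) -> int:
    if memory_limit > 0:
        return memory_limit
    return physical_mb * memory_limit_percentage // 100

print(memory_limit_mb(64))                          # 65536
print(effective_memory_limit_mb(65536, 131072, 80)) # 65536 (explicit limit wins)
print(effective_memory_limit_mb(0, 100000, 80))     # 80000 (percentage applies)
```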
Is memory overallocation allowed when you define resource units and resource pools in OceanBase Database?
In OceanBase Database, the Root Service is responsible for allocating resources (units). When the Root Service allocates a unit, it determines whether to overallocate memory based on the value of resource_hard_limit (resource_hard_limit indicates the percentage of memory overallocation. A value greater than 100 indicates that memory can be overallocated). The Root Service then allocates resources to the unit based on the defined resource units.
However, if the resources are relatively tight, the system allows controlled resource contention among different tenants. When you configure a tenant, you can overallocate CPU resources. If CPU overallocation takes effect, when the load of the OceanBase cluster increases, the system may experience overload. In this case, threads from different tenants compete for CPU resources, which directly results in slower business processing in the tenant. If memory is overallocated, the total memory of the tenants can exceed the value of memory_limit when the tenants are created. However, the memory usage of OceanBase Database is still constrained by memory_limit. For example, if the total memory consumed by the running tenants exceeds memory_limit, the tenants will encounter memory overrun errors or even experience OOM, where processes are directly terminated.
What checks does OceanBase Database perform during a memory allocation?
The OceanBase Database kernel limits the size of a single memory request to 4 GB, which is an internal limit added to optimize memory usage and prevent unreasonable memory allocation. In addition, the kernel checks the following conditions each time it allocates memory. If any of the following errors occur, analyze the issue based on the corresponding error message:
| Memory limit | Error keyword in observer.log | Troubleshooting tip |
|---|---|---|
| Upper limit on a context (such as a session or a statement) within a tenant | ctx memory has reached upper limit | The memory usage of a specific context within the tenant has reached the upper limit. Check which modules occupy an abnormal amount of memory in this context. Note that only some contexts, such as WORK_AREA, have their own upper limit; the memory of other contexts is jointly constrained by the memory limit of the tenant. |
| Upper limit on memory of a tenant | tenant memory has reached the upper limit | The memory usage of the tenant has reached the upper limit. Check the memory usage of contexts within the tenant to identify the cause. |
| Upper limit on memory of OceanBase Database | server memory has reached the upper limit | The total memory usage of OceanBase Database has reached the upper limit. Check the memory usage of tenants to identify the cause. |
| Upper limit on physical memory | physical memory exhausted | This error generally occurs when physical memory is insufficient, which is usually related to the deployment mode and parameters. Check the size of physical memory, the memory_limit setting of the observer process, the number of OBServer nodes running on the physical machine, and other memory-consuming processes to analyze overall physical memory usage. |
What are the different types of KVCache in OceanBase Database, and what are their purposes?
You can query the GV$OB_KVCACHE view for the different types of KVCache in OceanBase Database. Generally, they include the following types:
- BloomFilter Cache: OceanBase Database builds a BloomFilter on a macroblock when the number of missed queries on it exceeds a specified threshold, and then caches the BloomFilter.
- Row Cache: caches specific data rows. During a GET or MultiGet query, the corresponding data rows are cached in the Row Cache, which greatly enhances the performance of hot-row queries.
- Block Index Cache: caches the indexes of microblocks, similar to the intermediate layer of a B-tree. Its hit ratio is generally high.
- Block Cache: the equivalent of the buffer cache in traditional databases. It caches specific data blocks; the blocks in the Block Cache are already decompressed.
- Partition Location Cache: caches the location information of partitions to help route queries.
- Schema Cache: caches the metadata of data tables for execution plan generation and subsequent queries.
- Clog Cache: caches clog data to accelerate the pulling of Paxos logs in some scenarios.
How does KV Cache achieve dynamic scaling, and what are the eviction rules?
The main part of dynamically scalable memory is KVCache. OceanBase Database manages most cache memories in the KVCache.
Generally, you do not need to configure the KVCache. In special scenarios, you can set the priorities of different KV types: KVs with higher priority are more likely to be retained in the cache, while lower-priority entries are evicted first when memory is needed elsewhere. If you want to modify the default priorities, contact OceanBase Database technical support.
What common memory issues may occur in OceanBase Database, and what are the possible causes?
The following common memory issues may occur in OceanBase Database:
Out of memory in the work area
The memory limit for the work area is tenant memory × ob_sql_work_area_percentage (default value: 5%). If a large number of concurrent requests each occupy a substantial amount of work-area memory, an out-of-memory error may occur in the work area. This often happens during operations such as UNION, SORT, and GROUP BY. You can increase the work area size with the following statement: set global ob_sql_work_area_percentage = 10;
Insufficient tenant memory
An error with code 4013, "Over tenant memory limits", is returned to the client. Generally, you need to analyze the current memory consumption to identify the cause. Query the GV$OB_MEMORY view or check the observer.log file to see whether a module has consumed a significantly large proportion of the memory.
MemStore memory exhausted
If data is written to the database faster than it is compacted, the MemStore will be exhausted. For example, during high-concurrency data import, if the write speed is too high and the system fails to compact the MemStore in time, the MemStore is exhausted and an error with code 4030 is returned. You can resolve this issue by analyzing the bottleneck that slows down compaction, increasing the compaction speed, or reducing the write speed.
Overall memory usage of the OBServer node exceeds the limit
When the observer process starts, it calculates the maximum physical memory that the process can use based on the specified parameters. If the memory_limit parameter is set, the memory used by the process is limited to the value of memory_limit. If memory_limit is set to 0, the memory usage is limited to the value calculated by the formula physical memory × memory_limit_percentage. If you find that the memory usage of the OBServer node exceeds the limit, query the __all_virtual_server_stat view for the actual memory limit of the node and check whether the memory_limit and memory_limit_percentage parameters are correctly configured. If memory usage keeps growing, you may be experiencing a memory leak; in this case, contact OceanBase Database technical support.
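The work-area limit described above is a simple product, which makes quick sizing checks easy. The helper below is an illustrative sketch using the default of ob_sql_work_area_percentage = 5 from the text:

```python
# Work-area budget: tenant memory * ob_sql_work_area_percentage / 100.
def sql_work_area_mb(tenant_memory_mb: int,
                     work_area_percentage: int = 5) -> int:
    """Upper limit (MB) of the SQL work area for a tenant."""
    return tenant_memory_mb * work_area_percentage // 100

# A 32 GB tenant gets ~1.6 GB of work area at the default 5%;
# raising the parameter to 10 doubles that budget.
print(sql_work_area_mb(32 * 1024))      # 1638
print(sql_work_area_mb(32 * 1024, 10))  # 3276
```

If sort- or group-heavy workloads hit work-area errors, this arithmetic shows how much headroom a given ob_sql_work_area_percentage change actually buys before you run the `set global` statement.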
Frequently asked questions about multi-tenant threads
Is OceanBase Database a single-process or multi-process database?
OceanBase Database is a single-process database. The main threads of the database are as follows:
- election worker: the election thread.
- net io: the network I/O thread.
- disk io: the disk I/O thread.
- clog writer: the clog writing thread.
- misc timer: the resource cleanup thread.
- compaction worker: the minor and major compaction threads.
- sql worker and transaction worker: the SQL and transaction request processing threads.
What are the background threads in an OBServer node, and what do they do?
Generally, you do not need to manually manage the background threads. Their behavior may change in different versions of OBServer nodes.
The following common background threads in an OBServer node are listed for your reference.
| Thread name | Level | Module | Number of threads | Description |
|---|---|---|---|---|
| FrzInfoDet | Tenant | Transaction | 2 | Periodically checks whether new freeze_info is generated. |
| LockWaitMgr | Tenant | Transaction | 1 | Periodically checks the timeout period and wakes up transactions waiting for locks. |
| TenantWeakRe | Tenant | Transaction | 1 | Generates the weak-read timestamp for the tenant. |
| TransService | Tenant | Transaction | 1 | Processes asynchronous tasks from the transaction module, such as pushing checkpoints to logs. |
| TransTimeWhe | Tenant | Transaction | max(cpu_num/24, 2) | Handles scheduled tasks for two-phase commit in transactions. |
| TsMgr | Process | Transaction | 1 | Thread for handling background tasks in GTS, such as deleting unused tenants and refreshing GTS for tenants. |
| TSWorker | Process | Transaction | 1 | Retrieves results of remote GTS access and callbacks for transactions. |
| TxLoopWorker | Tenant | Transaction | 1 | Handles scheduled tasks in the transaction module. |
| ArbSer | Process | System | 1 | The arbitration server loads configuration parameters from the configuration file at regular intervals. |
| Blacklist | Process | System | 2 | Detects whether the network is connected between the source and destination servers for the purpose of probing and communication. |
| ConfigMgr | Process | System | 1 | Refreshes configuration parameters. |
| L0_G0 | Tenant | System | 2+min_cpu * cpu_quota_concurrency | Handles most requests of the tenant. |
| L2_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 2 |
| L3_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 3 |
| L4_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 4 |
| L5_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 5 |
| L6_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 6 |
| L7_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 7 |
| L8_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 8 |
| L9_G0 | Tenant | System | 1 | Specifically handles requests with a nesting level of 9 |
| LuaHandler | Process | System | 1 | Processes Lua requests in emergency scenarios to read the internal state of the observer process. |
| MemDumpTimer | Process | System | 1 | Periodically prints MEMORY logs. |
| MemoryDump | Process | System | 1 | Periodically collects memory statistics. |
| MultiTenant | Process | System | 1 | Refreshes the CPU ratios of tenants for resource scheduling. |
| OB_PLOG | Process | System | 1 | Asynchronously prints diagnostic logs of the observer process. |
| pnio | Process | System | The value specified for the net_thread_count parameter in the configuration file | Network I/O threads in the new network framework, pkt-nio. |
| pnlisten | Process | System | 1 | Listens to the RPC port and forwards RPC connections to network I/O threads. |
| SignalHandle | Process | System | 1 | Signal handling thread. |
| SignalWorker | Process | System | 1 | Asynchronous signal processing thread. |
| L0_G2 | Tenant | Election | min_cpu, at least 8 | Threads dedicated to handling election requests |
How does an OBServer node implement CPU resource isolation in a multi-tenant architecture?
In current versions of OceanBase Database, CPU resources are isolated among tenants by controlling the maximum number of active threads (threads actually consuming CPU) in each tenant. When a tenant is created, you can specify its resource pool; the max_cpu parameter in the resource pool definition limits the maximum number of active threads in the tenant, thereby isolating CPU resources among tenants. In the latest versions, the kernel also uses cgroups to control and limit CPU and other resources more effectively.
How can I check the number of worker threads on an OBServer node? Does an OBServer node dynamically start new threads to handle loads when it is under high load?
In observer.log, log fragments identified by the keyword dump tenant info describe the current and maximum numbers of worker threads. Specifically, the token count is calculated as follows:
token_count = allocated_cpu_count × cpu_quota_concurrency, where allocated_cpu_count is between min_cpu and max_cpu.
For example, on a server that runs OceanBase Database, a tenant T1 is configured with unit_min_cpu = 2 and unit_max_cpu = 8, but the CPU resources on the server are oversold and only five CPU cores are actually allocated to T1. In this case, token_count = 5 × cpu_quota_concurrency.
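The formula above, with the min_cpu/max_cpu clamping made explicit, can be sketched as follows. The function is an illustration (not OceanBase code), and the cpu_quota_concurrency value of 4 in the example is a hypothetical setting:

```python
# token_count = clamp(allocated_cpu, min_cpu, max_cpu) * cpu_quota_concurrency
def token_count(allocated_cpu: int, min_cpu: int, max_cpu: int,
                cpu_quota_concurrency: int) -> int:
    effective_cpu = max(min_cpu, min(allocated_cpu, max_cpu))
    return effective_cpu * cpu_quota_concurrency

# Tenant T1 from the example: min_cpu=2, max_cpu=8, but only 5 cores
# actually allocated; with a hypothetical cpu_quota_concurrency of 4:
print(token_count(5, 2, 8, 4))   # 20
# If 10 cores were allocated, max_cpu would cap the count at 8 cores:
print(token_count(10, 2, 8, 4))  # 32
```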
What is the principle behind CPU resource allocation for large queries? Will CPU resources be preempted when both OLAP and OLTP operations are running?
You can configure the large_query_threshold parameter so that a query whose execution time exceeds this threshold is treated as a large query. OceanBase Database allocates some CPU resources to large queries and uses the large_query_worker_percentage parameter (default value: 30%) to limit the maximum number of active worker threads that large queries can use.
By limiting the number of active worker threads available to large queries, OceanBase Database reserves enough CPU resources for short OLTP transactions. Because CPU resources are shared between large queries and OLTP requests, set large_query_threshold to a reasonable value rather than an excessively large one. Otherwise, queries that should be treated as large will not be identified as such and can occupy most of the CPU resources, causing slow responses or queueing delays for OLTP requests.
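The worker-thread cap implied by large_query_worker_percentage is straightforward arithmetic; a sketch (helper name illustrative, default of 30% taken from the text):

```python
# Cap on active worker threads that large queries may occupy.
def max_large_query_workers(total_workers: int,
                            large_query_worker_percentage: int = 30) -> int:
    return total_workers * large_query_worker_percentage // 100

# With 40 active worker threads, at most 12 may serve large queries,
# leaving 28 reserved for short OLTP requests.
print(max_large_query_workers(40))  # 12
```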