Product architecture and features
Do I need one instance or multiple instances?
If your application has two or more subsystems that do not interact with each other in the database layer, we recommend that you deploy each subsystem in a separate instance.
How do I use OceanBase Database?
Like conventional databases, OceanBase Database provides an SQL interface to enable you to access and manipulate OceanBase Database by using SQL statements.
Does OceanBase Database conflict with JPA?
Java Persistence API (JPA) is a set of object-relational mapping (ORM) specifications in the Java standard. With the JPA technique, you can describe the mapping between objects and relational tables by using annotations or XML and persist entity objects to the database. In this way, you complete the mapping between an object model and a data model. OceanBase Database is a native distributed relational database that is developed fully in-house by Alibaba and Ant Group. It depends on no open-source services. Therefore, OceanBase Database does not conflict with JPA.
What is the database management level for data files?
OceanBase Database files are divided into data files and clog-related files, and they both belong to the cluster level.
- Data files: save the data of various partitions, including the checkpoints of the partitions.
- Clog-related files: include clogs (also known as
REDOor WAL logs) and their index files.
How does OceanBase Database support HTAP?
The innovative distributed computing engine of OceanBase Database allows multiple computing nodes in the system to run online transaction processing (OLTP) and online analytical processing (OLAP) applications at the same time. OceanBase Database uses one computing engine to support hybrid load balancing so that you can use only one system to meet 80% of your needs. This allows you to fully utilize your computing resources and helps you save hardware and software license costs.
What is an instance? What is a tenant? What are the relationships between them?
OceanBase Database is a multi-tenant system. An instance is a tenant in an OceanBase cluster. Data of different tenants are isolated. In other words, data in one tenant cannot be accessed from another tenant.
What is the relationship between the performance of OceanBase Database and the number of servers?
Based on the TPC-C benchmark report of OceanBase Database, the database performance linearly increases with the number of servers in the same scenario.
What do I need to know before I use OceanBase Database in development?
The following list describes some points that you need to know in the development process:
- To import a large amount of data, pay special attention to memory usage.
- The index takes a long time to take effect. Therefore, we recommend that you include the index statement when you create a table.
- After you create a table, you cannot change its primary key. To change the primary key, you can only delete the table and create a new one.
- We recommend that you use mysql-connector-java 5.1.30 or later.
- Column type modifications are subject to many limits. For example, you can increase the length of a VARCHAR string but cannot shorten it.
- The server automatically closes connections that have been idle for more than 15 minutes. When you use a connection pool, you must set a maximum idle time for connections. For example, set the
minEvictableIdleTimeMillisparameter of Druid to a value smaller than 15 minutes.
How does OceanBase Database save more space than conventional databases do?
OceanBase Database achieves a high compression ratio by using data encoding and compression techniques. Data encoding provides a series of encoding methods based on the value range and type of different fields in the database relational tables. Data encoding "knows" data better than general compression algorithms and therefore achieves a higher compression ratio.
What is the data volume required for analytical processing in OceanBase Database?
That depends on business requirements. The analytical processing capability of OceanBase Database is as suitable for handling 100 GB of data as it is for handling several PB of data. Analytical processing is a capability provided by OceanBase Database. When the data volume is small, set the degree of parallelism (DOP) to a small value to reduce computing resource usage. When the data volume is large, set the DOP to a large value to improve the computing capability, but this increases resource usage. OceanBase Database does not have any limit on data volume. The analytical processing capability of OceanBase Database is embodied in the capability of fully utilizing hardware resources and creating effective concurrency plans.
What is the level of support for standard SQL in the latest version of OceanBase Database?
Most MySQL application systems can be smoothly migrated to OceanBase Database. In Oracle mode, basic Oracle features are supported. Application systems running on Oracle can be smoothly migrated to OceanBase Database with only minor modifications.
What is the cost to migrate an application that runs on MySQL to OceanBase Database?
OceanBase Database is compatible with general MySQL features and frontend and backend protocols of MySQL. You can migrate applications from MySQL to OceanBase Database with zero or minimal modifications.
How is the analytical processing capability of OceanBase Database, which is known for its superb transactional processing capability?
The analytical processing capability of OceanBase Database is built on hybrid row-column storage, compiled execution, vector engine, and cost-based query rewriting and optimization. These features, along with the high scalability of OceanBase Database, make it one of the best in the industry in terms of real-time analysis. Also, for offline big data processing, big data solutions such as Spark are a better choice.
What are the differences between OceanBase Database, a distributed database, and conventional databases in connecting to business applications?
As a distributed database, OceanBase Database may distribute replicas on different servers. OceanBase Database uses OceanBase Database Proxy (ODP) to minimize cross-IDC data access. ODP is a reverse proxy server particularly designed for OceanBase Database. It provides high-performance and highly accurate routing services for frontend user requests and highly available and scalable disaster recovery capabilities for backend servers. Unlike other database proxy servers, ODP uses an asynchronous framework and stream forwarding design based on the features of the actual standalone environment and multi-cluster deployment. It uses a fast parsing and lock-free memory solution. ODP can forward millions of queries per second (QPS) with limited resources and provide extensive and convenient O&M support when a large number of servers are involved.
What are the characteristics of the technical architecture of OceanBase Database?
As a native distributed database, OceanBase Database has the following technical characteristics:
Auto scaling
OceanBase Database supports online auto scaling. When the storage or processing capacity of a cluster is insufficient, you can add an OBServer node to the cluster. The system automatically migrates data of some partitions to the new OBServer node based on its processing capacity. Similarly, when the system capacity or processing capacity exceeds your requirements, you can remove one or more OBServer nodes from the cluster to reduce costs. For example, OceanBase Database can provide high auto scaling capacity during massive online promotions.
Load balancing capabilities
OceanBase Database manages many OBServer nodes in a cluster to provide data services for multiple tenants. All OBServer nodes in the OceanBase cluster can be considered a super-large resource cake. Resources are allocated to tenants on demand. The load balancing algorithm of OceanBase Database ensures that the resource usage by tenants in the cluster is balanced. In dynamic scenarios such as OBServer node addition or deletion, tenant addition or deletion, and data volume skew during addition or deletion, the load balancing algorithm can still balance resource usage on existing nodes. OceanBase Database uses Root Service to implement load balancing across the nodes. The resources required by replicas vary based on the replica type. Root Service considers the CPU utilization, disk usage, memory usage, and IOPS on each OBServer node when it performs partition management. This avoids distributing all partitions of the same table to a few OBServer nodes and ensures that replicas with high memory or disk usage and those with low memory or disk usage are located on the same OBServer node. To make full use of resources available on each OBServer node, Root Service balances the usage of various resources among all OBServer nodes after load balancing.
ACID for distributed transactions
OceanBase Database implements atomicity, consistency, isolation, and durability (ACID) for transactions:
- Atomicity: OceanBase Database adopts the two-phase commit protocol to ensure the atomicity of snapshot transactions.
- Consistency: OceanBase Database ensures the consistency of transactions.
- Isolation: OceanBase Database adopts multi-version concurrency control.
- Durability: OceanBase Database synchronizes transaction logs among multiple replicas by using the Paxos protocol.
High availability
OceanBase Database maintains multiple replicas for each partition. The replicas of each partition synchronize logs by using the Paxos protocol. Each partition and its replicas make up an independent Paxos group. In this group, one replica is the leader and the others are followers. Each OBServer node hosts both leaders and followers. When an OBServer node fails, services that access followers on this OBServer node are not affected. However, services that read or write the leader on this OBServer node are temporarily affected before a follower on a functioning OBServer node is elected as the new leader based on the Paxos protocol. The leader-follower switchover takes no more than 30s. The Paxos protocol allows OceanBase Database to provide high availability and performance without compromising data consistency. OceanBase Database also supports the primary/standby architecture. The multi-replica mechanism of the OceanBase cluster provides powerful disaster recovery capabilities. When a failure at the server, IDC, or city level occurs, an automatic failover is performed with no data loss, achieving a recovery point objective (RPO) of 0. If the primary cluster becomes unavailable when a majority of replicas fail, the standby cluster takes over the services and provides lossless failover (RPO = 0) and lossy failover (RPO> 0). These two disaster recovery capabilities can minimize service downtime. OceanBase Database allows you to create, maintain, manage, and monitor one or more standby clusters. A standby cluster accommodates a hot backup for the production cluster, namely the primary cluster. The administrator can allocate resource-intensive table-related operations to standby clusters to improve system performance and resource utilization.
Efficient storage engine
The storage engine of OceanBase Database is designed based on the LSM-Tree architecture, where data is divided into MemTables (also known as MemStores) and SSTables. MemTables provide read and write services and SSTables provide only read services. The inserted, deleted, and updated data is first written to a MemTable. The transactional performance is ensured based on REDO logs. Data is synchronized among the three replicas by using the Paxos protocol. When a server fails, the Paxos protocol ensures data integrity and high data availability in a short recovery period. When the size of the MemTable exceeds the specified threshold, the data in this MemTable is transferred to an SSTable to release the memory. This process is called minor compaction. A minor compaction generates a new SSTable. When the number of minor compactions exceeds the specified threshold or during off-peak hours of a day, the storage engine merges the baseline SSTables and the incremental SSTables generated by minor compactions into one SSTable. This process is called major compaction. OceanBase Database optimizes data storage, provides efficient read and write services, and ensures transactional performance and data integrity by performing minor compactions and major compactions.
Multitenancy
OceanBase Database is a distributed database that supports multitenancy. An OceanBase cluster supports multiple business systems. The multitenancy architecture allows OceanBase Database to make full use of system resources to serve more business systems. Business systems with different on-peak and off-peak hours are deployed in the same cluster to make full use of the system resources. Applications of different tenants are isolated. In terms of data security, data access across tenants is prohibited to avoid risks of data breaches. In terms of resource usage, each tenant has an exclusive resource quota. The response time, TPS, and QPS of applications in this tenant are stable and not affected by loads of other tenants.
Compatibility with Oracle and MySQL
OceanBase Database supports the Oracle and MySQL modes. You can select a mode as needed.
Memory
What are the memory areas in OceanBase Database?
The memory of OceanBase Database is divided into the following areas:
KVCache: the cache for SSTables and database tables in theLSM-Tree.Memory store: theMemStorein theLSM-Tree.SQL workarea: the memory space used by operators during SQL statement execution. The disk space is used if the SQL workarea is insufficient.System memory: the memory reserved for features such asnetwork I/O,disk I/O,election, and load balancing.
Which resources in OceanBase Database are shared by tenants and which are not?
In terms of memory resources, the SQL workarea, memory store, and KVCache are exclusive to tenants, and the system memory is shared by tenants. In terms of threads, the SQL worker threads are isolated among tenants, and the network I/O, disk I/O, and clog writer threads are not isolated among tenants.
What are the characteristics of memory usage in OceanBase Database?
OceanBase Database requires several GB of memory at startup and requests for more memory on demand during running until the memory limit specified in the memory_limit parameter is reached. Generally, the memory requested by an OBServer node from the operating system will not be released or returned to the operating system. Instead, it is maintained in the used list and free list of the memory management module. This is the memory management mechanism designed for OceanBase Database.
Is it normal if the used memory of OceanBase Database reaches the memory limit after it runs for a while?
After an OceanBase cluster runs for a while, it is expected that the used memory will reach the memory limit and the memory usage does not decrease. However, if the memory_chunk_cache_size parameter is specified, the OBServer node will attempt to return the memory chunks in the free list that exceed the value of memory_chunk_cache_size to the operating system. This increases the reuse rate of the free list inside OceanBase Database and reduces RPC timeout risks that are caused by slow memory operations. Generally, the memory_chunk_cache_size does not need to be specified. In special scenarios, you need to discuss with OceanBase Technical Support to determine to dynamically adjust the memory_chunk_cache_size parameter.
Can the memory limit be dynamically adjusted for an OBServer node?
You can dynamically adjust the memory limit of an OBServer node by changing the value of the memory_limit or memory_limit_percentage parameter. Before you adjust the memory limit, check whether memory resources are sufficient and whether the memory limit of the OBServer node is lower than the sum of memory allocated to all tenants upon tenant creation, including the sys tenant and the SYS500 tenant. The unit of memory_limit is MB. For example, to change the value of the memory_limit parameter to set the memory limit of an OBServer node to 64 GB, use the memory_limit ='64G' or memory_limit = 65536 statement. When an OBServer node is running, you can also use the memory_limit parameter to restrict effective memory parameters for the OBServer node. If you set memory_limit to 0, you can also use the memory_limit_percentage parameter to flexibly limit the memory usage of the OBServer node.
Is the over-allocation of memory resources supported for resource units and resource pools in OceanBase Database?
When you define resource units and resource pools in OceanBase Database, Root Service is responsible for resource (unit) allocation. Root Service determines whether the over-allocation of memory resources is supported based on the value of resource_hard_limit. The resource_hard_limit parameter specifies the percentage for over-allocation of memory resources. If its value is greater than 100, the over-allocation of memory resources is supported. Root Service allocates resources based on the specified resource units. If resources are insufficient, different tenant workloads of the OBServer node can run alternately on a controllable basis. When you create a tenant, you can over-allocate the CPU resources. If CPU resources are over-allocated, the OceanBase cluster will be overloaded and threads of different tenants will contend for CPU resources, which slows down the operations in business systems of the tenant. If memory resources are over-allocated, although the sum of memory specified for the tenants can exceed memory_limit when the tenants are created, the memory usage in OceanBase Database is still subject to memory_limit. For example, if the sum of used memory is about to exceed memory_limit, the memory overrun or out-of-memory (OOM) exception is returned.
What are the limits when OceanBase Database performs memory allocation?
In OceanBase Database, the memory that can be requested each time must not exceed 4 GB. This limit is invisible to users. This limit is added to the kernel to ensure proper memory usage and avoid inappropriate memory allocation. The following table describes the checks that the kernel of OceanBase Database performs during each memory request. When an error occurs, analyze the issue based on the error message.
| Memory limit | Error keywords in observer.log | Troubleshooting procedure |
|---|---|---|
| Maximum memory for a context of the tenant | ctx memory has reached upper limit |
The memory occupied by a context of the current tenant reaches the upper limit. Find the modules that have abnormal memory usage in this context. This limit applies only to part of the context modules such as WORK_AREA. Other context modules and memory are subject to the memory limit of the tenant. |
| Maximum memory for the tenant | tenant memory has reached the upper limit |
The used memory of the current tenant reaches the upper limit. Find the contexts that have unexpected memory usage in the current tenant. |
| Maximum memory for OceanBase Database | server memory has reached the upper limit |
The total used memory in OceanBase Database reaches the upper limit. Find the tenants that have unexpected memory usage. |
| Maximum physical memory | physical memory exhausted |
The physical memory requested is insufficient. This issue is usually related to the deployment method and parameters. Check the size of the physical memory, the value of the memory_limit parameter, the number of OBServer nodes running on the physical server, and the deployment of other processes that consume memory resources. |
What are the KVCaches of OceanBase Database? What are they respectively used for?
You can query the GV$OB_KVCACHE view for the KVCaches of OceanBase Database. KVCaches of OceanBase Database are divided into the following types:
BloomFilter cache: In OceanBase Database, Bloom filters are built on macroblocks and created on demand. When the number of empty queries on a macroblock exceeds the threshold, a Bloom filter is created and placed into the cache.Row cache: Data rows returned by Get/MultiGet queries may be placed in the row cache to significantly improve the query performance on frequently queried rows.Block index cache: Microblock indexes are stored in the block index cache. The block index cache is similar to the intermediate layer of a B-tree. The intermediate layer is usually small. Therefore, the hit rate of the block index cache is high.Block cache: The block cache is the buffer cache in OceanBase Database that stores data blocks. It stores decompressed microblocks.Partition location cache: The location information of partitions is stored in the partition location cache to facilitate query routing.Schema cache: The schema cache stores the metadata of tables for generating execution plans and executing subsequent queries.Clog cache: The clog cache stores clogs to accelerate Paxos log pulling under some circumstances.
How does the KVCache implement dynamic scaling and what are the eviction rules?
The KVCache is dynamically scalable. In OceanBase Database, most caches in the KV format are managed in the KVCache. Generally, you do not need to configure the KVCache. In special scenarios, you can set the priorities for different KV types. Data of a KV type with a higher priority is more likely to be retained in the cache than data of a KV type with a lower priority. To adjust the default configurations of KVCache-related priorities, contact OceanBase Technical Support.
What are the general issues regarding the memory in OeanBase Database? What are the possible causes?
General issues in OceanBase database and possible causes:
Insufficient workarea memory
The maximum memory for a workarea is calculated by using the following formula:
tenant memory × ob_sql_work_area_percentage (default value: 5%). If a large number of concurrent requests are initiated and each request consumes many memory resources of the workarea, an error indicating insufficient workarea memory may be returned. This issue frequently occurs in scenarios such as UNION, SORT, and GROUP BY. To avoid this issue, you can set theob_sql_work_area_percentageparameter to a large value. For example, you can set it to 10%.Insufficient tenant memory
The error
"Over tenant memory limits"with the error code 4013 is returned on the client, indicating that the tenant memory is insufficient. You need to analyze the consumption of memory resources. You can query theGV$OB_MEMORYview or theobserver.logfile to check for the module that consumes the largest proportion of the memory resources.MemStore depletion
If data writing is faster than minor compactions, the MemStore is easily depleted. For example, if a large amount of data is concurrently imported, the data is written too fast in the MemStore, and minor compactions are not promptly performed, the MemStore is used up and the error code 4030 is returned. To resolve this issue, find and solve the bottleneck that causes slow minor compactions to increase the minor compaction speed, or slow down data writing.
OBServer memory overrun
The observer process will calculate the maximum physical memory available based on the configured parameters at startup. If
memory_limitis set to a positive number, the maximum memory available for the observer process is the value ofmemory_limit. Ifmemory_limitis set to 0, the maximum available memory is calculated by using the following formula:physical memory × memory_limit_percentage. If OBServer memory overrun is detected, you can query_all_virtual_server_statfor the actual maximum memory for the OBServer node and then check whether thememory_limitandmemory_limit_percentageparameters are correctly specified. If memory overrun still exists and becomes increasingly serious, a memory leak may occur. In this case, contact OceanBase Technical Support for troubleshooting.
Multitenancy and threads
How does OceanBase Database manage threads?
In OceanBase Database, dynamically creating threads on an OBServer node is avoided. Basically all threads are created after an OBServer node is started. The number of threads does not change unless parameters are adjusted. The number of SQL worker threads and transaction worker threads of a tenant on a specific OBServer node is subject to the unit specification of the tenant. Other threads are specified by the corresponding parameters.
What are the principles for allocating sources to threads in OceanBase Database?
In OceanBase Database, resources are allocated to threads based on the following principles:
- At present, network and I/O resources are not controlled.
- Memory: Memory resources of different tenants are separately managed. For the SQL statement being executed, the memory allocator of the tenant to which this SQL statement belongs is used to allocate memory resources to this SQL statement.
- CPU: The number of active threads of a tenant is limited at any time. Assume that 16 CPU cores are configured for a tenant and the
cpu_quota_concurrencyparameter is set to 2. In this case, this tenant can have at most 32 (16 × 2) active SQL threads. However, the actual number of active threads is controlled by the user-mode thread scheduler of the OBServer node.
Does OceanBase Database run on a single process or multiple processes?
OceanBase Database is a single-process database and involves the following threads during running:
Election worker thread: used to elect a leader.Network I/O thread: used to process network I/Os.Disk I/O thread: used to process disk I/Os.Clog writer thread: used to write clogs.Misc timer thread: used to clear resources. It consists of multiple backend timer threads.Compaction worker thread: used to process minor merges and major merges.SQL worker threadandtransaction worker thread: used to process SQL and transaction requests.
What are the backend threads of an OBServer node? What are they respectively used for?
In most cases, you do not need to pay attention to the implementation details of backend threads. Backend threads are updated along with OBServer iterations. The following table describes part of the general backend threads and their features.
| Thread name | Module | Thread quantity | Feature |
|---|---|---|---|
| TsMgr | Trx | 1 | Refreshes the local GTS cache. |
| WeakReadSvr | Trx | 1 | Calculates the local server-level read timestamp of the standby IDC. |
| HAGtsSource | Trx | 1 | Refreshes ha_gts. The refresh interval is specified by gts_refresh_interval. |
| TransCheckpoint Worker | Trx | 1 | Used for the checkpoint operations of user transactions. |
| GCPartAdpt | Trx | 1 | Regularly checks the all_tenant_gc_partition_info table to obtain the garbage collection (GC) information of partitions. The thread helps facilitate the GC and two-phase commit of transactions. |
| PartWorker | Trx | 1 | Traverses all partitions on the local server to perform the following operations:
|
| LockWaitMgr | Trx | 1 | Wakes up a thread in the wait lock queue in hotspot row scenarios. For example, it kills a session when the execution of a statement times out. |
| ClogAdapter | Trx | 8 | Asynchronously commits clogs at the transaction layer. |
| ObTransService | Trx | 6 | Performs the following operations:
|
| GCCollector | Storage | 1 | Regularly checks whether GC operation can be performed on all partitions on the local OBServer nodes. If yes, triggers a GC operation. |
| RebuildSche | Storage | 1 | Regularly drives partition tasks that need to be rebuilt. |
| PurgeWorker | storage | 1 | Deletes MemTable labels in the background. |
| DAG | Storage | 16 | The worker thread of dag, which is used to run tasks such as the minor compaction, major compaction, and migration of partitions. |
| DagScheduler | Storage | 1 | The scheduling thread of dag. |
| STableChecksumUp | Storage | 1 | Batch submits SSTable checksum reporting tasks. |
| PartitionScheduler | Storage | 2 | Schedules minor compaction and major compaction tasks. |
| MemstoreGC | Storage | 1 | Recycles the frozen MemStore after a warm-up for minor compactions. |
| ObStoreFile | Storage | 1 | Checks for bad macroblocks and recycles macroblocks. |
| PartSerCb | Storage | 40 | Schedules 20 threads to handle the switchover of a partition leader. |
| ObPartitionSplit Worker | Storage | 1 | Splits a partition in the background. |
| FreezeTimer | Storage | 1 | Checks the MemStore memory and triggers a freeze. |
| RebuildTask | Storage | 1 | Schedules the D replica to pull data. |
| IndexSche | Storage | 1 | Schedules the index creation task. |
| FreInfoReload | Storage | 1 | Synchronizes major freeze information in internal tables. |
| TableMgrGC | Storage | 1 | Recycles MemTables to release memory. |
| BackupInfoUpdate | Storage | 1 | Regularly refreshes backup data. |
| LogDiskMon | Storage | 1 | Detects the multiple disks of the system. |
| KVCacheWash | Storage | 1 | Washes the KVCache. The detection interval is specified by the _cache_wash_interval parameter. Value range: [1 ms,1 m]. Default value: 200 ms. If you specify a value without a unit, the value is treated as milliseconds. |
| KVCacheRep | Storage | 1 | Manages memory fragments of the cache to recycle the memory. The detection interval is specified by the _cache_wash_interval parameter. Value range: [1 ms,1 m]. Default value: 200 ms. If you specify a value without a unit, the value is treated as milliseconds. |
| RSMonitor | RS | 1 | Monitors the status of the current Root Service. |
| CacheCalculator | RS | 1 | Reserves memory for the location cache and schema cache. |
| ObDailyMergeScheduler | RS | 1 | Schedules major compaction tasks. |
| ObEmptyServerChecker | RS | 1 | Detects whether an OBServer node contains replicas. |
| ObFetchPrimaryDDLOperator | RS | 1 | Synchronizes the logic of the standby database to the schema of the SYS tenant of the primary database. |
| ObFreezeInfoUpdater | RS | 1 | Speeds up the progress of freeze and manages the snapshot recovery points. |
| ObGlobalIndexBuilder | RS | 1 | Schedules the creation of global indexes. |
| ObGlobalMaxDecidedTransVersionMgr | RS | 1 | Collects the statistics of the versions of global maximum transactions. |
| ObLeaderCoordinator | RS | 1 | Manages the distribution of leader replicas in a cluster. |
| ObLogArchiveScheduler | RS | 1 | Archives the log files. |
| ObMajorFreezeLauncher | RS | 1 | Launches scheduled major compaction tasks daily. |
| ObPartitionSpliter | RS | 1 | Schedules the compaction of partitions. |
| ObRebalanceTaskMgr | RS | 1 | Schedules the execution of load balancing tasks. |
| ObRootBackup | RS | 1 | Schedules backup tasks. |
| ObRootBalancer | RS | 1 | Generates load balancing tasks. |
| ObRsGtsMonitor | RS | 1 | Monitors the distribution of GTS instances and initiates tasks to migrate or generate GTS replicas based on the distribution of GTS replicas. |
| ObRsGtsTaskMgr | RS | 1 | Executes the tasks related to GTS replicas. |
| ObHeartbeatChecker | RS | 1 | Checks the status of heartbeats between OBServer nodes. |
| ObStandbyClusterSchemaProcessor | RS | 1 | Synchronizes the replica of a standby database to the user tenant schema of the primary database. |
| ObRestoreScheduler | RS | 1 | Schedules the restoration process. |
| ObWorkQueue | RS | 4 | Executes asynchronous Root Service tasks. The number of threads is specified by the rootservice_async_task_thread_count parameter. Value range: [1,10]. |
| ObAsyncTaskQueue | RS | 16 | Executes the creation of indexes. |
| ObSwitchTaskQueue | RS | 6 | Executes the task queue for the primary/standby switchover. The number of threads is specified by the switchover_process_thread_count parameter. Value range: [1,1000]. |
| LocalityReload | RS | 1 | Regularly refreshes the locality information. |
| RSqlPool | RS | 1 | Allows the standby cluster to regularly maintain the RS_LIST of the primary cluster. |
| ServerTracerTimer | RS | 1 | Schedules the tasks to maintain the status of other OBServer nodes. |
| LogEngine | CLOG | 1 | Collects and monitors the disk space occupied by clog files and recycles clog files. |
| CLGWR | CLOG | 1 | Writes clog items to the disk. |
| ClogHisRep | CLOG | 3 | Updates the internal clog history table when a partition is added or removed. |
| BatchSubmitCtx | CLOG | 1 | Splits the transactions optimized in phase 1. |
| LogScanRunnable | CLOG | 4 | Scans all clog files when an OBServer node is started. |
| LogStateDri | CLOG | 1 | Switches the primary and standby replicas of the clog module and checks the status of the replica association. |
| ObElectionGCThread | Election | 1 | Recycles the memory of election objects. |
| Blacklist | CLOG | 1 | Detects network connection with the destination server. |
| BRPC | CLOG | 5 | Sends the batch_rpc aggregation package in the background. |
| CLOGReqMinor | CLOG | 1 | Triggers a minor freeze based on the clog disk usage to speed up the recycling of clog files. |
| LineCache | CLOG | 1 | Optimizes the synchronization of historical data in the liboblog scenario. |
| EXTLogWash | CLOG | 1 | Frequency: 100 ms. |
| CLogFileGC | CLOG | 1 | Regularly recycles clog files. |
| CKPTLogRep | CLOG | 1 | Used for the checkpoint operations of log replicas. |
| RebuildRetry | CLOG | 1 | Retries the partition rebuilding task. |
| ReplayEngine | CLOG | Physical core | Replays the clog files on the follower. |
| PxPoolTh | SQL | 0 | The thread pool for parallel execution. The number of threads is specified by the system variable parallel_max_servers. Default value: 0. |
| sql_mem_timer | SQL | 1 | Automatically manages memory. |
| OmtNodeBalancer | Common | 1 | Refreshes multi-tenant information in the background. |
| MultiTenant | Common | 1 | Refreshes CPU utilization for multiple tenants to schedule resources. |
| SignalHandle | Common | 1 | Processes signals. |
| LeaseUpdate | Common | 3 | Monitors the lease between the RootServer and each OBServer node. |
| RsDDL | Common | 1 | Executes the DDL operations of the Root Service. |
| MysqlIO | Common | 12 | Manages the connection with MySQL. |
| all_meta_table | Common | 8 | Reports the status of a partition or partition group (PG). |
| all_pg_partition_meta_table | Common | 8 | Reports the status of partitions in a PG. |
| TimerMonitor | Common | 1 | Monitors the actions of scheduled internal threads and triggers alerts for exceptions during execution. |
| ConfigMgr | Common | 1 | Refreshes parameters. |
| OB_ALOG | Common | 1 | Writes asynchronous system logs to the disk. |
| ELE_ALOG | Election | 1 | Writes asynchronous election module logs to the disk. |
How are CPU resources effectively isolated in the multitenancy architecture of OceanBase Database?
In the current version, each OBServer node controls the consumption of CPU resources by controlling the number of active threads that actually consume CPU resources. When you create a tenant in an OceanBase cluster, you must specify a resource pool for the tenant. The max_cpu parameter of the resource pool specifies the maximum number of active threads for the tenant. In this way, CPU resources are isolated among tenants. In the latest version, the OBServer kernel implements cgroup isolation to effectively control and limit CPU and memory resources with the support of OS-level configurations. This capability is being integrated into the tool platform. In this way, the OBServer nodes deployed later can effectively control and limit resources through cgroup isolation.
How do I know the number of worker threads of an OBServer node? Does an OBServer node dynamically create new threads when the load is high?
In observer.log, the snippet that contains dump tenant info shows the number of active worker threads and the maximum number of worker threads. Example:
token_count = cpu_count × cpu_quota_concurrency. The value of cpu_count indicates the number of CPU cores allocated to the tenant, which is greater than min_cpu but smaller than max_cpu.
Assume that unit_min_cpu is set to 2 and unit_max_cpu is set to 8 for the T1 tenant on an OBServer, and the CPU resources configured on the OBServer are over-allocated. The actual number of CPU cores allocated to the T1 tenant is 5. Therefore, token_count = 5 × cpu_quota_concurrency.
How does an OBServer node control the number of active threads and the maximum number of threads for a tenant?
An OBServer node calculates the number of active threads based on the value of cpu_count × cpu_quota_concurrency. These threads are created when the tenant is created. For example, the active thread A is suspended when it executes a large query. In this case, the tenant creates another active thread. However, the total number of threads that can be created by a tenant cannot exceed the value of cpu_count × workers_per_cpu_quota. If the number of threads of a tenant reaches the upper limit, new threads cannot be created and Thread A will not be suspended. This keeps the number of active threads unchanged. Assume that a tenant is configured with 16 CPU cores and the cpu_quota_concurrency parameter is set to 2. In this case, the tenant can have at most 32 active SQL threads. The number of active threads is controlled by the user-mode thread scheduler of the OBServer node.
What is the principle for allocating CPU resources to large queries? Do threads contend for CPU resources if OLAP and OLTP exist at the same time?
In OceanBase Database, you can specify the large_query_threshold parameter to define large queries. A query whose execution time exceeds the specified threshold is considered a large query. If large and small queries run at the same time, OceanBase Database allocates part of CPU resources to large queries and uses the large_query_worker_percentage parameter to limit the maximum number of active worker threads that execute large queries. The default value of the large_query_worker_percentage parameter is 30, in percentage. In this way, OceanBase Database limits the CPU resources available for large queries. This ensures that the system has sufficient CPU resources to process the load of OLTP business, such as small transactions used for trading business. As a result, sufficient CPU resources are available for promptly executing the OLTP loads that demand a short response time. Although OceanBase Database balances the allocation of CPU resources to large queries and OLTP loads, you must set the large_query_threshold parameter within a reasonable value range. If you specify a large value, threads that execute large queries may contend for CPU resources, which leads to a slow OLTP response or a queue of accumulated requests.