Performance tuning methods
Ideally, the performance of backup and restore operations is limited only by data distribution (number and size of partitions) and hardware capabilities (CPU, disk, and network). However, under the default configuration, backup and restore operations may not fully utilize hardware resources. To address this, OceanBase Database provides resource isolation strategies and the following parameters to optimize performance:
Network configuration: The
sys_bkgd_net_percentageparameter limits the percentage of total network bandwidth that can be utilized by background system tasks (including backup and restore tasks). Setting an appropriate value forsys_bkgd_net_percentageallows backup and restore tasks to make full use of network bandwidth without affecting foreground business operations.CPU and I/O configuration: OceanBase Database also provides the resource manager for resource isolation, allowing you to limit the CPU and I/O usage of different types of tasks. If resource isolation is configured for backup and restore tasks, set the limits based on actual resources and business needs to avoid bottlenecks in CPU and I/O usage.
Other configurations: When sufficient CPU, I/O, and network resources are available, it is possible to further improve the performance of backup and restore tasks by increasing their concurrency through related parameters (e.g.,
ha_low_thread_score,log_archive_concurrency,log_restore_concurrency, andha_high_thread_score).
Resource configuration
Cluster-level parameter sys_bkgd_net_percentage
The sys_bkgd_net_percentage parameter is used to set the percentage of network bandwidth that can be occupied by background system tasks. The default value is 60% of the server's network interface controller (NIC). When the bandwidth is fully utilized, the value of sys_bkgd_net_percentage can be appropriately increased without affecting business requests.
After the parameter is successfully set, you can view the log as follows:
Log in to the server where the OBServer node resides as the
adminuser.Go to the installation directory of OceanBase Database.
Here is an example of the installation path of OceanBase Database:
/home/admin/oceanbase/. In practice, follow the actual environment.[admin@xxx /]$ cd /home/admin/oceanbaseRun the following command to view the NIC speed.
[admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*Here,
observer.logis the observer log when the cluster starts.For example, the query result is as follows:
log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000) log/observer.log.20210811100806:[2021-08-1110:07:42.351813] INFO [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)In the first query result,
sys_bkgd_net_percentage_=60indicates that the network bandwidth for background system tasks is 60% of the NIC speed of the server.network_speed=1310720000indicates that the maximum NIC speed identified by OceanBase Database is 1310720000 B/s.rate=786432000indicates that the limited maximum speed is 786432000 B/s, andrate = network_speed * sys_bkgd_net_percentage.In the second query result,
rate_KB/s=786432indicates that the limited maximum speed (rate) identified by OceanBase Database is 786432 KB/s.
The NIC speed identified by OceanBase Database may be inaccurate. After you view the log and find that the speed identified by OceanBase Database does not match the actual speed, you can modify the speed by referring to Check NIC speed.
After the modification, you can query the V$OB_NIC_INFO view to confirm whether the NIC speed is updated.
Resource isolation in the resource manager
Resource manager is the resource isolation mechanism in OceanBase Database. It allows you to configure resource limits for different background tasks, including CPU, IOPS, and network bandwidth, through function-level resource isolation. For more information about resource isolation, see Resource isolation overview.
In function-level resource isolation, the background tasks related to backup and restore are as follows:
- ha_high: High-priority, high-reliability tasks such as replication, rebuild, and restore.
- ha_mid: Medium-priority, high-reliability tasks, such as migrations.
- ha_low: Low-priority, high-reliability tasks such as backup and backup cleanup.
You can use the DBA_OB_RSRC_DIRECTIVES view to check the resource isolation plan configured for the current tenant. If the query result is empty, it indicates that no resource isolation plan is configured for the tenant. If the query result contains records of a background task, check whether the CPU, I/O, network bandwidth, and other resources have reached the bottleneck and whether they comply with the resource isolation limits. If so, we recommend that you modify the resource isolation plan without affecting the frontend business. For more information about how to modify the resource isolation plan, see Update a resource management plan directive (MySQL mode) and Update a resource management plan directive (Oracle mode).
We recommend that you do not configure a resource isolation plan when performing performance tests for backup and recovery.
Data backup
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| ha_low_thread_score | Specifies the number of concurrent threads for data backup. This is a tenant-level parameter. | 0, which indicates that the default concurrency is 2. | For small tenants with no more than 4 CPU cores, we recommend that you set this parameter to the default value. For large tenants, you can set this parameter to 10 first. If the backup speed is too slow, you can double the value as needed. We recommend that you set this parameter to the maximum value of 100 when you perform performance tests for backup and restore. |
Log archiving
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| log_archive_concurrency | The concurrency of log archiving for a tenant. | 0, which means the system calculates the number of archive worker threads for the tenant based on the value of MAX_CPU in the following adaptive rules:
|
We recommend that you keep the default value for small and large tenants and let the system calculate the number of worker threads based on the adaptive rules. |
Physical restore
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| log_restore_concurrency | The number of concurrent threads for log restore for a tenant. | 0, which means the number of CPU cores of the MAX_CPU value for the tenant. |
Increasing the value of this parameter increases the number of worker threads and the memory resource overhead. We recommend that you set this parameter to the default value of 0. If the restore speed is too slow, you can appropriately increase the value of this parameter based on the actual resources of the server. |
| ha_high_thread_score | The number of concurrent threads for data restore for a tenant. | 0, which means 8 by default. | We recommend that you use the default value in non-performance test scenarios. If you want to perform a backup restore performance test, we recommend that you set this parameter to its maximum value of 100. |
| _restore_idle_time | The interval between RS restores in minutes. This is a cluster-level hidden parameter. | 1m, which means 1 minute. | If you set this parameter to 10s, the data restore time can be reduced by several dozen seconds to two minutes. We recommend that you set this parameter for small-scale tenants with high performance requirements (CPU ≤ 4C). Otherwise, the effect is not significant. |
Table-level restore
Physical restore of temporary auxiliary tenants
You can use the views introduced in Views related to table restore to obtain the name of the temporary auxiliary tenant associated with the task. By default, the name of the auxiliary tenant starts with the AUX prefix.
The physical restore of the temporary auxiliary tenant in table-level restore is consistent with that in physical restore. For more information about performance tuning in this phase, see the Physical restore-related information in this topic.
Cross-tenant import of tables
| Parameter | Description | Default value | Configuration recommendation |
|---|---|---|---|
| ddl_thread_score | The tenant-level parameter specifies the capacity of the table-level restore thread pool on the OBServer node where the tenant resides. | 0, which indicates the default table-level restore thread pool capacity of 2 |
|
| recover_table_concurrency | The tenant-level parameter specifies the concurrency for restoring multiple tables, that is, the number of tables that can have cross-tenant export performed simultaneously. | 0, which indicates that one table can be exported at a time. | We recommend that you set this parameter to the default value of 0. In addition:
NoticeCross-tenant export consumes a large amount of temporary storage space and memory resources. We recommend that you increase the values based on the resources of the tenant. The consumption of temporary space is particularly high in the index reconstruction phase. If a table contains a large amount of data, a large amount of temporary space may be consumed. High consumption of temporary space may cause query failures, transaction rollbacks, and service unavailability. |
| recover_table_dop | The tenant-level parameter specifies the concurrency for restoring a single table. | 0, which indicates the default concurrency of 1. | We recommend that you set this parameter to 8 or 16, which can meet the requirements of most scenarios. If the table to be restored is a non-partitioned table with a large amount of data or a table with a partition containing a large amount of data, we recommend that you double the value of the |