Backup and restore performance tuning|V4.3.5| docs|Distributed Database

Backup and restore performance tuning

Last Updated：2025-06-25 06:47:49 Updated

Performance tuning methods

Ideally, the performance of backup and restore operations is limited only by data distribution (number and size of partitions) and hardware capabilities (CPU, disk, and network). However, under the default configuration, backup and restore operations may not fully utilize hardware resources. To address this, OceanBase Database provides resource isolation strategies and the following parameters to optimize performance:

Network configuration: The sys_bkgd_net_percentage parameter limits the percentage of total network bandwidth that can be utilized by background system tasks (including backup and restore tasks). Setting an appropriate value for sys_bkgd_net_percentage allows backup and restore tasks to make full use of network bandwidth without affecting foreground business operations.
CPU and I/O configuration: OceanBase Database also provides the resource manager for resource isolation, allowing you to limit the CPU and I/O usage of different types of tasks. If resource isolation is configured for backup and restore tasks, set the limits based on actual resources and business needs to avoid bottlenecks in CPU and I/O usage.
Other configurations: When sufficient CPU, I/O, and network resources are available, it is possible to further improve the performance of backup and restore tasks by increasing their concurrency through related parameters (e.g., ha_low_thread_score, log_archive_concurrency, log_restore_concurrency, and ha_high_thread_score).

Resource configuration

Cluster-level parameter `sys_bkgd_net_percentage`

The sys_bkgd_net_percentage parameter is used to set the percentage of network bandwidth that can be occupied by background system tasks. The default value is 60% of the server's network interface controller (NIC). When the bandwidth is fully utilized, the value of sys_bkgd_net_percentage can be appropriately increased without affecting business requests.

After the parameter is successfully set, you can view the log as follows:

Log in to the server where the OBServer node resides as the admin user.
Go to the installation directory of OceanBase Database.

Here is an example of the installation path of OceanBase Database: /home/admin/oceanbase/. In practice, follow the actual environment.
```
[admin@xxx /]$ cd /home/admin/oceanbase
```
Run the following command to view the NIC speed.
```
[admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*
```
Here, observer.log is the observer log when the cluster starts.

For example, the query result is as follows:
```
log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO  [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)
log/observer.log.20210811100806:[2021-08-1110:07:42.351813] INFO  [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)
```
In the first query result, sys_bkgd_net_percentage_=60 indicates that the network bandwidth for background system tasks is 60% of the NIC speed of the server. network_speed=1310720000 indicates that the maximum NIC speed identified by OceanBase Database is 1310720000 B/s. rate=786432000 indicates that the limited maximum speed is 786432000 B/s, and rate = network_speed * sys_bkgd_net_percentage.

In the second query result, rate_KB/s=786432 indicates that the limited maximum speed (rate) identified by OceanBase Database is 786432 KB/s.

The NIC speed identified by OceanBase Database may be inaccurate. After you view the log and find that the speed identified by OceanBase Database does not match the actual speed, you can modify the speed by referring to Check NIC speed.

After the modification, you can query the V$OB_NIC_INFO view to confirm whether the NIC speed is updated.

Resource isolation in the resource manager

Resource manager is the resource isolation mechanism in OceanBase Database. It allows you to configure resource limits for different background tasks, including CPU, IOPS, and network bandwidth, through function-level resource isolation. For more information about resource isolation, see Resource isolation overview.

In function-level resource isolation, the background tasks related to backup and restore are as follows:

ha_high: High-priority, high-reliability tasks such as replication, rebuild, and restore.
ha_mid: Medium-priority, high-reliability tasks, such as migrations.
ha_low: Low-priority, high-reliability tasks such as backup and backup cleanup.

You can use the DBA_OB_RSRC_DIRECTIVES view to check the resource isolation plan configured for the current tenant. If the query result is empty, it indicates that no resource isolation plan is configured for the tenant. If the query result contains records of a background task, check whether the CPU, I/O, network bandwidth, and other resources have reached the bottleneck and whether they comply with the resource isolation limits. If so, we recommend that you modify the resource isolation plan without affecting the frontend business. For more information about how to modify the resource isolation plan, see Update a resource management plan directive (MySQL mode) and Update a resource management plan directive (Oracle mode).

We recommend that you do not configure a resource isolation plan when performing performance tests for backup and recovery.

Data backup

Parameter	Description	Default value	Remarks
ha_low_thread_score	Specifies the number of concurrent threads for data backup. This is a tenant-level parameter.	0, which indicates that the default concurrency is 2.	For small tenants with no more than 4 CPU cores, we recommend that you set this parameter to the default value. For large tenants, you can set this parameter to 10 first. If the backup speed is too slow, you can double the value as needed. We recommend that you set this parameter to the maximum value of 100 when you perform performance tests for backup and restore.

Log archiving

Parameter	Description	Default value	Remarks
log_archive_concurrency	The concurrency of log archiving for a tenant.	0, which means the system calculates the number of archive worker threads for the tenant based on the value of `MAX_CPU` in the following adaptive rules: If `tenant's MAX_CPU <= 8`, `archive threads = MAX_CPU`. If `8 < tenant's MAX_CPU < 32`, `archive threads = tenant's MAX_CPU / 2`, with a minimum value of 8. If `tenant's MAX_CPU >= 32`, `archive threads = tenant's MAX_CPU / 4`, with a minimum value of 16.	We recommend that you keep the default value for small and large tenants and let the system calculate the number of worker threads based on the adaptive rules.

Physical restore

Parameter	Description	Default value	Remarks
log_restore_concurrency	The number of concurrent threads for log restore for a tenant.	0, which means the number of CPU cores of the `MAX_CPU` value for the tenant.	Increasing the value of this parameter increases the number of worker threads and the memory resource overhead. We recommend that you set this parameter to the default value of 0. If the restore speed is too slow, you can appropriately increase the value of this parameter based on the actual resources of the server.
ha_high_thread_score	The number of concurrent threads for data restore for a tenant.	0, which means 8 by default.	We recommend that you use the default value in non-performance test scenarios. If you want to perform a backup restore performance test, we recommend that you set this parameter to its maximum value of 100.
_restore_idle_time	The interval between RS restores in minutes. This is a cluster-level hidden parameter.	1m, which means 1 minute.	If you set this parameter to 10s, the data restore time can be reduced by several dozen seconds to two minutes. We recommend that you set this parameter for small-scale tenants with high performance requirements (CPU ≤ 4C). Otherwise, the effect is not significant.

Table-level restore

Physical restore of temporary auxiliary tenants

You can use the views introduced in Views related to table restore to obtain the name of the temporary auxiliary tenant associated with the task. By default, the name of the auxiliary tenant starts with the AUX prefix.

The physical restore of the temporary auxiliary tenant in table-level restore is consistent with that in physical restore. For more information about performance tuning in this phase, see the Physical restore-related information in this topic.

Cross-tenant import of tables

Parameter	Description	Default value	Configuration recommendation
ddl_thread_score	The tenant-level parameter specifies the capacity of the table-level restore thread pool on the OBServer node where the tenant resides.	0, which indicates the default table-level restore thread pool capacity of 2	For small tenants with up to 4 CPU cores, we recommend that you set this parameter to 4. For large tenants, you can initially set this parameter to 8. If the restore speed is too slow, you can double the value based on the settings of the `recover_table_concurrency` and `recover_table_dop` parameters and the actual resources of the tenant. The maximum number of parallel threads required to restore a single table = `number of partitions × recover_table_dop`. For example, if a table has 4 partitions and `recover_table_dop=2`, the required number of parallel threads is `4×2=8`. When restoring multiple tables, the maximum number of parallel threads required = the sum of the maximum number of parallel threads required for restoring each table. For example, if you need to restore two tables, Table 1 has 4 partitions, and Table 2 has 8 partitions, and `recover_table_dop=2`, the maximum number of parallel threads required is `4×2+8×2=24`.
recover_table_concurrency	The tenant-level parameter specifies the concurrency for restoring multiple tables, that is, the number of tables that can have cross-tenant export performed simultaneously.	0, which indicates that one table can be exported at a time.	We recommend that you set this parameter to the default value of 0. In addition: If the restore speed is too slow in scenarios involving the restoration of multiple tables, you can double the value based on the actual resources of the tenant. In scenarios involving the restoration of a large number of small tables, we recommend that you set this parameter to 8 or 16. Notice Cross-tenant export consumes a large amount of temporary storage space and memory resources. We recommend that you increase the values based on the resources of the tenant. The consumption of temporary space is particularly high in the index reconstruction phase. If a table contains a large amount of data, a large amount of temporary space may be consumed. High consumption of temporary space may cause query failures, transaction rollbacks, and service unavailability.
recover_table_dop	The tenant-level parameter specifies the concurrency for restoring a single table.	0, which indicates the default concurrency of 1.	We recommend that you set this parameter to 8 or 16, which can meet the requirements of most scenarios. If the table to be restored is a non-partitioned table with a large amount of data or a table with a partition containing a large amount of data, we recommend that you double the value of the `recover_table_dop` parameter based on the data volume of the partition.