Optimize backup and restore performance|V4.3.1| docs|Distributed Database

Optimize backup and restore performance

Last Updated: 2025-12-04 02:53:14

Performance tuning methods

Ideally, the performance of backup and restore should be limited only by data distribution (number of partitions and size of partitions) and hardware (CPU, disk, and network). However, by default, backup and restore cannot fully utilize hardware performance. To address this, OceanBase Database offers resource isolation strategies and the following parameters for performance tuning:

Network configuration: The sys_bkgd_net_percentage parameter specifies the maximum percentage of total network bandwidth available to background system tasks (including backup and restore tasks). Setting this parameter to an appropriate value helps backup and restore tasks make full use of network bandwidth without affecting the performance of frontend business.
CPU and I/O configurations: OceanBase Database also offers the Resource Manager feature for isolating CPU and I/O resources for different types of tasks. If resource isolation is configured for backup and restore tasks, set the upper limits according to the available resources and business needs so that CPU and I/O do not become bottlenecks for these tasks.
Other configurations: If sufficient CPU, I/O, and network resources are available, you can increase the concurrency of backup and restore tasks by setting related parameters (ha_low_thread_score, log_archive_concurrency, log_restore_concurrency, and ha_high_thread_score) to improve the performance of backup and restore tasks.

Resource configuration related

Cluster-level parameter `sys_bkgd_net_percentage`

The sys_bkgd_net_percentage parameter specifies the percentage of network bandwidth available to background system tasks. The default value is 60, which means 60% of the network card rate of the server. When network bandwidth is sufficient, you can raise this value as long as business requests are not affected.

To view the log after the parameter is set, follow these steps:

Log in to the server where the OBServer node resides as the admin user.
Go to the installation directory of OceanBase Database.

For example, if OceanBase Database is installed at /home/admin/oceanbase/, follow the instructions below. Note that you must specify the actual installation path in your environment.
```
[admin@xxx /]$ cd /home/admin/oceanbase
```
Run the following command to view the network card rate.
```
[admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*
```
Here, observer.log is the observer log file generated when the cluster is started.

For example, the query result is as follows:
```
log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO  [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)
log/observer.log.20210811100806:[2021-08-11 10:07:42.351813] INFO  [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)
```
In the first log entry, sys_bkgd_net_percentage_=60 indicates that background system tasks can use up to 60% of the network card rate of the server; ethernet_speed_=1310720000 indicates that the maximum network card rate identified by OceanBase Database is 1310720000 B/s; and rate=786432000 indicates that the maximum rate after throttling is 786432000 B/s, where rate = ethernet_speed_ × sys_bkgd_net_percentage_ / 100.

In the second log entry, rate_KB/s=786432 indicates that the maximum rate after throttling (rate) is 786432 KB/s.
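
The relationship between the three values in the first log entry can be checked with a short script. The following Python sketch parses the sample `succeed to init_bandwidth_throttle` entry and recomputes the throttled rate. The function name `parse_throttle_line` is hypothetical, and the regular expression assumes the field layout of the sample entry shown here; other versions may log these fields differently.

```python
import re

# Relevant part of the sample log entry shown above.
LOG_LINE = (
    "succeed to init_bandwidth_throttle("
    "sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)"
)

# Assumption: fields appear as key=value pairs in this order.
PATTERN = re.compile(
    r"sys_bkgd_net_percentage_=(\d+),\s*ethernet_speed_=(\d+),\s*rate=(\d+)"
)


def parse_throttle_line(line):
    """Return (percentage, ethernet_speed_Bps, rate_Bps), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


percentage, ethernet_speed, rate = parse_throttle_line(LOG_LINE)

# Recompute the throttled rate from the detected NIC rate and the percentage.
expected_rate = ethernet_speed * percentage // 100

print("logged rate (B/s):    ", rate)
print("recomputed rate (B/s):", expected_rate)
print("rates match:          ", rate == expected_rate)
```

If the recomputed rate matches the logged one but both are far from what the physical network card can deliver, the identified network card rate is the value to investigate.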

The network card rate identified by OceanBase Database may be inaccurate. If the log shows a network card rate that is inconsistent with the actual one, you can modify it by referring to Check the NIC rate.

After the network card rate is modified, you can query the V$OB_NIC_INFO view to confirm whether the modified network card rate takes effect.

Resource isolation in the Resource Manager

The Resource Manager is the resource isolation mechanism in OceanBase Database. With function-level resource isolation, you can configure resource limits, such as CPU and IOPS, for different background tasks. For more information, see Overview of resource isolation.

In function-level resource isolation, background tasks related to backup and restore are classified into the following categories based on priority and reliability:

ha_high: tasks of the high-priority and high-reliability category, such as replication, rebuild, and restore.
ha_mid: tasks of the medium-priority and high-reliability category, such as migration.
ha_low: tasks of the low-priority and high-reliability category, such as backup and backup cleanup.

You can query the DBA_OB_RSRC_IO_DIRECTIVES view for the resource isolation plan configured for the current tenant. If the query result set is empty, the tenant has no resource isolation plan configured. If the query result set contains records of background tasks, check whether CPU, I/O, network bandwidth, or other resources are bottlenecked and whether the bottleneck comes from the limits set by resource isolation. If you confirm that the resource isolation limits are the cause, you can modify the resource isolation plan, provided that doing so does not affect the frontend business. For more information, see Modify a resource management plan (MySQL mode) and Modify a resource management plan (Oracle mode).

Do not configure a resource isolation plan during performance testing of backup and restore.

Data backup-related

| Parameter | Description | Default value | Remarks |
| --- | --- | --- | --- |
| ha_low_thread_score | The maximum number of threads for concurrent data backup. This parameter is a tenant-level parameter. | 0, which indicates that the default value, 2, is used. | We recommend that you keep the default value for small tenants (CPU cores ≤ 4) and start with the value 10 for large tenants. If the backup speed is too slow, you can double the value. We recommend that you set this parameter to 100 during performance testing for backup and restore. |
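
The sizing guidance for ha_low_thread_score can be expressed as a small helper. This is only a sketch of the recommendation in the table, not part of OceanBase Database: the function name `suggest_ha_low_thread_score` and its arguments are hypothetical, and capping the doubled value at 100 is an assumption based on 100 being the value recommended for performance testing.

```python
DEFAULT_SCORE = 0        # 0 means the built-in default (2 threads) is used
LARGE_TENANT_START = 10  # recommended starting value for large tenants
ASSUMED_CEILING = 100    # assumption: do not go above the performance-test value


def suggest_ha_low_thread_score(cpu_cores, current=None, backup_too_slow=False):
    """Suggest a value following the recommendation in the table above."""
    if cpu_cores <= 4:
        # Small tenants: keep the default.
        return DEFAULT_SCORE
    if current is None or current < LARGE_TENANT_START:
        # Large tenants: start with 10.
        return LARGE_TENANT_START
    if backup_too_slow:
        # Backup still too slow: double the value, up to the assumed ceiling.
        return min(current * 2, ASSUMED_CEILING)
    return current


print(suggest_ha_low_thread_score(cpu_cores=4))
print(suggest_ha_low_thread_score(cpu_cores=16))
print(suggest_ha_low_thread_score(cpu_cores=16, current=10, backup_too_slow=True))
```

Doubling step by step, rather than jumping straight to a high value, makes it easier to observe whether CPU, I/O, or network becomes the next bottleneck after each change.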

Log archive-related

| Parameter | Description | Default value | Remarks |
| --- | --- | --- | --- |
| log_archive_concurrency | The maximum number of concurrent threads for log archive. This parameter is a tenant-level parameter. | 0. In this case, the system calculates the number of archive threads from the `MAX_CPU` of the tenant by using the following adaptive rules. If `tenant's MAX_CPU <= 8`, `number of archive threads = MAX_CPU`. If `8 < tenant's MAX_CPU < 32`, `number of archive threads = tenant's MAX_CPU / 2`, with a minimum value of 8. If `tenant's MAX_CPU >= 32`, `number of archive threads = tenant's MAX_CPU / 4`, with a minimum value of 16. | We recommend that you keep the default value for both large and small tenants so that the system calculates the number of worker threads based on the adaptive rules. |
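
The adaptive rules for the default value can be written out as a function to see what thread count a given tenant would get. This is a minimal sketch of the rules as stated in the table, not the actual implementation; the function name `adaptive_archive_threads` is hypothetical, and it assumes that `MAX_CPU` is a whole number and that the divisions round down.

```python
def adaptive_archive_threads(max_cpu):
    """Number of archive threads when log_archive_concurrency is 0.

    Assumptions: max_cpu is an integer and divisions round down.
    """
    if max_cpu <= 8:
        # Small tenants: one archive thread per CPU.
        return max_cpu
    if max_cpu < 32:
        # Mid-sized tenants: half the CPUs, but at least 8 threads.
        return max(max_cpu // 2, 8)
    # Large tenants: a quarter of the CPUs, but at least 16 threads.
    return max(max_cpu // 4, 16)


for cpu in (4, 8, 16, 24, 32, 64, 128):
    print(f"MAX_CPU={cpu:>3} -> archive threads={adaptive_archive_threads(cpu)}")
```

The minimum values keep the thread count from dropping when a tenant crosses a boundary: a tenant just above 8 CPUs still gets 8 threads, and one at 32 CPUs gets 16.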

Restore-related

| Parameter | Description | Default value | Remarks |
| --- | --- | --- | --- |
| log_restore_concurrency | The maximum number of threads for restoring logs in a tenant. The value 0 means the number of threads is equal to the number of CPU cores of the tenant. | 0 | Increasing this parameter increases the number of threads and the memory overhead. We recommend that you keep the default value 0. If the restore speed is too slow, you can increase this parameter based on the actual resources of the server. |
| ha_high_thread_score | The maximum number of threads for restoring data in a tenant. The value 0 means that the default value, 8, is used. | 0 | We recommend that you use the default value in non-performance test scenarios and the maximum value in performance test scenarios. |
| _restore_idle_time | A cluster-level hidden parameter that controls the scheduling interval of RS for restore tasks. | 1m, which means 1 minute | Setting this parameter to 10s reduces the time consumed for data restore by tens of seconds to two minutes. We recommend this adjustment for small tenants (tenant CPU ≤ 4 cores) with higher performance requirements. For other tenants, the effect is not obvious. |
