Performance tuning methods
Ideally, the performance of backup and restore should be limited only by data distribution (number of partitions and size of partitions) and hardware (CPU, disk, and network). However, by default, backup and restore cannot fully utilize hardware performance. To address this, OceanBase Database offers resource isolation strategies and the following parameters for performance tuning:
- Network configuration: The sys_bkgd_net_percentage parameter specifies the maximum percentage of total network bandwidth available to background system tasks (including backup and restore tasks). Setting this parameter to an appropriate value helps backup and restore tasks make full use of network bandwidth without affecting the performance of frontend business.
- CPU and I/O configurations: OceanBase Database also offers the Resource Manager feature for isolating CPU and I/O resources for different types of tasks. If resource isolation is configured for backup and restore tasks, set the upper limits based on actual resource and business needs to avoid bottlenecks of CPU and I/O resources for backup and restore tasks.
- Other configurations: If sufficient CPU, I/O, and network resources are available, you can increase the concurrency of backup and restore tasks by setting related parameters (ha_low_thread_score, log_archive_concurrency, log_restore_concurrency, and ha_high_thread_score) to improve the performance of backup and restore tasks.
Resource configuration related
Cluster-level parameter sys_bkgd_net_percentage
The sys_bkgd_net_percentage parameter specifies the percentage of network bandwidth available to background system tasks. The default value is 60, that is, 60% of the network card rate of the server. When network bandwidth is sufficient, you can raise this percentage as long as business requests are not affected.
To view the log after the parameter is set, follow these steps:
1. Log in to the server where the OBServer node resides as the admin user.
2. Go to the installation directory of OceanBase Database.
   For example, if OceanBase Database is installed at /home/admin/oceanbase/, run the following command. Note that you must specify the actual installation path in your environment.
       [admin@xxx /]$ cd /home/admin/oceanbase
3. Run the following command to view the network card rate.
       [admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*
   Here, observer.log is the observer log file generated when the cluster is started.
   For example, the query result is as follows:
       log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)
       log/observer.log.20210811100806:[2021-08-11 10:07:42.351813] INFO [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)
   In the first query result, sys_bkgd_net_percentage_=60 indicates that background system tasks can use 60% of the network card rate of the server; ethernet_speed_=1310720000 indicates that the maximum network card rate identified by OceanBase Database is 1310720000 B/s; and rate=786432000 indicates that the maximum rate after throttling is 786432000 B/s, where rate = ethernet_speed_ × sys_bkgd_net_percentage_ / 100.
   In the second query result, rate_KB/s=786432 indicates that the identified maximum throttling rate (rate) is 786432 KB/s.
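The relationship between the three logged values can be checked with a few lines of Python. The following is a minimal sketch, not an OceanBase tool: the helper names parse_throttle and expected_rate are hypothetical, and the sample line is the relevant part of the first log record shown above.

```python
import re

# Relevant part of the first log record shown above.
LOG_LINE = (
    "succeed to init_bandwidth_throttle("
    "sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)"
)


def parse_throttle(line):
    """Extract the integer key=value fields of an init_bandwidth_throttle record."""
    match = re.search(r"init_bandwidth_throttle\((.*?)\)", line)
    if match is None:
        raise ValueError("not an init_bandwidth_throttle record")
    fields = {}
    for pair in match.group(1).split(","):
        key, value = pair.split("=", 1)
        fields[key.strip()] = int(value)
    return fields


def expected_rate(fields):
    """Throttled rate in B/s: identified NIC rate times the background percentage."""
    return fields["ethernet_speed_"] * fields["sys_bkgd_net_percentage_"] // 100


fields = parse_throttle(LOG_LINE)
# The logged rate should equal the computed one.
print(fields["rate"] == expected_rate(fields))  # True
# Dividing by 1000 reproduces the rate_KB/s value of the second record.
print(expected_rate(fields) // 1000)  # 786432
```

If the computed value does not match the logged rate, or if ethernet_speed_ does not match the real network card rate, the throttle is being derived from a wrong NIC rate.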
The network card rate identified by OceanBase Database may be inaccurate. If the log shows a rate that differs from the actual network card rate, you can modify it by referring to Check the NIC rate.
After the network card rate is modified, you can query the V$OB_NIC_INFO view to confirm whether the modified network card rate takes effect.
Resource isolation in the Resource Manager
The Resource Manager is the resource isolation mechanism in OceanBase Database. With function-level resource isolation, you can configure resource limits, such as CPU and IOPS, for different background tasks. For more information, see Overview of resource isolation.
In function-level resource isolation, background tasks related to backup and restore are classified into the following categories based on priority and reliability:
- ha_high: tasks of the high-priority and high-reliability category, such as replication, rebuild, and restore.
- ha_mid: tasks of the medium-priority and high-reliability category, such as migration.
- ha_low: tasks of the low-priority and high-reliability category, such as backup and backup cleanup.
You can query the DBA_OB_RSRC_IO_DIRECTIVES view for the resource isolation plan configured for the current tenant. If the query result set is empty, the tenant has no resource isolation plan configured. If the query result set contains records of background tasks, you can check whether the CPU, I/O, network bandwidth, or other resources are bottlenecked and whether the bottlenecks are within the limits of resource isolation. If you confirm that the bottlenecks are within the limits and resource isolation is not the cause, you can modify the resource isolation plan without affecting the frontend business. For more information, see Modify a resource management plan (MySQL mode) and Modify a resource management plan (Oracle mode).
Do not configure a resource isolation plan during performance testing of backup and restore.
Data backup-related
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| ha_low_thread_score | The maximum number of threads for concurrent data backup. This parameter is a tenant-level parameter. | 0, which indicates that the default value, 2, is used. | We recommend that you set this parameter to the default value for small-sized tenants (CPU cores ≤ 4) and start with the value of 10 for large-sized tenants. If you find that the backup speed is too slow, you can double the value. We recommend that you set this parameter to 100 during performance testing for backup and restore. |
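The recommendation in the table can be written down as a small decision helper. The following is a sketch under stated assumptions: the function names are hypothetical, the cap of 100 is taken only from the value the table recommends for performance testing (the parameter's real upper bound is not stated here), and the generated ALTER SYSTEM SET statement should be checked against the syntax of your OceanBase version before use.

```python
DEFAULT_EFFECTIVE_THREADS = 2   # effective thread count when the parameter is 0
ASSUMED_CAP = 100               # assumption: value recommended for performance tests


def starting_score(tenant_cpu_cores, performance_test=False):
    """Return a starting ha_low_thread_score following the table's guidance."""
    if performance_test:
        return 100
    if tenant_cpu_cores <= 4:
        return 0   # small tenant: keep the default
    return 10      # large tenant: start with 10


def next_score(current):
    """Double the current setting when backup is too slow, up to the assumed cap."""
    effective = current if current > 0 else DEFAULT_EFFECTIVE_THREADS
    return min(effective * 2, ASSUMED_CAP)


def alter_statement(score):
    """Render the statement to run in the tenant (syntax is an assumption to verify)."""
    return f"ALTER SYSTEM SET ha_low_thread_score = {score};"


score = starting_score(tenant_cpu_cores=16)
print(alter_statement(score))              # ALTER SYSTEM SET ha_low_thread_score = 10;
print(alter_statement(next_score(score)))  # ALTER SYSTEM SET ha_low_thread_score = 20;
```

Doubling step by step, instead of jumping straight to a large value, makes it easier to see at which setting CPU, I/O, or network becomes the bottleneck.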
Log archive-related
| Parameter | Description | Default value | Note |
|---|---|---|---|
| log_archive_concurrency | The maximum number of concurrent threads for log archive. This parameter is a tenant-level parameter. | 0. In this case, the system calculates the number of threads for archive work based on the MAX_CPU of the tenant following adaptive rules. | We recommend that you set this parameter to the default value for both large- and small-scale tenants so that the system can calculate the number of worker threads based on the adaptive rules. |
Restore-related
| Parameter | Description | Default value | Notes |
|---|---|---|---|
| log_restore_concurrency | The maximum number of threads for restoring logs in a tenant. The value 0 means the number of threads is equal to the number of CPU cores of the tenant. | 0 | Increasing this parameter increases the number of threads and the memory resource overhead. We recommend that you set this parameter to the default value 0. If the restore speed is too slow, you can increase this parameter based on the actual resources of the server. |
| ha_high_thread_score | The maximum number of threads for restoring data in a tenant. The value 0 means the default value 8. | 0 | We recommend that you use the default value in non-performance test scenarios and the maximum value in performance test scenarios. |
| _restore_idle_time | A hidden cluster-level parameter that controls the interval at which RS schedules restore tasks. | 1m, which means 1 minute | Setting this parameter to 10s shortens data restore by tens of seconds to two minutes. We recommend that you adjust it only for small-sized tenants (CPU cores ≤ 4) with high performance requirements. For larger tenants, the effect is negligible. |