Performance tuning methods
Ideally, the performance of backup and restore should be limited only by the data distribution (number of partitions and size of partitions) and hardware (CPU, disk, and network). However, by default, backup and restore cannot fully utilize the hardware performance. To address this, OceanBase Database offers resource isolation strategies and the following parameters for performance tuning:
- Network configuration: The sys_bkgd_net_percentage parameter specifies the maximum percentage of total network bandwidth available to background system tasks (including backup and restore tasks). Setting this parameter to an appropriate value helps the backup and restore tasks make full use of the network bandwidth without affecting the foreground business.
- CPU and I/O configurations: OceanBase Database also offers the Resource Manager feature for isolating the CPU and I/O resources of different types of tasks. If resource isolation is configured for backup and restore tasks, set the upper limits based on actual resource and business requirements to avoid bottlenecks of CPU and I/O resources for backup and restore tasks.
- Other configurations: If sufficient CPU, I/O, and network resources are available, you can increase the concurrency of backup and restore tasks by setting relevant parameters (ha_low_thread_score, log_archive_concurrency, log_restore_concurrency, and ha_high_thread_score) to improve performance.
Resource configuration related
Cluster-level parameter sys_bkgd_net_percentage
The sys_bkgd_net_percentage parameter specifies the percentage of network bandwidth available to background system tasks. The default value is 60% of the network card rate of the server. When the network bandwidth is saturated, you can decrease the value of the sys_bkgd_net_percentage parameter to reserve sufficient network bandwidth for business requests.
After the parameter is set, perform the following steps to view the log:
1. Log in to the server where the OBServer node resides as the admin user.
2. Navigate to the installation directory of OceanBase Database.
   For example, if OceanBase Database is installed at /home/admin/oceanbase/, run the following command. If the actual installation path is different, adjust the path accordingly.
       [admin@xxx /]$ cd /home/admin/oceanbase
3. Run the following command to view the network card rate.
       [admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*
   Here, observer.log is the observer log file generated when the cluster starts.
   For example, the query result is as follows:
       log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)
       log/observer.log.20210811100806:[2021-08-11 10:07:42.351813] INFO [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)
In the first query result, sys_bkgd_net_percentage_=60 indicates that background system tasks can use up to 60% of the network card rate of the server; ethernet_speed_=1310720000 indicates that the maximum network card rate identified by OceanBase Database is 1310720000 B/s; and rate=786432000 indicates that the maximum network bandwidth after speed limiting is 786432000 B/s, where rate = ethernet_speed_ * sys_bkgd_net_percentage_ / 100.
In the second query result, rate_KB/s=786432 indicates that the identified maximum speed limit (rate) is 786432 KB/s.
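The relationship between the logged values can be checked with a short script. The following is a minimal sketch, assuming the log line has the same key=value layout as the sample above. The function names parse_bandwidth_throttle and expected_rate are hypothetical helpers for illustration and are not part of OceanBase Database.

```python
import re

# Sample text copied from the init_bandwidth_throttle log line shown above.
LOG_LINE = (
    "succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,"
    "ethernet_speed_=1310720000,rate=786432000)"
)


def parse_bandwidth_throttle(line):
    """Return the integer key=value fields logged by init_bandwidth_throttle."""
    match = re.search(r"init_bandwidth_throttle\((.*?)\)", line)
    if match is None:
        raise ValueError("not an init_bandwidth_throttle log line")
    fields = {}
    for item in match.group(1).split(","):
        key, value = item.split("=")
        fields[key.strip()] = int(value)
    return fields


def expected_rate(ethernet_speed, percentage):
    """Background-task bandwidth cap in B/s: ethernet_speed * percentage / 100."""
    return ethernet_speed * percentage // 100


fields = parse_bandwidth_throttle(LOG_LINE)
computed = expected_rate(
    fields["ethernet_speed_"], fields["sys_bkgd_net_percentage_"]
)
# Both numbers should match if the logged rate follows the formula.
print(fields["rate"], computed)  # prints "786432000 786432000"
```

If the two numbers differ, or ethernet_speed_ does not match the real network card rate, the identified network card rate is worth checking first.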
If the log shows that the network card rate identified by OceanBase Database is inaccurate, modify the network card rate based on the checklist for network card rate, and then view the log again.
Resource isolation in the Resource Manager
The Resource Manager is the resource isolation mechanism in OceanBase Database. In function-level resource isolation, you can configure the upper limits of resources, such as CPU, IOPS, and network bandwidth, for different background tasks. For more information, see Overview of resource isolation.
In function-level resource isolation, background tasks related to backup and restore are classified into the following categories based on priority and reliability:
- ha_high: tasks of high priority and reliability, such as replication, rebuild, and restore.
- ha_mid: tasks of medium priority and reliability, such as migration.
- ha_low: tasks of low priority and reliability, such as backup and backup cleanup.
You can query the DBA_OB_RSRC_IO_DIRECTIVES view for the resource isolation plan configured for the current tenant. If the query result set is empty, resource isolation is not configured for the tenant. If the query result set contains records of background tasks, check whether CPU, I/O, network bandwidth, or other resources have become a bottleneck and whether that bottleneck is caused by the limits set by resource isolation. If so, raise the limits in the resource isolation plan to a level that does not affect the foreground business. For more information, see Modify a resource management plan (MySQL mode) and Modify a resource management plan (Oracle mode).
Do not configure resource isolation plans during performance tests of backup and restore.
Data backup-related
| Parameter | Description | Default value | Note |
|---|---|---|---|
| ha_low_thread_score | The maximum number of threads for concurrent data backup. This parameter is a tenant-level parameter. | 0, which indicates that the default value, 2, is used. | We recommend that you set this parameter to the default value for small-sized tenants (CPU cores ≤ 4) and start with the value of 10 for large-sized tenants. If you find that the backup speed is too slow, you can double the value. During performance tests for backup and restore, we recommend that you set this parameter to the maximum value, 100. |
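
The recommendation in the table can be written down as a small helper. This is only a sketch of the rule of thumb stated above, not logic taken from OceanBase Database. The function names and constants are hypothetical and are introduced here for illustration.

```python
MAX_THREAD_SCORE = 100      # maximum value stated in the table above
SMALL_TENANT_MAX_CPU = 4    # "small-sized" tenant threshold from the table above
DEFAULT_BACKUP_THREADS = 2  # thread count used when the parameter is 0


def starting_ha_low_thread_score(tenant_cpu_cores):
    """Suggested initial value: 0 (default) for small tenants, 10 otherwise."""
    return 0 if tenant_cpu_cores <= SMALL_TENANT_MAX_CPU else 10


def next_ha_low_thread_score(current):
    """Double the value when backup is too slow, capped at the maximum."""
    effective = DEFAULT_BACKUP_THREADS if current == 0 else current
    return min(effective * 2, MAX_THREAD_SCORE)


print(starting_ha_low_thread_score(16))  # prints 10
print(next_ha_low_thread_score(10))      # prints 20
print(next_ha_low_thread_score(80))      # prints 100
```

Doubling from the default treats 0 as its effective value of 2, so the first step up from the default is 4.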
Log archive-related
| Parameter | Description | Default value | Notes |
|---|---|---|---|
| log_archive_concurrency | The maximum number of concurrent threads for log archive. This is a tenant-level parameter. | 0. In this case, the system adaptively calculates the number of archive threads based on the MAX_CPU of the tenant. | We recommend that you keep the default value for both large- and small-scale tenants so that the system can adaptively calculate the number of worker threads. |
Restoration-related
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| log_restore_concurrency | The maximum number of concurrent log restorations. This is a tenant-level parameter. | 0, which means the number of concurrent threads is equal to the number of cores of the tenant's MAX_CPU. | Increasing this parameter increases the number of worker threads and the memory resource overhead. We recommend that you set this parameter to the default value of 0 and increase it only if you find that the restoration speed is too slow. |
| ha_high_thread_score | The maximum number of concurrent data restorations. This is a tenant-level parameter. | 0, which means the default number of concurrent threads is 8. | We recommend that you use the default value in non-performance test scenarios and set it to the maximum value of 100 in performance test scenarios. |
| _restore_idle_time | A cluster-level hidden parameter that controls the scheduling interval for RS recovery. | 1m, which means 1 minute. | Setting this parameter to 10s reduces the time consumed by data restoration by tens of seconds to two minutes. We recommend this adjustment for small-scale tenants (tenant CPU ≤ 4 cores) with high performance requirements. For other tenants, the effect is not obvious. |
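
For the two concurrency parameters in this table, a value of 0 stands for a derived default. The sketch below only restates the defaults listed above so that the effective concurrency is easy to reason about. The function names are hypothetical, and MAX_CPU is assumed to be a whole number of cores.

```python
DEFAULT_HA_HIGH_THREADS = 8  # concurrency used when ha_high_thread_score is 0


def effective_log_restore_concurrency(log_restore_concurrency, tenant_max_cpu):
    """A value of 0 means one thread per core of the tenant's MAX_CPU."""
    if log_restore_concurrency == 0:
        return tenant_max_cpu
    return log_restore_concurrency


def effective_ha_high_threads(ha_high_thread_score):
    """A value of 0 means the default concurrency of 8."""
    if ha_high_thread_score == 0:
        return DEFAULT_HA_HIGH_THREADS
    return ha_high_thread_score


print(effective_log_restore_concurrency(0, 16))  # prints 16
print(effective_ha_high_threads(0))              # prints 8
print(effective_ha_high_threads(100))            # prints 100
```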