Performance tuning methods
Ideally, the performance of backup and restore should be limited only by the data distribution (number of partitions and size of partitions) and the hardware performance (CPU, disk, and network). However, by default, backup and restore cannot fully utilize the hardware performance. To address this, OceanBase Database offers resource isolation strategies and the following parameters for performance tuning:
- Network configuration: The sys_bkgd_net_percentage parameter specifies the maximum percentage of total network bandwidth available to background system tasks (including backup and restore tasks). Setting this parameter to an appropriate value helps the backup and restore tasks make full use of the network bandwidth without affecting the foreground business.
- CPU and I/O configurations: OceanBase Database also offers the Resource Manager feature for isolating the CPU and I/O resources of different types of tasks. If resource isolation is configured for backup and restore tasks, set the upper limits based on actual resource and business needs to avoid bottlenecks of CPU and I/O resources for backup and restore tasks.
- Other configurations: If sufficient CPU, I/O, and network resources are available, you can increase the concurrency of backup and restore tasks by setting relevant parameters (ha_low_thread_score, log_archive_concurrency, log_restore_concurrency, and ha_high_thread_score) to improve the performance.
Resource configuration related
Cluster-level parameter sys_bkgd_net_percentage
The sys_bkgd_net_percentage parameter specifies the percentage of network bandwidth available for background system tasks. The default value is 60% of the network card rate of the server. When the network bandwidth is full, you can increase the value of the sys_bkgd_net_percentage parameter to reserve more network bandwidth for background system tasks without affecting business requests.
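As a minimal sketch of how the parameter might be inspected and changed, the snippet below only assembles and prints the SQL statements. It assumes they are run from the sys tenant because the parameter is cluster-level; the value 80 is purely illustrative, and the connection details in the comment are placeholders rather than values from this document.

```shell
# Statements for checking and raising the background bandwidth share.
# 80 is an illustrative value, not a recommendation.
sql=$(cat <<'EOF'
SHOW PARAMETERS LIKE 'sys_bkgd_net_percentage';
ALTER SYSTEM SET sys_bkgd_net_percentage = 80;
EOF
)

# Print the statements for review before running them.
printf '%s\n' "$sql"

# To execute, pipe them into your client (host, port, and user are placeholders):
#   printf '%s\n' "$sql" | obclient -h<host> -P<port> -u<user>@sys -p
```

Printing the statements first makes it easy to review the change before it reaches a production cluster.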
After the parameter is set, follow these steps to view the log:

1. Log in to the server where the OBServer node resides as the admin user.

2. Navigate to the installation directory of OceanBase Database.

   For example, if OceanBase Database is installed at /home/admin/oceanbase/, run the following command. If the actual installation path is different, use the actual path.

       [admin@xxx /]$ cd /home/admin/oceanbase

3. Run the following command to view the network card rate.

       [admin@xxx oceanbase]$ grep -E 'print band limit|succeed to init_bandwidth_throttle' log/observer.log*

   Here, observer.log is the observer log file generated when the cluster is started.

   For example, the query result is as follows:

       log/observer.log.20210811100806:[2021-08-11 10:06:32.934433] INFO [SERVER] ob_server.cpp:1783 [76957] [0] [Y0-0000000000000000] [lt=4] [dc=0] succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)
       log/observer.log.20210811100806:[2021-08-11 10:07:42.351813] INFO [COMMON] utility.cpp:1487 [77169][418] [Y9FA64586E9E-0005C93F15DAE715] [lt=11] [dc=0] print band limit(comment= in , copy_KB=0, sleep_ms_sum=0, speed_KB_per_s=0, total_sleep_ms=0,total__bytes=531, rate_KB/s=786432,print_interval_ms=69417)

   In the first query result, sys_bkgd_net_percentage_=60 indicates that 60% of the network card rate is available for background system tasks; ethernet_speed_=1310720000 indicates that the maximum network card rate identified by OceanBase Database is 1310720000 B/s; and rate=786432000 indicates that the maximum network rate after throttling is 786432000 B/s, where rate = ethernet_speed_ × sys_bkgd_net_percentage_ / 100.

   In the second query result, rate_KB/s=786432 indicates that the identified maximum throttled network rate (rate) is 786432 KB/s.
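The relationship between the three values can be checked directly against the log line. The following is a minimal shell sketch that extracts the fields from the sample line above with sed and recomputes the throttled rate. The variable names are illustrative, and the field names are taken from the sample output, so they may differ in other versions.

```shell
# Relevant part of the first sample log line shown above.
line='succeed to init_bandwidth_throttle(sys_bkgd_net_percentage_=60,ethernet_speed_=1310720000,rate=786432000)'

# Extract each numeric field by its name.
pct=$(printf '%s\n' "$line" | sed -n 's/.*sys_bkgd_net_percentage_=\([0-9]*\).*/\1/p')
speed=$(printf '%s\n' "$line" | sed -n 's/.*ethernet_speed_=\([0-9]*\).*/\1/p')
rate=$(printf '%s\n' "$line" | sed -n 's/.*[(,]rate=\([0-9]*\).*/\1/p')

# Recompute the throttled rate: detected network card rate times the percentage.
expected=$(( speed * pct / 100 ))

echo "detected network card rate: $speed B/s"
echo "throttled rate in log:      $rate B/s"
echo "recomputed throttled rate:  $expected B/s"
```

If the recomputed value and the logged rate differ, or the detected network card rate does not match the actual hardware, the detected rate is the first thing to check.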
The network card rate identified by OceanBase Database may be inaccurate. If the log shows a rate that is inconsistent with the actual network card rate, you can modify the network card rate based on the checklist for network card rate.
Resource isolation in the Resource Manager
The Resource Manager is the resource isolation mechanism in OceanBase Database. With function-level resource isolation, you can limit the resources, such as CPU and IOPS, that different background tasks can use. For more information, see Overview of resource isolation.
In function-level resource isolation, background tasks related to backup and restore are classified into the following categories based on priority and reliability:
- ha_high: tasks of the high-priority and high-reliability category, such as replication, rebuild, and restore.
- ha_mid: tasks of the medium-priority and high-reliability category, such as migration.
- ha_low: tasks of the low-priority and high-reliability category, such as backup and backup cleanup.
You can query the DBA_OB_RSRC_IO_DIRECTIVES view to check the resource isolation plan configured for the current tenant. If the query result is empty, the tenant has not configured a resource isolation plan. If the result contains records for background tasks, first check whether CPU, I/O, or network bandwidth has become a bottleneck and whether usage has reached the limits set by the resource isolation plan. If so, we recommend that you adjust the resource isolation plan as appropriate without affecting foreground business. For more information about how to modify a resource plan, see Update Resource Management Plan Content (MySQL Mode) and Update Resource Management Plan Content (Oracle Mode).
Do not configure a resource isolation plan during performance testing of backup and restore.
Data backup-related
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| ha_low_thread_score | The maximum number of threads for concurrent data backup. This is a tenant-level parameter. | 0, which indicates that the default value of 2 is used. | We recommend that you set this parameter to the default value for small-sized tenants (CPU cores ≤ 4), and that you set it to 10 for large-sized tenants. If you find that the backup speed is too slow, you can increase the value by 2. During performance tests for backup and restore, we recommend that you set this parameter to the maximum value of 100. |
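
The sizing guidance in the table can be sketched as a small helper. This is only an illustration: the function name, the tenant name t1, and the core count are hypothetical, and the TENANT clause assumes that the statement is issued from the sys tenant.

```shell
# Hypothetical helper: map a tenant's CPU core count to the value suggested above.
suggest_ha_low_thread_score() {
  if [ "$1" -le 4 ]; then
    echo 0    # small tenant: keep the default (0 means 2 threads)
  else
    echo 10   # large tenant
  fi
}

tenant=t1     # placeholder tenant name
cores=16      # placeholder CPU core count of that tenant

value=$(suggest_ha_low_thread_score "$cores")
stmt="ALTER SYSTEM SET ha_low_thread_score = $value TENANT = $tenant;"

# Print the statement for review; run it with your client once confirmed.
echo "$stmt"
```

If the backup is still too slow after this change, the table suggests raising the value further in steps of 2.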
Log archive-related
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| log_archive_concurrency | The maximum number of concurrent archive processes in a tenant. | 0. In this case, the system calculates the number of archive worker threads based on an adaptive rule and the MAX_CPU of the tenant. | We recommend that you set this parameter to the default value for both large- and small-scale tenants so that the system can adaptively calculate the number of worker threads. |
Restore-related
| Parameter | Description | Default value | Remarks |
|---|---|---|---|
| log_restore_concurrency | The maximum number of threads for restoring logs in a tenant. The value 0 specifies the number of CPU cores given by the MAX_CPU attribute of the tenant. | 0, which specifies the number of CPU cores given by the MAX_CPU attribute of the tenant. | Increasing this parameter increases the number of worker threads and the memory resource overhead. We recommend that you set this parameter to the default value 0 and increase it only if you find that the restore speed is too slow. |
| ha_high_thread_score | The maximum number of threads for restoring data in a tenant. The value 0 specifies the default value 8. | 8 | We recommend that you use the default value in non-performance test scenarios and the maximum value 100 in performance test scenarios. |
| _restore_idle_time | A cluster-level hidden parameter that controls the scheduling interval for restore on an RS node. | 1m, which specifies 1 minute | If you set this parameter to 10s, the data restore time for an RS node can be shortened by several dozen seconds to two minutes. We recommend that you lower this parameter only for tenants with high performance requirements (CPU ≤ 4C). For other tenants, the modification has no noticeable effect. |
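
For a restore performance test, the settings described in the table might be assembled as follows. This is a sketch under assumptions: the tenant name t1 is a placeholder, the statements are assumed to be issued from the sys tenant, and the value 100 applies only to performance testing as noted above. log_restore_concurrency is left at its default.

```shell
tenant=t1   # placeholder tenant name

# Statements for a restore performance test; review them before running.
restore_sql=$(cat <<EOF
ALTER SYSTEM SET ha_high_thread_score = 100 TENANT = $tenant;
ALTER SYSTEM SET _restore_idle_time = '10s';
EOF
)

printf '%s\n' "$restore_sql"
```

After the test, setting ha_high_thread_score back to 0 and _restore_idle_time back to 1m returns both parameters to the defaults listed in the table.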