In a deployment architecture that spans multiple IDCs or regions, network bandwidth resources are typically more constrained. Therefore, when configuring the backup and archive paths, it is important to consider the zones, IDCs, or regions that the paths can access. This allows for the efficient use of existing network resources and avoids unnecessary traffic between IDCs or regions, thereby improving data transmission efficiency.
This section provides a configuration example for the source information of the backup and archive paths in a deployment architecture with three IDCs across two regions.
Scenario
The following figure shows the deployment architecture and the backup data flow in a scenario with three IDCs across two cities. Five replicas of OceanBase Database are distributed across three IDCs in two cities: 2 (Shenzhen Nanshan) + 2 (Shenzhen Bao'an) + 1 (Hangzhou). During normal operation, the leader replicas are mainly located in the two zones in Shenzhen Nanshan, namely z1 and z2, and the leader of the Root Service (RS) is located on the obs4 node in zone z2.
Assume that the backup media, NFS, is deployed in the Shenzhen Nanshan IDC (IDC1). The user has configured the system to allow cross-IDC access within the same city but not across cities. In this case, the OBServer nodes in both IDC1 and IDC2 can access the NFS. Access from the OBServer nodes in IDC1 to the NFS is intra-IDC access, while access from the OBServer nodes in IDC2 to the NFS generates inter-IDC network traffic. The OBServer nodes in Hangzhou cannot access the NFS because the NFS is not mounted there.
Configure the source information of the backup path
By default, replicas for data backup are selected from all available full-featured (F) replicas, and Follower replicas are usually prioritized. As shown in the preceding figure, the Root Service (RS) may therefore schedule backup tasks to the obs10 node in the Hangzhou IDC, which cannot access the NFS, causing backup task failures. To prevent this, you need to configure the source information of the backup path to restrict the nodes that can execute backup tasks.
In this scenario, based on the rule that backup tasks cannot access the NFS across cities, you can configure the source information of the backup path as region=Region1, idc=IDC1,IDC2, or zone=Z1,Z2,Z3,Z4. Note that the metadata files of the data backup are written by the RS node. If the RS node falls outside the configured source range, it loses access privileges to the backup path and backup tasks fail. To avoid this, we recommend that you set the tenant's PRIMARY_ZONE to 'z1,z2', for example with ALTER TENANT tenant_name PRIMARY_ZONE='z1,z2'.
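The recommended PRIMARY_ZONE setting can be applied with a statement like the following sketch, where tenant_name is a placeholder for your tenant; ALTER TENANT with a tenant name is typically executed from the sys tenant:

```sql
-- tenant_name is a placeholder; replace it with the actual tenant name.
-- 'z1,z2' gives zones z1 and z2 equal (highest) leader priority.
obclient> ALTER TENANT tenant_name PRIMARY_ZONE = 'z1,z2';
```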
Assume that the backup path is file:///data/nfs/backup/data. Under the user tenant, the statement to configure the source information of the backup path to region=Region1 is as follows:
obclient(root@mysql001)[(none)]> ALTER SYSTEM SET DATA_BACKUP_DEST = 'file:///data/nfs/backup/data?region=Region1';
After the configuration is successful, the RS will no longer forward backup tasks to nodes in Region2. Instead, it will only forward them to nodes in Region1 that can provide backup services. The RS will prioritize Follower replicas for backup tasks to avoid increasing the load on Leader replicas and affecting other services.
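To confirm that the source information took effect, you can query the backup parameters of the tenant. The following sketch assumes the DBA_OB_BACKUP_PARAMETER view is available in your OceanBase Database version; the view name and columns may differ across versions:

```sql
-- Check the configured data backup destination, including the source attribute.
obclient> SELECT NAME, VALUE
          FROM oceanbase.DBA_OB_BACKUP_PARAMETER
          WHERE NAME = 'data_backup_dest';
```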
As described above, when the source is configured as region=Region1, idc=IDC1,IDC2, or zone=Z1,Z2,Z3,Z4, the comma-separated IDCs or zones have equal priority. As a result, a large number of backup tasks may be forwarded to Follower replicas in IDC2 (as shown in the following figure), and the backup data traffic becomes inter-IDC traffic. If your cross-IDC network bandwidth is limited, you can modify the source configuration to idc=IDC1;IDC2 or zone=Z1,Z2;Z3,Z4, where the semicolon separates priority levels in descending order.
For example, you can use the following statement to modify the source configuration to idc=IDC1;IDC2:
obclient> ALTER SYSTEM CHANGE EXTERNAL_STORAGE_DEST PATH='file:///data/nfs/backup/data' SET ATTRIBUTE='idc=IDC1;IDC2';
After the modification, when the RS selects nodes to execute backup tasks, nodes in IDC1 have higher priority than nodes in IDC2. This allows backup tasks to primarily use intra-IDC network resources, while IDC2 serves as a fallback backup path, improving the tolerance of backup tasks to exceptions such as node failures in IDC1. Additionally, if the RS switches to IDC2, it still has access to the backup media, preventing backup failures.
Configure the source information for the log archive path
Similarly, for log archiving in this scenario, based on the rule that the NFS cannot be accessed across cities, the source information of the archive path can be configured as region=Region1, idc=IDC1;IDC2, or zone=Z1,Z2;Z3,Z4. Because all log archiving traffic is generated by the leader replica, access priority does not apply to the log archive path. Therefore, zone=Z1,Z2;Z3,Z4 is equivalent to zone=Z1,Z2,Z3,Z4, and idc=IDC1;IDC2 is equivalent to idc=IDC1,IDC2.
Assume that the PRIMARY_ZONE of a user tenant is PRIMARY_ZONE='Z1,Z2,Z3,Z4', and the archive path is file:///data/nfs/backup/archive. There are two scenarios:
Scenario 1: Initial scenario with sufficient network bandwidth
In the initial scenario, since the network traffic for log archiving is relatively small, the user tenant can configure the source information for the archive path to region=Region1 to restrict access to only nodes in Region1:
obclient(root@mysql001)[(none)]> ALTER SYSTEM SET LOG_ARCHIVE_DEST = 'LOCATION=file:///data/nfs/backup/archive?region=Region1';
After the configuration succeeds, it prevents unexpected access requests that may occur if a leader replica is switched to Region2 due to maintenance or other reasons (such as requests from OBServer nodes in Region2). If the system detects that a leader replica in Region2 is accessing the archive path, it generates an error log (error code -9063) to alert the administrator to check: (1) whether the node has lost access privileges due to a change in its locality name; (2) whether the leader replica has been switched to a node without access privileges. In either case, update the source configuration or manually switch the leader replica to an OBServer node with access privileges.
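When investigating such alerts, it can help to check the current archive status and path. The following sketch assumes the DBA_OB_ARCHIVELOG view is available in your OceanBase Database version; the view name and columns may differ across versions:

```sql
-- Inspect the current log archive round, its status, and the archive path.
obclient> SELECT DEST_ID, STATUS, PATH
          FROM oceanbase.DBA_OB_ARCHIVELOG;
```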
Scenario 2: Network bandwidth is limited
As data continues to be written, the network traffic for log archiving increases. In this case, you can modify the source configuration for the archive path to idc=IDC1 and update the tenant's PRIMARY_ZONE to PRIMARY_ZONE='Z1,Z2'.
Note
The PRIMARY_ZONE of the tenant must be consistent with the source configuration to prevent the leader replica from losing access privileges to the archive path if it is outside the source configuration range, which would prevent the archiving process from progressing.
obclient(root@mysql001)[(none)]> ALTER SYSTEM CHANGE EXTERNAL_STORAGE_DEST PATH='file:///data/nfs/backup/archive' SET ATTRIBUTE='idc=IDC1';
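To keep the tenant's leader replicas within the new source range, as described in the note above, the tenant's PRIMARY_ZONE should be updated as well. A sketch follows, where tenant_name is a placeholder; ALTER TENANT with a tenant name is typically executed from the sys tenant:

```sql
-- tenant_name is a placeholder; replace it with the actual tenant name.
-- Restricting PRIMARY_ZONE to Z1,Z2 keeps leader replicas in IDC1,
-- consistent with the idc=IDC1 source configuration.
obclient> ALTER TENANT tenant_name PRIMARY_ZONE = 'Z1,Z2';
```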