Backup and restore is a core component of the high availability feature of OceanBase Database. It protects data against loss caused by storage medium damage or user errors: if data is lost for either reason, you can restore it from backup.
Overview
The backup and restore module of OceanBase Database provides three main features: backup, restore, and cleanup.
OceanBase Database supports tenant-level physical backup, which consists of two features: data backup and log archiving. Here, "tenant" means a user tenant. Physical backup is not supported for the sys tenant or Meta tenants.
Data backup is divided into full backup and incremental backup:
Full backup backs up all macroblocks.
Incremental backup backs up only the macroblocks added or modified since the last backup.
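The difference between the two backup types can be illustrated with a small Python model. This is a conceptual sketch only: it assumes each macroblock carries a version number that changes on every write, and it treats a block as "added or modified" when its version differs from the one recorded at the last backup. It does not describe how OceanBase Database actually tracks changed macroblocks, and the names `full_backup` and `incremental_backup` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical model: macroblock ID -> version number (bumped on every write).
Macroblocks = Dict[int, int]


def full_backup(current: Macroblocks) -> Set[int]:
    """A full backup copies every macroblock."""
    return set(current)


def incremental_backup(current: Macroblocks, last_backup: Macroblocks) -> Set[int]:
    """An incremental backup copies only blocks that are new or whose version
    changed since the last backup. Deleted blocks are ignored in this model."""
    return {
        block_id
        for block_id, version in current.items()
        if last_backup.get(block_id) != version
    }


if __name__ == "__main__":
    at_last_backup = {1: 1, 2: 1, 3: 1}
    now = {1: 1, 2: 2, 3: 1, 4: 1}  # block 2 modified, block 4 added
    print(sorted(full_backup(now)))                         # [1, 2, 3, 4]
    print(sorted(incremental_backup(now, at_last_backup)))  # [2, 4]
```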
Notice
Before you perform a data backup, you must enable log archiving.
A data backup includes the following:
Tenant-related information, such as the tenant name, cluster name, time zone, locality, and compatibility mode (MySQL or Oracle) of the tenant.
Data of all user tables.
Note
Data backup backs up system variables and tenant configurations but does not back up cluster-level configurations or private system table data.
Log archiving automatically archives log data: OBServer nodes periodically archive log data to the specified archive path, with no external trigger required. After the log archiving service is enabled, the data dictionary is also archived.
[Figure: Overall architecture of physical restore]
Physical restore supports tenant-level restore and table-level restore.
Tenant-level restore: Builds a new tenant from the backup data of an existing tenant. Tenant-level restore ensures global consistency across tables and partitions.
Table-level restore: Restores user-specified tables from backup data to an existing tenant. The target tenant can be the same as the source tenant, a different tenant in the same cluster, or a tenant in a different cluster.
Tenant-level restore supports full restore and fast restore.
Notice
A tenant restored by fast restore does not support manually triggered major compactions, data backup, or switchover/failover to become the primary tenant. It can exist only as a standby tenant.
Full restore: Restores both macroblock data and incremental logs. The restored tenant can provide services only after all data has been restored from the backup medium to the local environment. Full restore consists of the restore and recover processes for the tenant's system tables and user tables: restore copies the baseline data required for the restore to the OBServer nodes of the target tenant, and recover restores the logs that correspond to that baseline to the corresponding OBServer nodes.
Fast restore: The restored tenant provides services without first restoring macroblock data, which shortens the restore wait time and lowers user cost.
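The split between restore and recover in a full restore can be pictured with a toy key-value model in Python. This is a hypothetical sketch, not OceanBase Database's implementation: it assumes the baseline is a dictionary captured at a known SCN and that archived logs are simple (SCN, key, value) records. Restore copies the baseline; recover replays only the log records newer than the baseline, optionally stopping at a target SCN.

```python
from typing import Dict, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    scn: int    # version at which the change was committed (modeled as an int)
    key: str
    value: str


def restore_and_recover(
    baseline: Dict[str, str],
    baseline_scn: int,
    logs: List[LogEntry],
    target_scn: Optional[int] = None,
) -> Dict[str, str]:
    """Toy model of a full restore.

    Restore: start from a copy of the backed-up baseline.
    Recover: replay log entries newer than the baseline in SCN order.
    target_scn=None replays every available entry; otherwise replay stops
    before the first entry whose SCN exceeds target_scn.
    """
    state = dict(baseline)  # restore: copy the baseline, leave the backup untouched
    for entry in sorted(logs, key=lambda e: e.scn):
        if entry.scn <= baseline_scn:
            continue  # already reflected in the baseline
        if target_scn is not None and entry.scn > target_scn:
            break  # beyond the requested restore point
        state[entry.key] = entry.value
    return state


if __name__ == "__main__":
    logs = [LogEntry(12, "a", "3"), LogEntry(11, "b", "2"), LogEntry(9, "a", "0")]
    print(restore_and_recover({"a": "1"}, 10, logs))                 # {'a': '3', 'b': '2'}
    print(restore_and_recover({"a": "1"}, 10, logs, target_scn=11))  # {'a': '1', 'b': '2'}
```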
You can choose one of the following restore points for a physical restore:
Complete restore: No restore timestamp is specified.
Incomplete restore to a specified SCN or timestamp: The SCN is the exact version number in OceanBase Database. In Oracle mode, timestamps are precise to the nanosecond with no loss of precision. In MySQL mode, timestamps are precise to the microsecond, and any precision beyond the microsecond is lost.
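The precision rule for MySQL mode can be shown with integer arithmetic on a nanosecond timestamp. The helper names are hypothetical, and the sketch assumes that the digits below one microsecond are truncated; the text above only states that this precision is lost, not whether it is truncated or rounded.

```python
NANOS_PER_MICROSECOND = 1_000


def oracle_mode_restore_point(timestamp_ns: int) -> int:
    """Oracle mode: nanosecond precision is kept unchanged."""
    return timestamp_ns


def mysql_mode_restore_point(timestamp_ns: int) -> int:
    """MySQL mode: only microsecond precision is kept.

    Assumption: sub-microsecond digits are truncated (floored), not rounded.
    """
    return (timestamp_ns // NANOS_PER_MICROSECOND) * NANOS_PER_MICROSECOND


if __name__ == "__main__":
    requested_ns = 1_673_436_620_123_456_789  # nanoseconds since the Unix epoch
    print(oracle_mode_restore_point(requested_ns))  # 1673436620123456789
    print(mysql_mode_restore_point(requested_ns))   # 1673436620123456000
```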
For the physical restore process, see Restore process.
Backup media requirements
OceanBase Database supports the following backup media: Alibaba Cloud OSS, NFS, Azure Blob, AWS S3, and object storage services compatible with the S3 protocol (such as Huawei OBS, Google GCS, and Tencent Cloud COS). Some backup media must meet the basic requirements below before they can be used.
SDK version requirements
The following table lists the correspondence between the object storage SDK versions and the observer versions.
| observer version | oss-c-sdk | s3-cpp-sdk |
|---|---|---|
| V4.3.4 and later | 3.11.2 | 1.11.156 |
API requirements
Alibaba Cloud OSS:
The following table lists the APIs that Alibaba Cloud OSS must support.
| API name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes objects in batches. |
| GetObject | Retrieves an object. |
| ListObjects | Lists all objects in the bucket (strong consistency is required). |
| HeadObject | Retrieves the metadata of an object. |
| AppendObject | Uploads an object in append mode. |
| PutObjectTagging (optional) | Sets or updates the tags of an object. |
| GetObjectTagging (optional) | Retrieves the tags of an object. |
| InitiateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Cancels a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists multipart uploads that have been initialized but not yet completed or aborted. |
| ListParts | Lists the information about uploaded parts in an upload task. |

Only the V1 signature algorithm is supported.
NFS: The version must be NFS 3 or later.
Object storage services compatible with the S3 protocol (such as Huawei OBS, Google GCS, and Tencent Cloud COS):
The following table lists the S3 API operations that must be supported.
| API name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes objects in batches. |
| GetObject | Downloads a single object. |
| ListObjects | Lists all objects under a path. |
| HeadObject | Retrieves the metadata of an object. |
| PutObjectTagging (optional) | Sets the tags of an object. |
| GetObjectTagging (optional) | Retrieves the tags of an object. |
| CreateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a single part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Aborts a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists in-progress multipart uploads. |
| ListParts | Lists the information about uploaded parts in an upload task. |

The object access URL must support virtual-hosted-style requests. For more information about virtual-hosted-style requests, see the AWS S3 documentation.
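When evaluating an S3-compatible service, you could compare the operations its vendor documents against the S3 API table above. The Python checklist below is a hypothetical planning aid: the set of supported operations is something you would compile yourself from the vendor's documentation, and the check does not contact the service. To verify actual I/O behavior and permissions, use the test_io_device command of ob_admin.

```python
from typing import Iterable, List

# Required operations from the S3 API table. PutObjectTagging and
# GetObjectTagging are marked optional there and are deliberately left out.
REQUIRED_S3_OPERATIONS = frozenset({
    "PutObject", "DeleteObject", "DeleteObjects", "GetObject",
    "ListObjects", "HeadObject", "CreateMultipartUpload", "UploadPart",
    "CompleteMultipartUpload", "AbortMultipartUpload",
    "ListMultipartUploads", "ListParts",
})


def missing_required_operations(supported: Iterable[str]) -> List[str]:
    """Return the required operations absent from `supported`, sorted.

    An empty list means every required operation is listed as supported.
    """
    return sorted(REQUIRED_S3_OPERATIONS - set(supported))


if __name__ == "__main__":
    # Hypothetical vendor documentation that omits ListParts.
    vendor_operations = set(REQUIRED_S3_OPERATIONS) - {"ListParts"}
    print(missing_required_operations(vendor_operations))  # ['ListParts']
```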
Before you select a backup medium, you can run the test_io_device command of the ob_admin tool to check whether the I/O interfaces provided by the backup medium and the current I/O permissions meet the requirements of backup and restore. You can also run the io_adapter_benchmark command of ob_admin to measure read and write performance from OBServer nodes to the backup medium, as a reference for expected backup performance. For more information, see test_io_device and io_adapter_benchmark.
Directory structure
Data backup directory
The data backup directory and the file types saved in each directory are as follows:
data_backup_dest
├── format.obbak // The metadata of the backup path.
├── check_file
│ └── 1002_connect_file_20230111T193020.obbak // The connectivity check file.
├── backup_sets // The directory that summarizes all data backup sets.
│ ├── backup_set_1_full_end_success_20230111T193420.obbak // The placeholder for the end of a full backup.
│ ├── backup_set_1_full_start.obbak // The placeholder for the start of a full backup.
│ ├── backup_set_2_inc_start.obbak // The placeholder for the start of an incremental backup.
│ └── backup_set_2_inc_end_success_20230111T194420.obbak // The placeholder for the end of an incremental backup.
└── backup_set_1_full // The directory for a full backup set. The suffix `full` indicates a full backup, and the suffix `inc` indicates an incremental backup.
    ├── backup_set_1_full_20230111T193330_20230111T193420.obbak // The placeholder for the start and end times of a full backup.
    ├── single_backup_set_info.obbak // The metadata of the current backup set.
    ├── tenant_backup_set_infos.obbak // The metadata of all full backups of the current tenant.
    ├── infos // The directory for the metadata of the data backup set.
    ├── logstream_1 // The directory for log stream 1.
    └── logstream_1001 // The directory for log stream 1001.
In the data backup directory, the top-level directory contains the following data:
format.obbak: The metadata of the backup path.
check_file: The connectivity check file for the user data backup directory.
backup_sets: The directory that summarizes all data backup sets.
backup_set_1_full: The directory for a full backup set. The suffix `full` indicates a full backup, and the suffix `inc` indicates an incremental backup. A backup set is generated for each data backup and is no longer modified after the data backup is completed.
In a data backup set, the following data is mainly stored:
backup_set_1_full_20230111T193330_20230111T193420.obbak: Shows the ID, start time, and end time of the current backup set. This file is for display only.
single_backup_set_info.obbak: The metadata of the current backup set, including the backup point and the logs it depends on.
tenant_backup_set_infos.obbak: The metadata of all backup sets of the current tenant.
infos: The directory for the metadata of the data backup set.
logstream_1: The directory for all data of log stream 1, which is the system log stream of the OceanBase Database tenant.
logstream_1001: The directory for all data of log stream 1001. Log streams numbered greater than 1000 are user log streams of the OceanBase Database tenant.
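If you inspect a data backup path directly (for example, by listing an NFS mount), the placeholder names under backup_sets can be parsed to see which backup sets have an end_success placeholder. The following Python sketch does that. The helper `summarize_backup_sets` is hypothetical, and it assumes, by inference from the placeholder names rather than from any documented rule, that a set with a start placeholder but no end_success placeholder has not finished successfully.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional

# Placeholder names under backup_sets/, for example:
#   backup_set_1_full_start.obbak
#   backup_set_1_full_end_success_20230111T193420.obbak
_PLACEHOLDER = re.compile(
    r"^backup_set_(?P<id>\d+)_(?P<kind>full|inc)_"
    r"(?:start|end_success_(?P<end>\d{8}T\d{6}))\.obbak$"
)


@dataclass
class BackupSetStatus:
    set_id: int
    kind: str                            # "full" or "inc"
    started: bool = False                # a start placeholder was found
    end_time: Optional[datetime] = None  # set only if an end_success placeholder was found


def summarize_backup_sets(file_names: Iterable[str]) -> Dict[int, BackupSetStatus]:
    """Group backup_sets/ placeholder files by backup set ID.

    Names that do not match the placeholder pattern are ignored.
    """
    sets: Dict[int, BackupSetStatus] = {}
    for name in file_names:
        match = _PLACEHOLDER.match(name)
        if match is None:
            continue
        set_id = int(match["id"])
        status = sets.setdefault(set_id, BackupSetStatus(set_id, match["kind"]))
        if match["end"] is None:
            status.started = True
        else:
            status.end_time = datetime.strptime(match["end"], "%Y%m%dT%H%M%S")
    return sets


if __name__ == "__main__":
    listing = [
        "backup_set_1_full_end_success_20230111T193420.obbak",
        "backup_set_1_full_start.obbak",
        "backup_set_2_inc_start.obbak",
    ]
    for status in summarize_backup_sets(listing).values():
        print(status.set_id, status.kind, status.end_time)
    # 1 full 2023-01-11 19:34:20
    # 2 inc None
```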
Cluster-level parameter backup directory
When a cluster-level parameter backup is initiated, the system generates a backup file in the specified directory. The specific directory structure is as follows:
cluster_parameters_backup_dest
├── cluster_parameter.20240710T103610.obbak // The backup file for non-default cluster-level parameters. The file naming format is `cluster_parameter.[timestamp].obbak`.
└── cluster_parameter.20241018T140609.obbak
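To pick the most recent parameter backup from a listing of this directory, you can compare the timestamp embedded in each file name. The Python helper below is a hypothetical sketch; it assumes the timestamp uses the YYYYMMDDTHHMMSS layout seen in the example file names and reflects when the backup file was generated.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# cluster_parameter.[timestamp].obbak, e.g. cluster_parameter.20241018T140609.obbak
_PARAM_BACKUP = re.compile(r"^cluster_parameter\.(\d{8}T\d{6})\.obbak$")


def latest_parameter_backup(file_names: Iterable[str]) -> Optional[str]:
    """Return the parameter backup file with the latest timestamp, or None."""
    latest_name: Optional[str] = None
    latest_time: Optional[datetime] = None
    for name in file_names:
        match = _PARAM_BACKUP.match(name)
        if match is None:
            continue  # skip files that do not follow the naming format
        taken_at = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
        if latest_time is None or taken_at > latest_time:
            latest_name, latest_time = name, taken_at
    return latest_name


if __name__ == "__main__":
    print(latest_parameter_backup([
        "cluster_parameter.20240710T103610.obbak",
        "cluster_parameter.20241018T140609.obbak",
    ]))  # cluster_parameter.20241018T140609.obbak
```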
Log archive directory
For backup media such as NFS, OSS, and Azure Blob, the log archive directory and the file types saved in each directory are as follows:
log_archive_dest
├── check_file
│ └── 1002_connect_file_20230111T193049.obbak // Connectivity check file
├── format.obbak // Formatted information of the backup path
├── rounds // Round placeholder directory
│ └── round_d1002r1_start.obarc // Round start placeholder
├── pieces // Piece placeholder directory
│ ├── piece_d1002r1p1_start_20230111T193049.obarc // Piece start placeholder: piece_DESTID_ROUNDID_PIECEID_start_DATE
│ └── piece_d1002r1p1_end_20230111T193249.obarc // Piece end placeholder: piece_DESTID_ROUNDID_PIECEID_end_DATE
└── piece_d1002r1p1 // Piece directory. Directory name format: piece_DESTID_ROUNDID_PIECEID
    ├── piece_d1002r1p1_20230111T193049_20230111T193249.obarc // Records the continuous interval of the piece
    ├── checkpoint // Records the archive points of active Pieces
    ├── single_piece_info.obarc // Records the metadata of the current Piece
    ├── tenant_archive_piece_infos.obarc // Records the metadata of all frozen Pieces in the current tenant
    ├── file_info.obarc // List of all log stream files
    ├── logstream_1 // Log stream 1
    └── logstream_1001 // Log stream 1001
In the log archive directory above, the top-level directory contains the following data:
format.obbak: Records the metadata of the archive path, including information about the tenants that use the path.
check_file: Used for connectivity checks of the user's log archive directory.
rounds: A summary list of all Rounds in the log archive.
pieces: A summary list of all Pieces in the log archive.
piece_d1002r1p1: The directory for a Piece in the log archive, named in the format piece_DESTID_ROUNDID_PIECEID. DESTID is the ID corresponding to log_archive_dest; ROUNDID is the ID of the log archive Round, a monotonically increasing integer; and PIECEID is the ID of the log archive Piece, also a monotonically increasing integer.
Inside a log archive Piece directory, the following data is stored:
piece_d1002r1p1_20230111T193049_20230111T193249.obarc: Shows the ID, start time, and end time of the current Piece. This file is for informational purposes only.
checkpoint: Records the archive points of active Pieces. The ObArchiveScheduler module periodically updates the archive point information in this directory.
single_piece_info.obarc: Records the metadata of the current Piece.
tenant_archive_piece_infos.obarc: Records the metadata of all frozen Pieces in the current tenant.
file_info.obarc: Records the list of log streams in the Piece.
logstream_1: Records the log files of log stream 1, which is the system log stream of the OceanBase Database tenant.
logstream_1001: Records the log files of log stream 1001, which is a user log stream of the OceanBase Database tenant.
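Because ROUNDID and PIECEID are monotonically increasing integers, piece directories can be ordered by parsing their names. The Python sketch below assumes, based on the example piece_d1002r1p1, that the three IDs appear after the letters d, r, and p respectively; `parse_piece_dir` and `sort_pieces` are hypothetical helpers, not part of OceanBase Database.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout, inferred from the example name piece_d1002r1p1:
# "piece_" + "d" DESTID + "r" ROUNDID + "p" PIECEID
_PIECE_DIR = re.compile(r"^piece_d(?P<dest>\d+)r(?P<round>\d+)p(?P<piece>\d+)$")


class PieceId(NamedTuple):
    dest_id: int
    round_id: int
    piece_id: int


def parse_piece_dir(name: str) -> PieceId:
    """Split a piece directory name into its DESTID, ROUNDID, and PIECEID."""
    match = _PIECE_DIR.match(name)
    if match is None:
        raise ValueError(f"not a piece directory name: {name!r}")
    return PieceId(int(match["dest"]), int(match["round"]), int(match["piece"]))


def sort_pieces(names: Iterable[str]) -> List[str]:
    """Order piece directories by (DESTID, ROUNDID, PIECEID).

    Since ROUNDID and PIECEID only increase, within one archive destination
    this ordering should match the order in which rounds and pieces appeared.
    """
    return sorted(names, key=parse_piece_dir)


if __name__ == "__main__":
    print(parse_piece_dir("piece_d1002r1p1"))
    # PieceId(dest_id=1002, round_id=1, piece_id=1)
    print(sort_pieces(["piece_d1002r2p3", "piece_d1002r1p2", "piece_d1002r1p1"]))
    # ['piece_d1002r1p1', 'piece_d1002r1p2', 'piece_d1002r2p3']
```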
Differences from V3.x/V2.x
Log archiving
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Archiving level | Cluster level | Tenant level |
| Archiving granularity | Partition level | Log stream level |
| Permissions | Only the sys tenant can perform operations such as setting the archiving path, enabling archiving, and viewing the archiving progress. | Both the sys tenant and the administrator of a user tenant can perform these operations. |
| Usage | | Use the ALTER SYSTEM SET LOG_ARCHIVE_DEST statement to set the tenant-level archiving path and piece switching cycle. By default, the cycle is 1d (1 day). The archiving path and data backup path can be configured independently. |
| Piece switching | The piece switching feature is disabled by default. | The piece switching feature is enabled by default, and the cycle is 1 day. |
| Method to set the archiving delay time | Use the ALTER SYSTEM SET LOG_ARCHIVE_CHECKPOINT_INTERVAL statement. | Use the ALTER SYSTEM SET ARCHIVE_LAG_TARGET statement. |
| Result of executing the ALTER SYSTEM ARCHIVELOG statement in the sys tenant | Enables archiving for all tenants in the current cluster. Tenants created after archiving is enabled also have archiving enabled. | Enables archiving for all tenants in the current cluster. Tenants created after archiving is enabled do not have archiving enabled. |
| Log compression | Use the ALTER SYSTEM SET BACKUP_LOG_ARCHIVE_OPTION statement. | Not supported. |
| Views | Three views are related to archiving. | Eight views are related to archiving. |
| Media requirements | SSD is required. | HDD or SSD is supported. |
| Number of archive files | The number of files is proportional to the number of partitions, so a scenario with millions of partitions generates a large number of small files. | The number of files is small and independent of the number of partitions, so no large number of small files is generated. |
| Standby database archiving | Not supported. | Supported. |
Data backup
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Backup level | Cluster level | Tenant level |
| Privileges | Only the sys tenant can perform backup operations, such as setting the backup path, initiating a backup, and checking the backup progress. | Both the sys tenant and the administrator user of a user tenant can perform backup operations. |
| Method for setting the backup path | Use the ALTER SYSTEM SET BACKUP_DEST statement to set the backup path at the cluster level. | Use the ALTER SYSTEM SET DATA_BACKUP_DEST statement to set the backup path at the tenant level. The data backup path and log archiving path can be configured independently. |
| Data backup to a specified path | The sys tenant can execute the ALTER SYSTEM BACKUP TENANT tenant_name_list TO backup_destination; statement to initiate a backup. | Not supported |
| BACKUP PLUS ARCHIVELOG feature | Not supported | Supported |
| Data snapshot retention during backup | Snapshots are retained during backup, which may cause storage space expansion. | Snapshots are not retained during backup, so no storage space expansion occurs. |
| Backup of standby databases | Not supported | Supported |
| Views | Five views are related to backup. | Ten views are related to backup. |
Physical restore
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Data path | You can specify the cluster-level backup path in the restore command. | You must specify both the data backup path and the log archiving path. |
| Restore concurrency setting | Before initiating a restore command, you can use the ALTER SYSTEM SET RESTORE_CONCURRENCY statement to set the restore concurrency. | You can specify the concurrency parameter in the restore command. |
| Key management | | |
| Tenant role after restore | The primary tenant, which is the primary database. | The standby tenant, which is the standby database. |
| Upgrade | The tenant is automatically upgraded during restore. | You must manually upgrade the tenant after restore. |
| Table-level restore | Supported. You can restore a table only to a new tenant (a tenant created during restore), not to an existing tenant. | Supported starting from V4.2.1. You can restore a table only to an existing tenant, not to a new tenant (a tenant created during restore). |
| Fast restore | Not supported | Supported starting from V4.3.3 |
| Restore by using the ADD RESTORE SOURCE statement | Supported | Not supported |
References
For more information about physical backup and restore, see Backup and restore.
