Backup and restore is a core component of the high availability feature of OceanBase Database. It ensures data security by guarding against data loss caused by storage media damage or user errors: if data is lost, you can restore it from a backup.
Overview
The backup and restore module of OceanBase Database provides backup, restore, and cleanup features.
OceanBase Database supports tenant-level physical backup. A physical backup consists of two components: data backup and log archiving. Here, a tenant refers to a user tenant; physical backup is not supported for the sys tenant or a Meta tenant.
Data backup backs up the data of a tenant and is divided into full backup and incremental backup:
Full backup refers to the backup of all macroblocks.
Incremental backup refers to the backup of macroblocks added or modified after the last backup.
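As a sketch, the two backup types map to two statements in OceanBase Database V4.x, issued from the tenant to be backed up (log archiving must already be enabled, as noted below):

```sql
-- Hedged sketch of V4.x tenant-level data backup statements.
ALTER SYSTEM BACKUP DATABASE;              -- full backup: all macroblocks
ALTER SYSTEM BACKUP INCREMENTAL DATABASE;  -- incremental backup: macroblocks added or modified since the last backup
```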
Notice
Before you perform a physical backup, you must enable the log archiving mode.
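For example, log archiving can be enabled with statements like the following. The destination path is illustrative; `LOG_ARCHIVE_DEST` and `ALTER SYSTEM ARCHIVELOG` are the tenant-level V4.x interfaces described later in this topic.

```sql
-- Illustrative NFS destination; set the archive path first, then enable archiving.
ALTER SYSTEM SET LOG_ARCHIVE_DEST = 'LOCATION=file:///backup/archive';
ALTER SYSTEM ARCHIVELOG;
```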
The data that is backed up in a data backup includes the following:
Tenant-related information, including the tenant name, cluster name, time zone (timezone), replica distribution (locality), and compatibility mode (MySQL or Oracle) of the tenant.
All user table data
Note
A data backup backs up system variables and tenant-level parameters, but does not back up cluster-level parameters or private system table data.
Log archiving refers to the automatic archiving of log data. OBServer nodes regularly archive log data to the specified backup path. This process is fully automatic and does not require external triggering.
Physical restore supports tenant-level restore and table-level restore.
Tenant-level restore: Rebuilds a new tenant from existing data backups. Tenant-level restore ensures global consistency across tables and partitions.
Table-level restore: Restores user-specified tables from backup data into an existing tenant. The target tenant can be the original tenant, a different tenant in the same cluster, or a tenant in a different cluster.
Tenant-level restore supports full restore and quick restore.
Notice
A tenant restored by using quick restore cannot initiate a major compaction (merge), cannot be backed up, and cannot be switched over or failed over to the primary role. It can run only as a standby database.
Full restore: Restores both macroblock data and incremental logs. The restored tenant can provide services only after all data has been copied from the backup media to the local environment. A full restore runs the restore and recover processes for tenant system tables and user tables: restore copies the baseline data to the OBServer nodes of the target tenant, and recover replays the logs corresponding to that baseline on those nodes.
Quick restore: Provides services without first restoring macroblock data, which reduces restore waiting time and lowers user costs.
You can choose either of the following restore points for a physical restore:
Complete restore: No restore timestamp is specified.
Incomplete restore with a specified SCN or timestamp: An SCN is the exact internal version number of OceanBase Database. In Oracle mode, the timestamp is precise to the nanosecond with no precision loss. In MySQL mode, the timestamp is precise to the microsecond, and any precision beyond the microsecond is lost.
For more information about the physical restore process, see Restore process.
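A point-in-time restore can be sketched as follows. The tenant name, paths, and resource pool are illustrative; omit the `UNTIL` clause for a complete restore.

```sql
-- Hedged sketch of a V4.x physical restore to a specified timestamp,
-- issued from the sys tenant. Both the data backup path and the
-- log archive path are provided.
ALTER SYSTEM RESTORE restored_tenant
  FROM 'file:///backup/data,file:///backup/archive'
  UNTIL TIME = '2023-01-11 19:34:20'
  WITH 'pool_list=restore_pool';
```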
Backup media requirements
OceanBase Database supports the following backup media: Alibaba Cloud OSS, NFS, Azure Blob (supported starting from V4.3.5 BP3), AWS S3, and object storage services compatible with the S3 protocol, such as Huawei OBS, Google GCS, and Tencent Cloud COS. Some of these backup media must meet certain basic requirements before they can be used.
SDK version requirements
The following table lists the object storage SDK versions that correspond to each OBServer version:

| OBServer version | oss-c-sdk | s3-cpp-sdk |
|---|---|---|
| V4.3.4 and later | 3.11.2 | 1.11.156 |
Interface requirements
Alibaba Cloud OSS:
The following table lists the interfaces supported by Alibaba Cloud OSS.
| Interface name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes objects in batches. |
| GetObject | Retrieves an object. |
| ListObjects | Lists all objects in the bucket. Strong consistency is required. |
| HeadObject | Retrieves the metadata of an object. |
| AppendObject | Uploads an object in append mode. |
| PutObjectTagging | (Optional) Sets or updates the tags of an object. |
| GetObjectTagging | (Optional) Retrieves the tags of an object. |
| InitiateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Aborts a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists multipart uploads that have been initiated but not completed or aborted. |
| ListParts | Lists the uploaded parts of an upload task. |

Only the V1 signature algorithm is supported.
NFS: The version must be NFS 3 or later.
Object storage services compatible with the S3 protocol (such as Huawei OBS, Google GCS, and Tencent Cloud COS):
The following table lists the S3 API operations that must be supported.
| Interface name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes objects in batches. |
| GetObject | Downloads a single object. |
| ListObjects | Lists all objects under a path. |
| HeadObject | Retrieves the metadata of an object. |
| PutObjectTagging | (Optional) Sets the tags of an object. |
| GetObjectTagging | (Optional) Retrieves the tags of an object. |
| CreateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a single part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Aborts a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists multipart uploads that are in progress. |
| ListParts | Lists the uploaded parts of an upload task. |

Virtual-hosted-style object access URLs must be supported. For more information about virtual-hosted-style requests, see the AWS S3 documentation.
When selecting a backup medium, you can run the test_io_device command of the ob_admin tool to verify whether the I/O interfaces and current I/O permissions of the backup medium meet the requirements for backup and restore. You can also run the io_adapter_benchmark command of the ob_admin tool to measure the read and write performance between an OBServer node and the backup medium, which can serve as a reference for backup performance. For more information about these commands, see test_io_device and io_adapter_benchmark.
Directory structure
Data backup directory
The following table describes the directories and files created in the backup destination for data backup.
data_backup_dest
├── format.obbak // The file that stores the metadata of the backup destination.
├── check_file
│ └── 1002_connect_file_20230111T193020.obbak // The connectivity check file.
├── backup_sets // The directory that stores the metadata of all data backup sets.
│ ├── backup_set_1_full_end_success_20230111T193420.obbak // The placeholder file indicating that a full backup has ended.
│ ├── backup_set_1_full_start.obbak // The placeholder file indicating that a full backup has started.
│ ├── backup_set_2_inc_start.obbak // The placeholder file indicating that an incremental backup has started.
│ └── backup_set_2_inc_end_success_20230111T194420.obbak // The placeholder file indicating that an incremental backup has ended.
└── backup_set_1_full // The directory that stores the data of a full backup set. The directory name ends with `full` to indicate a full backup or `inc` to indicate an incremental backup.
    ├── backup_set_1_full_20230111T193330_20230111T193420.obbak // The placeholder file indicating the start and end time of a full backup.
    ├── single_backup_set_info.obbak // The file that stores the metadata of the current backup set.
    ├── tenant_backup_set_infos.obbak // The file that stores the metadata of all full backup sets of the current tenant.
    ├── infos // The directory that stores the metadata of the data backup set.
    ├── logstream_1 // The directory that stores all data of log stream 1.
    └── logstream_1001 // The directory that stores all data of log stream 1001.
The data backup directory contains the following data:

- `format.obbak`: The file that stores the metadata of the backup destination.
- `check_file`: The directory that stores the connectivity check file.
- `backup_sets`: The directory that stores the metadata of all data backup sets.
- `backup_set_1_full`: The directory that stores the data of a full backup set. The directory name ends with `full` to indicate a full backup or `inc` to indicate an incremental backup. A backup set is generated for each data backup; after the data backup is completed, the backup set is no longer modified.

A data backup set contains the following data:

- `backup_set_1_full_20230111T193330_20230111T193420.obbak`: The file that stores the ID, start time, and end time of the current backup set. This file is for display only.
- `single_backup_set_info.obbak`: The file that stores the metadata of the current backup set, including the backup point and the logs on which the backup depends.
- `tenant_backup_set_infos.obbak`: The file that stores the metadata of all full backup sets of the current tenant.
- `infos`: The directory that stores the metadata of the data backup set.
- `logstream_1`: The directory that stores all data of log stream 1, which is the system log stream of an OceanBase Database tenant.
- `logstream_1001`: The directory that stores all data of log stream 1001. Log streams with IDs greater than 1000 are user log streams of an OceanBase Database tenant.
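The backup sets laid out in this directory can also be inspected from the database. For example, the following query is a sketch based on the V4.x `DBA_OB_BACKUP_SET_FILES` view; the exact column set may vary by version:

```sql
-- List the backup sets of the current tenant: type (FULL/INC), status, and time range.
SELECT BACKUP_SET_ID, BACKUP_TYPE, STATUS, START_TIMESTAMP, END_TIMESTAMP
FROM oceanbase.DBA_OB_BACKUP_SET_FILES;
```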
Cluster-level parameter backup directory
Every time you initiate a backup of cluster-level parameters, the system generates a backup file in the specified directory. The directory structure is as follows:
cluster_parameters_backup_dest
├── cluster_parameter.20240710T103610.obbak // The backup file that stores the metadata of non-default cluster-level parameters. The file name is in the format of `cluster_parameter.[timestamp]`.
└── cluster_parameter.20241018T140609.obbak
Log archive directory
For backup media such as NFS, OSS, and Azure Blob, the log archive directory and the file types stored in each directory are as follows:
log_archive_dest
├── check_file
│ └── 1002_connect_file_20230111T193049.obbak // Connectivity check file
├── format.obbak // Metadata of the backup path
├── rounds // Placeholder directory for Rounds
│ └── round_d1002r1_start.obarc // Placeholder for the start of a Round
├── pieces // Placeholder directory for Pieces
│ ├── piece_d1002r1p1_start_20230111T193049.obarc // Placeholder for the start of a Piece, named as piece_DESTID_ROUNDID_PIECEID_start_DATE
│ └── piece_d1002r1p1_end_20230111T193249.obarc // Placeholder for the end of a Piece, named as piece_DESTID_ROUNDID_PIECEID_end_DATE
└── piece_d1002r1p1 // Directory for a Piece, named as piece_DESTID_ROUNDID_PIECEID
    ├── piece_d1002r1p1_20230111T193049_20230111T193249.obarc // Records the continuous interval of the Piece
    ├── checkpoint // Directory that records the archive progress of the active Piece
    ├── single_piece_info.obarc // Metadata of the current Piece
    ├── tenant_archive_piece_infos.obarc // Metadata of all frozen Pieces before the current Piece
    ├── file_info.obarc // List of all log stream files
    ├── logstream_1 // Log stream 1
    └── logstream_1001 // Log stream 1001
In the log archive directory above, the top-level directory contains the following data:
- `format.obbak`: Records the metadata of the archive path, including information about the tenant that uses the path.
- `check_file`: Used for the connectivity check of the log archive directory.
- `rounds`: A list of all Rounds of log archiving.
- `pieces`: A list of all Pieces of log archiving.
- `piece_d1002r1p1`: The directory of a Piece, named as `piece_DESTID_ROUNDID_PIECEID`. Here, `DESTID` is the ID corresponding to `log_archive_dest`; `ROUNDID` is the ID of the log archive Round, a monotonically increasing integer; and `PIECEID` is the ID of the log archive Piece, also a monotonically increasing integer.

A log archive Piece directory stores the following data:

- `piece_d1002r1p1_20230111T193049_20230111T193249.obarc`: Displays the ID, start time, and end time of the current Piece. This file is for informational purposes only.
- `checkpoint`: The directory that records the archive progress of the active Piece. The ObArchiveScheduler module periodically updates the progress information in this directory.
- `single_piece_info.obarc`: Records the metadata of the current Piece.
- `tenant_archive_piece_infos.obarc`: Records the metadata of all frozen Pieces before the current Piece in the current tenant.
- `file_info.obarc`: Records the list of log stream files in the Piece.
- `logstream_1`: The directory that stores the log files of log stream 1, which is the system log stream of an OceanBase Database tenant.
- `logstream_1001`: The directory that stores the log files of log stream 1001. Log streams with IDs greater than 1000 are user log streams of an OceanBase Database tenant.
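The Rounds and Pieces described above can likewise be listed from the tenant. The following query is a sketch based on the V4.x `DBA_OB_ARCHIVELOG_PIECE_FILES` view; the exact column set may vary by version:

```sql
-- List archive Pieces: the IDs mirror the piece_DESTID_ROUNDID_PIECEID directory names.
SELECT DEST_ID, ROUND_ID, PIECE_ID, STATUS
FROM oceanbase.DBA_OB_ARCHIVELOG_PIECE_FILES;
```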
Differences from V3.x/V2.x features
Log archiving
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Archiving level | Cluster-level | Tenant-level |
| Archiving granularity | Partition-level | Log stream-level |
| Permissions | Only the sys tenant can perform operations such as setting the archiving path, enabling archiving, and checking the archiving progress. | Both the sys tenant and the administrator of a user tenant can perform these operations. |
| Usage | | Use the `ALTER SYSTEM SET LOG_ARCHIVE_DEST` statement to set the tenant-level archiving path and piece switching cycle. The default cycle is 1d (one day). The log archiving path and the data backup path can be configured independently. |
| Piece switching | Piece switching is not enabled by default. | Piece switching is enabled by default, with a cycle of one day. |
| Method to set the archive lag time | Use the `ALTER SYSTEM SET LOG_ARCHIVE_CHECKPOINT_INTERVAL` statement. | Use the `ALTER SYSTEM SET ARCHIVE_LAG_TARGET` statement. |
| Result of executing the `ALTER SYSTEM ARCHIVELOG` statement in the sys tenant | Enables archiving for all tenants in the current cluster. Tenants created after archiving is enabled also have archiving enabled. | Enables archiving for all tenants in the current cluster. Tenants created after archiving is enabled do not have archiving enabled. |
| Log compression | Use the `ALTER SYSTEM SET BACKUP_LOG_ARCHIVE_OPTION` statement. | Not supported. |
| Views | The following three views are related to archiving: | The following eight views are related to archiving: |
| Media requirements | SSD is required. | HDD or SSD is supported. |
| Number of archive files | The number of files is proportional to the number of partitions. In scenarios with millions of partitions, this can produce a large number of small files. | The number of files is small and independent of the number of partitions, which avoids large numbers of small files. |
| Standby archiving | Not supported. | Supported. |
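For example, the V4.x archive lag time and piece switching cycle in the table above are set per tenant. The values and path below are illustrative:

```sql
-- Illustrative values: archive logs at least every 120 s; switch Pieces daily.
ALTER SYSTEM SET ARCHIVE_LAG_TARGET = '120s';
ALTER SYSTEM SET LOG_ARCHIVE_DEST = 'LOCATION=file:///backup/archive PIECE_SWITCH_INTERVAL=1d';
```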
Data backup
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Backup level | Cluster-level | Tenant-level |
| Privileges | Only the sys tenant can perform backup operations, such as setting the backup path, initiating a backup, and viewing the backup progress. | Both the sys tenant and the administrator of a user tenant can perform backup operations. |
| Backup path setting method | Use the `ALTER SYSTEM SET BACKUP_DEST` statement to set the cluster-level backup path. | Use the `ALTER SYSTEM SET DATA_BACKUP_DEST` statement to set the tenant-level backup path. The data backup path and the log archive path can be configured independently. |
| Data backup to a specified path | The sys tenant initiates a data backup by executing the `ALTER SYSTEM BACKUP TENANT tenant_name_list TO backup_destination;` statement. | Not supported. |
| BACKUP PLUS ARCHIVELOG feature | Not supported. | Supported. |
| Space expansion | Snapshot points are retained during backup, which expands storage space during the backup process. | Snapshot points are not retained, so no space expansion occurs. |
| Standby database backup | Not supported. | Supported. |
| Views | The backup-related views are as follows: | The backup-related views are as follows: |
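For example, the V4.x tenant-level backup path in the table above is set as follows. The path is illustrative:

```sql
-- Illustrative NFS path; the data backup path is independent of the log archive path.
ALTER SYSTEM SET DATA_BACKUP_DEST = 'file:///backup/data';
```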
Physical restore
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Data path | Provide the cluster-level backup path in the restore command. | Provide both the data backup path and the log archive path. |
| Restore concurrency setting | Before initiating the restore command, use the `ALTER SYSTEM SET RESTORE_CONCURRENCY` statement to set the concurrency. | Specify the `concurrency` parameter in the restore command. |
| Key management method | | |
| Tenant role after restore | Primary tenant, which is the primary database. | Standby tenant, which is the standby database. |
| Upgrade | The tenant is automatically upgraded during the restore process. | The tenant must be manually upgraded after the restore is completed. |
| Table-level restore | Supported. Tables can be restored only to new tenants (tenants created during the restore process), not to existing tenants. | Supported starting from V4.2.1. Tables can be restored only to existing tenants, not to new tenants (tenants created during the restore process). |
| Quick restore | Not supported. | Supported starting from V4.3.3. |
| Restore using the `ADD RESTORE SOURCE` statement | Supported. | Not supported. |
Partner certifications
For backup and restoration partners that have been certified with OceanBase Database, see Industry partners.
On the Industry partners page, under Solution type selection, choose Data integration and backup > Backup and recovery to see the list of backup and recovery partners certified for OceanBase Database, the certified versions, and certificate validity periods.
References
For more information about physical backup and restore, see Backup and restore.
