Backup and restore is a core component of the high availability features of OceanBase Database. It ensures data security by protecting against storage medium damage and user errors: if data is lost for either reason, it can be recovered through the restore process.
Overview
The backup and restore module of OceanBase Database provides backup, restore, and cleanup features.
OceanBase Database supports tenant-level physical backup, which consists of two features: data backup and log archiving. Here, tenant refers to a user tenant; physical backup is not supported for the sys tenant or the Meta tenant.
Data backup is the feature that backs up tenant data. It is divided into full backup and incremental backup:
Full backup refers to backing up all macroblocks.
Incremental backup refers to backing up macroblocks added or modified after the last backup.
Notice
Before performing a physical backup, you must enable the log archiving mode.
The data backed up by data backup includes the following:
Tenant information, including the tenant name, cluster name, time zone (timezone), replica distribution (locality), and compatibility mode (MySQL-compatible or Oracle-compatible) of the tenant.
All user table data
Note
Data backup backs up system variables and tenant parameters, but does not back up cluster-level parameters and private system table data.
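As a sketch of how a data backup is initiated (the path and the use of an NFS destination are illustrative; log archiving must already be enabled for the tenant):

```sql
-- Set the tenant-level data backup destination (illustrative NFS path).
ALTER SYSTEM SET DATA_BACKUP_DEST = 'file:///backup/data_backup_dest';

-- Initiate a full backup of the current tenant.
ALTER SYSTEM BACKUP DATABASE;

-- Initiate an incremental backup based on the latest full backup.
ALTER SYSTEM BACKUP INCREMENTAL DATABASE;
```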
Log archiving refers to the automatic archiving of log data. OBServer nodes periodically archive log data to the specified backup path. This action is fully automatic and does not require external triggering. After the log archiving service is enabled, the data dictionary is also archived.
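For example, log archiving can be enabled for a user tenant roughly as follows (the path is illustrative):

```sql
-- Set the tenant-level log archive destination (illustrative NFS path).
ALTER SYSTEM SET LOG_ARCHIVE_DEST = 'location=file:///backup/log_archive_dest';

-- Enable log archiving for the current tenant.
ALTER SYSTEM ARCHIVELOG;
```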
The overall architecture of physical restore is as follows:
Physical restore supports tenant-level restore and table-level restore.
Tenant-level restore: Tenant-level restore is the process of rebuilding a new tenant based on existing data backups. Tenant restore ensures global consistency across tables and partitions.
Table-level restore: Table-level restore is the process of restoring a user-specified table from backup data to an existing tenant. The existing tenant can be the same as the original tenant, a different tenant in the same cluster, or a tenant in a different cluster.
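A minimal table-level restore sketch (the database, table, and tenant names and the paths are illustrative):

```sql
-- Restore table tbl1 of database db1 into the existing tenant dest_tenant.
-- The FROM clause lists the data backup path and the log archive path.
ALTER SYSTEM RECOVER TABLE db1.tbl1
  TO TENANT dest_tenant
  FROM 'file:///backup/data_backup_dest,file:///backup/log_archive_dest';
```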
Tenant-level restore supports full restore and fast restore.
Notice
You cannot manually initiate major compactions on tenants restored by using fast restore, nor can you back them up. Such tenants cannot be switched over or failed over to the primary role; they can exist only as standby databases.
Full restore: Full restore refers to restoring both macroblock data and incremental logs. After all data is restored from the backup medium to the local environment, the restored tenant can provide services. The full restore process includes the restore and recover phases for tenant system tables and user tables. Restore refers to copying the baseline data required for the restore to the OBServer nodes of the target tenant. Recover refers to replaying the logs corresponding to that baseline on those OBServer nodes.
Fast restore: Fast restore refers to providing services to users without restoring macroblock data. This reduces restore wait time and lowers user costs.
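A minimal tenant-level full restore sketch (the tenant name, resource pool name, and paths are illustrative):

```sql
-- Create a new tenant from the data backup and its log archive.
-- pool_list names a resource pool that must already exist.
ALTER SYSTEM RESTORE restored_tenant
  FROM 'file:///backup/data_backup_dest,file:///backup/log_archive_dest'
  WITH 'pool_list=restore_pool';
```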
You can select one of the following restore time points for a physical restore:
Complete restore: No restore timestamp is specified.
Incomplete restore with a specified SCN or timestamp: An SCN is the precise version number of OceanBase Database. In Oracle-compatible mode, timestamps are precise to the nanosecond, with no loss of precision. In MySQL-compatible mode, timestamps are precise to the microsecond; precision beyond microseconds is lost.
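The two incomplete-restore variants can be sketched as follows (the tenant name, resource pool name, paths, and restore points are illustrative):

```sql
-- Incomplete restore to a specific timestamp.
ALTER SYSTEM RESTORE restored_tenant
  FROM 'file:///backup/data_backup_dest,file:///backup/log_archive_dest'
  UNTIL TIME = '2023-01-11 19:35:00'
  WITH 'pool_list=restore_pool';

-- Incomplete restore to a specific SCN.
ALTER SYSTEM RESTORE restored_tenant
  FROM 'file:///backup/data_backup_dest,file:///backup/log_archive_dest'
  UNTIL SCN = 1673436649723677822
  WITH 'pool_list=restore_pool';
```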
For more information about the physical restore process, see Restore process.
Backup media requirements
OceanBase Database supports the following backup media: Alibaba Cloud OSS, NFS, Azure Blob, AWS S3, and object storage services that are compatible with the S3 protocol (such as Huawei OBS, Google GCS, and Tencent Cloud COS). Some backup media must meet certain requirements before they can be used.
SDK version requirements
The following table describes the correspondence between the object storage SDK version and the observer version.
| observer version | oss-c-sdk | s3-cpp-sdk |
|---|---|---|
| 4.3.4 and later | 3.11.2 | 1.11.156 |
Interface requirements
Alibaba Cloud OSS:
The following table lists the interfaces supported by Alibaba Cloud OSS.
| Interface name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes multiple objects. |
| GetObject | Retrieves an object. |
| ListObjects | Lists all objects in the bucket (strong consistency required). |
| HeadObject | Retrieves the metadata of an object. |
| AppendObject | Uploads an object in append mode. |
| PutObjectTagging (optional) | Sets or updates the tags of an object. |
| GetObjectTagging (optional) | Retrieves the tags of an object. |
| InitiateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Cancels a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists the information of multipart uploads that have been initialized but not completed or terminated. |
| ListParts | Lists the information of uploaded parts in an upload task. |

Only the V1 signature algorithm is supported.
NFS: The version must be 3 or later.
Object storage services that are compatible with the S3 protocol (such as Huawei OBS, Google GCS, and Tencent Cloud COS):
The following table lists the S3 APIs that must be supported.
| Interface name | Description |
|---|---|
| PutObject | Uploads a single object. |
| DeleteObject | Deletes a single object. |
| DeleteObjects | Deletes multiple objects. |
| GetObject | Downloads a single object. |
| ListObjects | Lists all objects in a path. |
| HeadObject | Retrieves the metadata of an object. |
| PutObjectTagging (optional) | Sets the tags of an object. |
| GetObjectTagging (optional) | Retrieves the tags of an object. |
| CreateMultipartUpload | Initializes a multipart upload. |
| UploadPart | Uploads a single part. |
| CompleteMultipartUpload | Combines uploaded parts into a single object. |
| AbortMultipartUpload | Cancels a multipart upload and deletes the uploaded parts. |
| ListMultipartUploads | Lists the uploaded parts. |
| ListParts | Lists the information of uploaded parts in an upload task. |

Virtual-hosted–style object access URLs must be supported. For more information about Virtual-hosted–style requests, see AWS S3 documentation.
Before you select a backup medium, you can run the test_io_device command in the ob_admin tool to verify whether the I/O interfaces and the current I/O permissions provided by the backup medium meet the requirements for backup and restore. You can also run the io_adapter_benchmark command in the ob_admin tool to measure the read and write performance of OBServer nodes against the backup medium, which serves as a reference for backup performance. For more information about the test_io_device and io_adapter_benchmark commands, see test_io_device and io_adapter_benchmark.
Directory structure
Data backup directory
The directory created by the data backup function at the backup destination and the types of files saved in each directory are as follows:
data_backup_dest
├── format.obbak // The format information of the backup path.
├── check_file
│ └── 1002_connect_file_20230111T193020.obbak // The connectivity check file.
├── backup_sets // The summary directory of the data backup list, recording all the data backup sets.
│ ├── backup_set_1_full_end_success_20230111T193420.obbak // The full backup end placeholder.
│ ├── backup_set_1_full_start.obbak // The full backup start placeholder.
│ ├── backup_set_2_inc_start.obbak // The incremental backup start placeholder.
│ └── backup_set_2_inc_end_success_20230111T194420.obbak // The incremental backup end placeholder.
└── backup_set_1_full // The full backup set. Files ending with full indicate full backups, and those ending with inc indicate incremental backups.
├── backup_set_1_full_20230111T193330_20230111T193420.obbak // Placeholder showing the start and end times of the full backup.
├── single_backup_set_info.obbak // The metadata of the current backup set.
├── tenant_backup_set_infos.obbak // The full backup set information of the current tenant.
├── infos
│ ├── table_list // The list of full tables.
│ │ ├── table_list.1702352553000000000.1.obbak // Table name file 1.
│ │ ├── table_list.1702352553000000000.2.obbak // Table name file 2.
│ │ └── table_list_meta_info.1702352553000000000.obbak // The metadata of the table name files.
│ ├── major_data_info_turn_1 // The tenant-level backup files under major turn 1.
│ │ ├── tablet_log_stream_info.obbak // The mapping file between tablets and log streams.
│ │ ├── tenant_major_data_macro_range_index.0.obbak // The macro block index of major data.
│ │ ├── tenant_major_data_meta_index.0.obbak // The meta index of major data.
│ │ └── tenant_major_data_sec_meta_index.0.obbak // The mapping file between the logical ID and physical ID of major data.
│ ├── minor_data_info_turn_1 // The tenant-level backup files under minor turn 1.
│ │ ├── tablet_log_stream_info.obbak // The mapping file between tablets and log streams.
│ │ ├── tenant_minor_data_macro_range_index.0.obbak // The macro block index of minor data.
│ │ ├── tenant_minor_data_meta_index.0.obbak // The meta index of minor data.
│ │ └── tenant_minor_data_sec_meta_index.0.obbak // The mapping file between the logical ID and physical ID of minor data.
│ ├── diagnose_info.obbak // The diagnostic information file of the backup set.
│ ├── tenant_parameter.obbak // The tenant-level parameters of the current tenant that are not default settings.
│ ├── locality_info.obbak // The locality information of the current backup set, including the resource configuration information and replica distribution information of the tenant.
│ └── meta_info // The log stream metadata files of the tenant level, containing the metadata of all log streams.
│ ├── ls_attr_info.1.obbak // The snapshot of the log stream list at the time of backup.
│ └── ls_meta_infos.obbak // The collection of metadata of all log streams.
├── logstream_1 // The 1st log stream.
│ ├── major_data_turn_1_retry_0 // The baseline data under turn 1, retry 0.
│ │ ├── macro_block_data.0.obbak // A data file with a size ranging from 512 MB to 4 GB.
│ │ ├── macro_range_index.obbak // The macro index.
│ │ ├── meta_index.obbak // The meta index.
│ │ └── sec_meta_index.obbak // The mapping file between the logical ID and physical ID.
│ ├── meta_info_turn_1_retry_0 // The log stream metadata file under turn 1, retry 0.
│ │ ├── ls_meta_info.obbak // The log stream metadata.
│ │ └── tablet_info.1.obbak // The list of log stream tablet metadata.
│ ├── minor_data_turn_1_retry_0 // The minor data under turn 1, retry 0.
│ │ ├── macro_block_data.0.obbak
│ │ ├── macro_range_index.obbak
│ │ ├── meta_index.obbak
│ │ └── sec_meta_index.obbak
│ └── sys_data_turn_1_retry_0 // The system tablet data under turn 1, retry 0.
│ ├── macro_block_data.0.obbak
│ ├── macro_range_index.obbak
│ ├── meta_index.obbak
│ └── sec_meta_index.obbak
└── logstream_1001 // The 1001st log stream.
├── major_data_turn_1_retry_0
│ ├── macro_block_data.0.obbak
│ ├── macro_range_index.obbak
│ ├── meta_index.obbak
│ └── sec_meta_index.obbak
├── meta_info_turn_1_retry_0
│ ├── ls_meta_info.obbak
│ └── tablet_info.1.obbak
├── minor_data_turn_1_retry_0
│ ├── macro_block_data.0.obbak
│ ├── macro_range_index.obbak
│ ├── meta_index.obbak
│ └── sec_meta_index.obbak
└── sys_data_turn_1_retry_0
├── macro_block_data.0.obbak
├── macro_range_index.obbak
├── meta_index.obbak
└── sec_meta_index.obbak
In the data backup directory, the top-level directory contains the following data:
- `format.obbak`: used to record the metadata of the backup path.
- `check_file`: used for the connectivity check of the user data backup directory.
- `backup_sets`: the summary directory of the data backup list, recording the list of all data backup sets.
- `backup_set_1_full`: this directory represents a data backup set. A directory name ending with `full` indicates a full backup, and one ending with `inc` indicates an incremental backup. Each data backup generates a corresponding backup set, which is no longer modified after the data backup is completed.

In a data backup set, the following data is mainly included:

- `backup_set_1_full_20230111T193330_20230111T193420.obbak`: this file displays the ID, start time, and end time of the current backup set. This file is for display only.
- `single_backup_set_info.obbak`: this file records the metadata of the current backup set, including the backup point and the dependent logs.
- `tenant_backup_set_infos.obbak`: this file records the metadata of all existing backup sets in the current tenant.
- `infos`: this directory records the metadata of the data backup set.
- `logstream_1`: this directory records all data of log stream 1, which is the system log stream of the OceanBase Database tenant.
- `logstream_1001`: this directory records all data of log stream 1001, which is the user log stream of the OceanBase Database tenant.
In addition, each log stream backup contains four directories. Directories whose names contain `retry` indicate log stream-level retries, while directories whose names contain `turn` indicate tenant-level retries:

- `meta_info_xx`: this directory records the LS metadata and tablet metadata.
- `sys_data_xx`: this directory records the data of the internal system tablets in the LS.
- `minor_data_xx`: this directory records the dump data of ordinary tablets.
- `major_data_xx`: this directory records the baseline data of ordinary tablets.
Backup directory for cluster-level parameters
Every time a backup of cluster-level parameters is initiated, the system generates a backup file in the specified directory. The specific directory structure is as follows:
cluster_parameters_backup_dest
├── cluster_parameter.20240710T103610.obbak # Information of non-default cluster-level parameters, file naming format: `cluster_parameter.[timestamp]`
└── cluster_parameter.20241018T140609.obbak
Log archive directory
For backup media such as NFS, OSS, and Azure Blob, the log archive directory and the types of files saved in each directory are as follows:
log_archive_dest
├── check_file
│ └── 1002_connect_file_20230111T193049.obbak // Connectivity check file
├── format.obbak // Formatted information of the backup path
├── rounds // Placeholder directory for Rounds
│ └── round_d1002r1_start.obarc // Placeholder for the start of a Round
├── pieces // Placeholder directory for Pieces
│ ├── piece_d1002r1p1_start_20230111T193049.obarc // Placeholder for the start of a Piece, named piece_DESTID_ROUNDID_PIECEID_start_DATE
│ └── piece_d1002r1p1_end_20230111T193249.obarc // Placeholder for the end of a Piece, named piece_DESTID_ROUNDID_PIECEID_end_DATE
└── piece_d1002r1p1 // Directory for a Piece, named piece_DESTID_ROUNDID_PIECEID
├── piece_d1002r1p1_20230111T193049_20230111T193249.obarc // Records the continuous interval of the Piece
├── checkpoint
│ ├── checkpoint.1673436649723677822.obarc // Records checkpoint_scn information, with the filename format of checkpoint.checkpoint_scn
│ └── checkpoint_info.0.obarc // Records the metadata of the checkpoint
├── single_piece_info.obarc // Records the metadata of the Piece
├── tenant_archive_piece_infos.obarc // Records the metadata of all frozen Pieces before the current Piece
├── file_info.obarc // List of all log stream files
├── logstream_1 // Log stream 1
│ ├── file_info.obarc // List of files in log stream 1
│ ├── log
│ │ └── 1.obarc // Archive files under log stream 1
│ └── schema_meta // Records the metadata of the data dictionary, generated only for log stream 1
│ └── 1677588501408765915.obarc
└── logstream_1001 // Log stream 1001
├── file_info.obarc // List of files in log stream 1001
└── log
└── 1.obarc // Archive files under log stream 1001
In the log archive directory above, the top-level directory contains the following data:
- `format.obbak`: used to record the metadata of the archive path, including information such as the tenant using the path.
- `check_file`: used for the connectivity check of the user log archive directory.
- `rounds`: the summary list of all Rounds in the log archive, recording the list of all Rounds.
- `pieces`: the summary list of all Pieces in the log archive, recording the list of all Pieces.
- `piece_d1002r1p1`: the Piece directory in the log archive, named `piece_DESTID_ROUNDID_PIECEID`. Here, `DESTID` refers to the ID corresponding to `log_archive_dest`; `ROUNDID` refers to the ID of the log archive Round, which is a monotonically increasing integer; and `PIECEID` refers to the ID of the log archive Piece, also a monotonically increasing integer.

In the directory for a log archive Piece, the following data is included:

- `piece_d1002r1p1_20230111T193049_20230111T193249.obarc`: this file displays the ID, start time, and end time of the current Piece and is for display only.
- `checkpoint`: the directory for recording the archive point of an active Piece. The ObArchiveScheduler module periodically updates the archive point information in this directory. Specifically:
  - `checkpoint.1673436649723677822.obarc`: the filename records the checkpoint_scn of the Piece, with the naming format `checkpoint.checkpoint_scn`, where `1673436649723677822` is the corresponding checkpoint_scn.
  - `checkpoint_info.0.obarc`: this file records the metadata of the active Piece's checkpoint, including tenant_id, dest_id, round_id, piece_id, and other information. The metadata remains unchanged within a Piece.
- `single_piece_info.obarc`: this file records the metadata of the current Piece.
- `tenant_archive_piece_infos.obarc`: this file records the metadata of all frozen Pieces in the current tenant.
- `file_info.obarc`: this file records the list of log streams in the Piece.
- `logstream_1`: this directory records the log files of log stream 1, which is the system log stream of the OceanBase Database tenant.
- `logstream_1001`: this directory records the log files of log stream 1001, which is the user log stream of the OceanBase Database tenant.
In addition, each log stream backup has three types of data:
- `file_info.obarc`: this file records the list of files in the log stream.
- `log`: this directory stores all archive files of the current log stream, with filenames consistent with those of the source cluster's log files.
- `schema_meta`: this directory records the metadata of the data dictionary. It exists only for system log streams and is not present in user log streams.
For backup media such as AWS S3 and object storage services accessible via S3-compatible protocols, the directory structure for log archiving differs from that of OSS, NFS, and Azure Blob. Each archive file is composed of multiple small files and corresponding metadata files. The specific directory structure is as follows:
Note
Although the log archive directory structures of AWS S3 and backup media that support the S3 protocol are different from those of OSS, NFS, and Azure Blob, you can still restore data from backup files stored on these backup media after you copy them across clouds to OSS, NFS, or Azure Blob. For example, you can copy data archived in AWS S3 to OSS and restore it by using the OSS path.
log_archive_dest
├── ......
└── piece_d1002r1p1 // Piece directory. The directory name is in the format of piece_DESTID_ROUNDID_PIECEID.
├── ...... // All log stream files.
├── logstream_1 // Log stream 1.
│ ├── file_info.obarc // Log stream file list for log stream 1.
│ ├── log
│ │ └── 1.obarc // Archive file for log stream 1. The archive file is identified by a prefix.
│ │     ├── @APD_PART@0-32472973.obarc // Actual data in the archive file. This file records data from the 0th to 32472973rd byte of the log file.
│ │     ├── ......
│ │     ├── @APD_PART@FORMAT_META.obarc // Archive file format.
│ │     └── @APD_PART@SEAL_META.obarc // Archive file metadata.
│ └── schema_meta // Metadata of the data dictionary. This file is generated only for log stream 1.
│ └── 1677588501408765915.obarc
└── logstream_1001 // Log stream 1001.
├── file_info.obarc // Log stream file list for log stream 1001.
└── log
└── 1.obarc // Archive file for log stream 1001.
In the log archive directory above, `1.obarc` is an archive file. Each archive file is identified by a prefix, and the prefix name is the same as the archive file name. Each archive file contains the following three types of data:

- `@APD_PART@FORMAT_META.obarc`: when data is written to an archive file for the first time, the system writes the `format_meta` file to the directory. This file records the format of the archive file.
- `@APD_PART@0-32472973.obarc`: actual data is written to a file identified by the prefix. The file name records the starting offset and ending offset of the data written to the file.
- `@APD_PART@SEAL_META.obarc`: after data is written to an archive file for the last time, the system generates the `seal_meta` file in the directory. This file records the metadata of the archive file.
Differences from V3.x/V2.x features
Log archiving
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Archiving level | Cluster level | Tenant level |
| Archiving granularity | Partition level | Log stream level |
| Permissions | Only the sys tenant can perform operations such as setting the archiving path, enabling archiving, and viewing the archiving progress. | Both the sys tenant and the administrator of a user tenant can perform operations. |
| Usage | | Use the `ALTER SYSTEM SET LOG_ARCHIVE_DEST` statement to set the tenant-level archiving path and piece switching cycle. By default, the cycle is 1d (1 day). The log archiving path and data backup path can be independently configured. |
| Piece switching feature | The piece switching feature is disabled by default. | The piece switching feature is enabled by default, and the cycle is 1 day. |
| Method for setting the archiving delay time | Use the `ALTER SYSTEM SET LOG_ARCHIVE_CHECKPOINT_INTERVAL` statement. | Use the `ALTER SYSTEM SET ARCHIVE_LAG_TARGET` statement. |
| Result of executing the `ALTER SYSTEM ARCHIVELOG` statement in the sys tenant | Enables archiving for all tenants in the current cluster. New tenants created after archiving is enabled also have archiving enabled. | Enables archiving for all tenants in the current cluster. New tenants created after archiving is enabled do not have archiving enabled. |
| Log compression feature | Use the `ALTER SYSTEM SET BACKUP_LOG_ARCHIVE_OPTION` statement. | Not supported. |
| Views | The following three views are related to archiving: | The following eight views are related to archiving: |
| Media requirements | The media must be SSD. | The media can be HDD or SSD. |
| Number of archive files | The number of files is proportional to the number of partitions. In a scenario with millions of partitions, this can result in a large number of small files. | The number of files is small and not related to the number of partitions. This avoids the issue of a large number of small files. |
| Standby archiving | Not supported. | Supported. |
Data backup
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Backup level | Cluster level | Tenant level |
| Permissions | Only the sys tenant can perform operations such as setting the backup path, starting a backup, and viewing the backup progress. | Both the sys tenant and the administrator user of a user tenant can perform operations. |
| Method for setting the backup path | Use the `ALTER SYSTEM SET BACKUP_DEST` statement to set the cluster-level backup path. | Use the `ALTER SYSTEM SET DATA_BACKUP_DEST` statement to set the tenant-level backup path. The data backup path and log archive path can be independently configured. |
| Data backup with specified path | The sys tenant executes the `ALTER SYSTEM BACKUP TENANT tenant_name_list TO backup_destination;` statement to initiate data backup. | Not supported |
| BACKUP PLUS ARCHIVELOG feature | Not supported | Supported |
| Space expansion | Snapshots are retained during backup, which may cause storage space expansion during backup. | Snapshots are not retained, so no space expansion occurs. |
| Standby database backup | Not supported | Supported |
| Views | The following five views are related to backup: | The following 10 views are related to backup: |
Physical restore
| Feature | V3.x/V2.2x | V4.x |
|---|---|---|
| Data path | Provide the cluster-level backup path in the restore command. | Provide both the data backup path and the log archive path. |
| Restore concurrency setting | Before initiating a restore command, use the `ALTER SYSTEM SET RESTORE_CONCURRENCY` statement to set the restore concurrency. | Specify concurrency in the restore command. |
| Key management method | | |
| Tenant role after restore | Primary tenant, i.e., the primary database | Standby tenant, i.e., the standby database |
| Upgrade | During restore, the tenant is automatically upgraded. | After restore, manually upgrade the tenant. |
| Table-level restore | Supported. Tables can be restored only to a new tenant (a tenant created during restore). Tables cannot be restored to an existing tenant. | Supported starting from V4.2.1. Tables can be restored only to an existing tenant. Tables cannot be restored to a new tenant (a tenant created during restore). |
| Fast restore | Not supported | Supported starting from V4.3.3 |
| Restore by using the `ADD RESTORE SOURCE` statement | Supported | Not supported |
References
For more detailed information about physical backup and restore, see Backup and restore.