OceanBase Database V3.2.4

Introduction to physical backup and restore

Last updated: 2023-10-27 09:57:43

The backup and restore module is a core component for ensuring the high availability of OceanBase Database. It protects data against loss caused by misoperations or damage to the storage media: if data is lost for either reason, you can restore it from backups.

Overview

OceanBase Database supports backing up data to, restoring data from, and managing backup data on two types of storage media: Object Storage Service (OSS) and Network File System (NFS).

OceanBase Database supports physical backup at the cluster level. A cluster backup covers all tenants in the cluster except the sys tenant. Physical backup data includes baseline data and archived log data, so physical backup consists of two parts: log archiving and data backup.

Log archiving refers to the automatic backup of log data. OBServer nodes regularly archive log data to the specified backup path without manual triggering.

The interval for log archiving is calculated as follows:

Log archiving interval = checkpoint_interval/2

You can set the value of checkpoint_interval as needed. For more information, see Configure backup parameters.
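
The formula is simple arithmetic; the following Python sketch only illustrates it. The function name `log_archiving_interval` is hypothetical and not part of OceanBase, and the sample values are illustrative rather than recommended settings.

```python
# Worked example of the formula: log archiving interval = checkpoint_interval / 2.
# Durations are in seconds; the sample values are illustrative only.


def log_archiving_interval(checkpoint_interval_s: float) -> float:
    """Return the log archiving interval (seconds) for a checkpoint_interval (seconds)."""
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return checkpoint_interval_s / 2


for value in (60, 120, 300):
    print(f"checkpoint_interval = {value}s -> archive every {log_archiving_interval(value):g}s")
```

For example, with checkpoint_interval set to 120 seconds, the formula gives a log archiving interval of 60 seconds.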

Data backup refers to the backup of baseline data. It includes full backup and incremental backup:
- Full backup backs up all macroblocks.
- Incremental backup backs up only the macroblocks that have been added or modified since the last backup.
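
The difference between the two backup types can be sketched as a selection rule over macroblocks. This is a simplified model, not OceanBase's implementation: it assumes (hypothetically) that each macroblock carries a version number recording when it was last added or modified, and that the version of the last backup is known. `MacroBlock` and `select_blocks` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroBlock:
    block_id: int
    last_modified_version: int  # hypothetical: version at which the block was last added or changed


def select_blocks(blocks, last_backup_version=None):
    """Pick the macroblocks to back up: all of them for a full backup,
    or only those changed after last_backup_version for an incremental backup."""
    if last_backup_version is None:
        return list(blocks)  # full backup: every macroblock
    # Incremental backup: only blocks added or modified since the last backup.
    return [b for b in blocks if b.last_modified_version > last_backup_version]


blocks = [MacroBlock(1, 10), MacroBlock(2, 25), MacroBlock(3, 31)]
print([b.block_id for b in select_blocks(blocks)])                          # [1, 2, 3]
print([b.block_id for b in select_blocks(blocks, last_backup_version=25)])  # [3]
```

An incremental backup therefore only makes sense relative to an earlier backup, which is why it is taken after a full backup.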

OceanBase Database supports tenant-level restore, which creates a tenant from existing backup data. You can execute the ALTER SYSTEM RESTORE TENANT statement to restore a tenant. Restoring a tenant involves restoring and then recovering both the system tables and the user tables of the tenant. Restoring the tables copies the required baseline data to the OBServer nodes of the target tenant; recovering the tables applies the logs corresponding to that baseline data on those OBServer nodes.

In addition to backing up clusters and tenants, OceanBase Database allows you to manually delete specified backup data or to automatically clean up expired backup data.
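
This section does not describe how backup data is judged to be expired. The sketch below illustrates one plausible rule under stated assumptions; it is not OceanBase's actual cleanup logic. It assumes a fixed retention window, that backup set IDs increase over time, and that incremental backup sets depend on earlier sets back to a full backup set, so the newest full set completed before the window starts must be kept together with everything after it. `BackupSet` and `expired_backup_sets` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupSet:
    backup_set_id: int      # assumed to increase with each new backup
    kind: str               # "full" or "inc"
    completed_at: datetime


def expired_backup_sets(backup_sets, now, retention):
    """Return the backup sets that could be deleted under a hypothetical retention rule.

    Keep the newest full backup set completed at or before the start of the
    retention window, plus every later set; everything older is expired.
    """
    window_start = now - retention
    ordered = sorted(backup_sets, key=lambda s: s.backup_set_id)
    anchor_id = None
    for s in ordered:
        if s.kind == "full" and s.completed_at <= window_start:
            anchor_id = s.backup_set_id
    if anchor_id is None:
        return []  # no full set predates the window, so nothing can be dropped safely
    return [s for s in ordered if s.backup_set_id < anchor_id]


day0 = datetime(2021, 10, 1)
sets = [
    BackupSet(1, "full", day0),
    BackupSet(2, "inc", day0 + timedelta(days=3)),
    BackupSet(3, "full", day0 + timedelta(days=7)),
    BackupSet(4, "inc", day0 + timedelta(days=10)),
]
expired = expired_backup_sets(sets, now=day0 + timedelta(days=12), retention=timedelta(days=4))
print([s.backup_set_id for s in expired])  # [1, 2]
```

Keeping the full set that precedes the window, rather than only the sets inside it, is what keeps every point in the window restorable under these assumptions.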

Physical backup architecture

The following figure shows the physical backup architecture of OceanBase Database.

[Figure: Backup architecture]

After you log on to the backup cluster as the sys tenant, you must first execute an SQL statement to start log archiving. You can perform a baseline backup only after log archiving has started.

Log archiving regularly backs up logs to the backup destination. You need to execute the ALTER SYSTEM ARCHIVELOG statement only once; log backup then continues in the background. During log archiving, the leader of each partition group (PG) regularly archives the logs of that PG to the specified path on the backup medium, and RootService regularly checks the archiving progress and records it in an internal table.
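
The progress check performed by RootService can be illustrated with a small model. The sketch assumes (hypothetically) that each PG leader reports a logical timestamp up to which its logs are archived, and that the overall archiving progress is the minimum across PGs, because a point in time is fully covered by archived logs only after every PG has archived past it. The names, and the dictionary standing in for the internal table, are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PGArchiveStatus:
    pg_key: str          # hypothetical PG identifier
    archived_until: int  # hypothetical logical timestamp up to which this PG's logs are archived


def archive_progress(statuses):
    """Return the point up to which the logs of every PG have been archived."""
    if not statuses:
        raise ValueError("no PG reported archiving progress")
    # A point in time is fully archived only when every PG has archived past it,
    # so the overall progress is that of the slowest PG.
    return min(s.archived_until for s in statuses)


internal_table = {}  # stands in for the internal table that RootService updates


def rootservice_check(statuses):
    internal_table["archive_progress"] = archive_progress(statuses)
    return internal_table["archive_progress"]


reports = [
    PGArchiveStatus("pg_1", 1200),
    PGArchiveStatus("pg_2", 900),
    PGArchiveStatus("pg_3", 1050),
]
print(rootservice_check(reports))  # 900
```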

Data backup is triggered manually. A typical schedule is a full backup every Saturday and incremental backups every Tuesday and Thursday. When you initiate a data backup request, the request is first forwarded to the node running RootService. RootService generates a data backup task based on the current tenant and its PGs, and then distributes the task to OBServer nodes for parallel execution. The OBServer nodes back up the metadata and macroblocks of the PGs to the specified backup directories, where macroblocks are organized by PG.
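
The flow from RootService to OBServer nodes can be sketched as task generation followed by parallel execution. In this simplified Python model, PGs are assigned to servers round-robin and each server's share runs in its own worker thread. The assignment policy, the `archiving_running` precondition flag (reflecting the requirement that log archiving be running before a data backup), and all function names are hypothetical, and the backup step itself is only a placeholder that returns target paths.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def generate_backup_tasks(pg_keys, servers):
    """Assign the tenant's PGs to OBServer nodes round-robin (hypothetical policy)."""
    if not servers:
        raise ValueError("at least one OBServer node is required")
    tasks = {server: [] for server in servers}
    for pg_key, server in zip(pg_keys, cycle(servers)):
        tasks[server].append(pg_key)
    return tasks


def back_up_pgs(server, pg_keys, backup_dir):
    """Placeholder for backing up PG metadata and macroblocks; returns per-PG target paths."""
    return [f"{backup_dir}/{pg_key}" for pg_key in pg_keys]


def run_data_backup(pg_keys, servers, backup_dir, archiving_running):
    # Data backup is only allowed once log archiving is running.
    if not archiving_running:
        raise RuntimeError("log archiving must be running before a data backup")
    tasks = generate_backup_tasks(pg_keys, servers)
    # Each server's PGs are processed in parallel with the other servers' PGs.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = {
            server: pool.submit(back_up_pgs, server, pgs, backup_dir)
            for server, pgs in tasks.items()
        }
        return {server: f.result() for server, f in futures.items()}


result = run_data_backup(
    ["pg_1", "pg_2", "pg_3"], ["observer_a", "observer_b"], "/backup/data", archiving_running=True
)
print(result)
```

Organizing the output by PG, as in the sketch's per-PG paths, mirrors the statement above that macroblocks are managed per PG at the backup destination.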

OceanBase Database allows you to use OSS or NFS as the backup destination. The directory structure at the backup destination and the files under each directory are as follows:

data
  tenant_data_backup_info // The information about the baseline backup data of a tenant.
  tenant_backup_set_file_info // More comprehensive information than tenant_data_backup_info.
  backup_set_1_full_date // A full backup set. The directory name ends with a date, for example, backup_set_1_full_20211014.
    backup_set_info // The information about the current backup.
    single_backup_set_info // The information about the current backup, which is more comprehensive than backup_set_info.
    backup_1 // The backup_set_id is 1.
      sys_pg_list
      normal_pg_list
      sys_meta_index_file_<task_id> // The index on the system tables. It points to the corresponding PG meta files by PG key.
      normal_meta_index_file_<task_id> // The index on the regular tables.
      meta_file_<task_id> // A file that records information such as the metadata and the list of macroblocks.

    data // Versions are not differentiated.
      pgkey
        major_data // The baseline data.
          macro_block_1.<sub_task_id>   
          macro_block_index_1
          macro_block_2.<sub_task_id>
          macro_block_index_2
        minor_data // The minor compaction data.
          task_id_1
            macro_block_1.<sub_task_id>
            macro_block_index_1
          task_id_2
            macro_block_2.<sub_task_id>
            macro_block_index_2

  backup_set_2_inc_date // An incremental backup set. The directory name ends with a date, for example, backup_set_2_inc_20211014.
    backup_set_info // The information about the current backup.
    single_backup_set_info
    backup_2
      sys_pg_list
      normal_pg_list
      sys_meta_index_file_<task_id>
      normal_meta_index_file_<task_id>
      meta_file_<task_id>
    data
      ...

clog
  backup_piece_info        // The backup piece-related information.
  tenant_clog_backup_info
  roundid_pieceid_date   // Example: 1_1_20211014
    single_piece_info    
    archive_key
      tableid_partition_id  // Example: 1100611139403779_0
      ...
    data
      tableid               // Example: 1100611139403779
        partition_id        // Example: 0
          1                 // The data file.
          2             
          ...
    index
      tableid               // Example: 1100611139403779
        partition_id        // Example: 0
          1                 // The index file.
          2          
          ...
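
The directory names in the tree follow visible patterns. The Python sketch below builds and parses them based only on the examples shown (`backup_set_1_full_20211014`, `1_1_20211014`). The helper names are hypothetical, and the exact point in time that the date suffix records is not specified here.

```python
import re
from datetime import date

_BACKUP_SET_RE = re.compile(r"^backup_set_(\d+)_(full|inc)_(\d{8})$")


def backup_set_dir(backup_set_id: int, full: bool, day: date) -> str:
    """Build a backup set directory name such as backup_set_1_full_20211014."""
    kind = "full" if full else "inc"
    return f"backup_set_{backup_set_id}_{kind}_{day:%Y%m%d}"


def piece_dir(round_id: int, piece_id: int, day: date) -> str:
    """Build a log archive piece directory name (roundid_pieceid_date) such as 1_1_20211014."""
    return f"{round_id}_{piece_id}_{day:%Y%m%d}"


def parse_backup_set_dir(name: str):
    """Split a backup set directory name into (backup_set_id, kind, date)."""
    match = _BACKUP_SET_RE.match(name)
    if match is None:
        raise ValueError(f"not a backup set directory: {name!r}")
    set_id, kind, ymd = match.groups()
    return int(set_id), kind, date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))


print(backup_set_dir(1, True, date(2021, 10, 14)))  # backup_set_1_full_20211014
print(piece_dir(1, 1, date(2021, 10, 14)))          # 1_1_20211014
print(parse_backup_set_dir("backup_set_2_inc_20211014"))
```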

Physical restore architecture

The following figure shows the physical restore architecture of OceanBase Database.

[Figure: Restore architecture]

To start a physical restore, perform the following two steps:

1. In the target cluster, execute the CREATE RESOURCE POOL statement to create a resource pool for the tenant to be restored.
2. Execute the ALTER SYSTEM RESTORE TENANT statement to schedule a tenant restore task.

The system processes the ALTER SYSTEM RESTORE TENANT statement in the following steps:
1. Create a tenant for restore.
2. Restore the system table data of the tenant.
3. Restore the system table logs of the tenant.
4. Modify and restore the metadata of the tenant.
5. Restore the user table data of the tenant.
6. Restore the user table logs of the tenant.
7. Complete the restore operation.
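
The order of these steps can be modeled as a fixed sequence of phases. In the sketch below, the enum member names are hypothetical labels for the listed steps, not OceanBase's internal status names. The point is the ordering: system tables are restored and recovered before user tables, and within each, data is restored before the corresponding logs.

```python
from enum import Enum


class RestorePhase(Enum):
    # Hypothetical labels for the seven steps, in execution order.
    CREATE_TENANT = 1
    RESTORE_SYS_DATA = 2
    RESTORE_SYS_LOGS = 3
    RESTORE_TENANT_META = 4
    RESTORE_USER_DATA = 5
    RESTORE_USER_LOGS = 6
    COMPLETE = 7


def run_restore(handlers):
    """Run the phases in order, calling an optional handler for each one.

    An exception raised by a handler stops the restore at that phase.
    """
    completed = []
    for phase in RestorePhase:  # Enum iteration follows definition order
        handler = handlers.get(phase)
        if handler is not None:
            handler()
        completed.append(phase)
    return completed


log = []
phases = run_restore({RestorePhase.RESTORE_USER_DATA: lambda: log.append("user data copied")})
print([p.name for p in phases])
```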
  
To restore a single PG, the system copies the metadata and macroblock data of the PG to the specified OBServer node to create a PG that contains only baseline data, and then replays the logs of the PG into the MemTable of that PG on the same node. If the volume of logs is large, minor compactions may be triggered during this process.
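
The single-PG restore described above can be illustrated with a toy model. It simplifies heavily and uses hypothetical names: baseline data is a key-value dict instead of macroblocks, each log entry is a `(key, value)` write, and a minor compaction is modeled as freezing the MemTable once it reaches a hypothetical entry-count threshold. Real log replay and compaction in OceanBase are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class RestoredPG:
    baseline: dict                                  # copied from the backup's baseline data
    memtable: dict = field(default_factory=dict)    # receives replayed log entries
    minor_data: list = field(default_factory=list)  # frozen MemTables (minor compaction output)
    memtable_limit: int = 1000                      # hypothetical compaction threshold

    def replay(self, log_entries):
        for key, value in log_entries:
            self.memtable[key] = value
            if len(self.memtable) >= self.memtable_limit:
                self.minor_compact()

    def minor_compact(self):
        # Freeze the current MemTable into a minor data unit and start a new one.
        self.minor_data.append(dict(self.memtable))
        self.memtable.clear()

    def read(self, key):
        # Newest data wins: MemTable, then minor data (newest first), then baseline.
        if key in self.memtable:
            return self.memtable[key]
        for frozen in reversed(self.minor_data):
            if key in frozen:
                return frozen[key]
        return self.baseline.get(key)


def restore_pg(baseline_data, logs, memtable_limit=1000):
    pg = RestoredPG(baseline=dict(baseline_data), memtable_limit=memtable_limit)
    pg.replay(logs)
    return pg


pg = restore_pg({"a": 1, "b": 2}, [("a", 10), ("c", 3), ("b", 20)], memtable_limit=2)
print(len(pg.minor_data), pg.read("a"), pg.read("b"), pg.read("c"))  # 1 10 20 3
```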