Delete expired data

2024-08-13 01:52:31  Updated

OBKV-Table supports the deletion of expired data based on the time to live (TTL). This topic describes how to delete expired data by using commands or periodic tasks.

Background information

OBKV-Table allows you to set a expiration policy for relational tables, and then delete data that has expired using commands or periodic tasks. This helps free up storage space that would otherwise be taken up by old, irrelevant information. In addition, OBKV-Table automatically filters out this expired data to prevent them from being visible.

Use the expired data deletion feature

OceanBase Database uses an LSM-Tree architecture (where deletion is also an append-write operation), so even if expired data is deleted using a TTL task, the actual storage space may not decrease and could even increase. The final release of storage space still relies on major compaction. Therefore, we recommend that you use it in conjunction with major freeze. After the TTL task completes the deletion, perform a major freeze. For example, this can be achieved by setting major_freeze_duty_time after kv_ttl_duty_duration.

The expiration deletion TTL task is a low-priority background task and currently cannot guarantee the immediate deletion of expired data. To avoid significantly impacting system performance, if the MemStore memory MEMSTORE_USED exceeds the threshold FREEZE_TRIGGER (which can be queried through the gv$ob_memstore view) during the execution of the task, the current task will be paused until the memory drops below the threshold. Additionally, to avoid affecting the system's recovery process, for tenants undergoing normal physical recovery, the TTL task will also pause the execution of the TTL task for that tenant until the tenant completes the recovery.

Definition of TTL tables

  • Create a TTL table t1, where each row in the table expires after b + INTERVAL 12 HOUR.

    CREATE TABLE t1 (
      a INT PRIMARY KEY, 
      b TIMESTAMP DEFAULT current_timestamp ON UPDATE current_timestamp)
    TTL (b + INTERVAL 12 HOUR);
    
  • Modify the TTL of table t1 to b + INTERVAL 1 DAY.

    ALTER TABLE t1 TTL (b + INTERVAL 1 DAY);
    
  • Remove the TTL attribute from table t1.

    ALTER TABLE t1 REMOVE TTL;
    

The TTL attribute of a table is specified in the format COLUMN_NAME + INTERVAL NUM TTL_UNIT, where:

  • COLUMN_NAME is the name of a column defined in the table. The column must be of the TIMESTAMP or DATETIME data type.
  • + INTERVAL is a fixed part.
  • NUM is an integer that indicates the number of TTL units.
  • TTL_UNIT can be SECOND, MINUTE, HOUR, DAY, MONTH, or YEAR.

Parallelism of TTL tasks

You can configure the number of TTL worker threads using the tenant-level parameter ttl_thread_score. The default value is 2.

The example shows how to modify the parallelism of TTL tasks to 10:

ALTER SYSTEM SET ttl_thread_score = 10;

Trigger a TTL task

You can trigger a TTL task in two ways: by using a control command or by setting up a periodic task.

Before triggering a TTL task, make sure that the tenant's TTL task switch is enabled.

ALTER SYSTEM SET enable_kv_ttl= true; -- The default value is false. 
  • Trigger a TTL task by using a control command:

    Log on to the system tenant and run the following command to trigger a TTL task across all tenants:

    ALTER SYSTEM trigger TTL;
    
  • Trigger a TTL task by setting a periodic task:

    Log on to the desired tenant and execute the following command to set the time range for TTL task triggers:

    ALTER SYSTEM SET kv_ttl_duty_duration = '[22:00:00, 24:00:00]'; // The default value is "", indicating that periodic TTL tasks are not triggered.
    

You can log on to the system tenant and run the following system commands to suspend, resume, or abort an ongoing TTL task.

ALTER SYSTEM SUSPEND TTL; -- Suspends the current running TTL task.
ALTER SYSTEM RESUME TTL; -- Resumes the suspended TTL task.
ALTER SYSTEM CANCEL TTL; -- Cancels the running TTL task.

This TTL command can only be executed in the context of either the user tenant or the system tenant; executing it in the user tenant only allows managing that specific tenant's TTL tasks, whereas executing it in the system tenant permits managing all users' TTL tasks.

Below is an example using the TRIGGER command. The behavior for other commands such as SUSPEND, RESUME, and CANCEL is consistent with the tenant behavior:

Tenant Command Semantics
System tenant ALTER SYSTEM SUSPEND TTL All user tenants
System tenant ALTER SYSTEM TRIGGER TTL TENANT = all All user tenants
System tenant ALTER SYSTEM TRIGGER TTL TENANT = all_user All user tenants
System tenant ALTER SYSTEM TRIGGER TTL TENANT = obkv OBKV tenant
System tenant ALTER SYSTEM TRIGGER TTL TENANT = obkv1, obkv2 User tenants obkv1 and obkv2
User tenant ALTER SYSTEM TRIGGER TTL The current tenant

View task status

You can use views to view the status and related information of current and historical TTL tasks. The following views are related to TTL tasks:

View Description
DBA_OB_KV_TTL_TASKS View the TTL tasks that are currently being executed in the current tenant.
DBA_OB_KV_TTL_TASK_HISTORY View the historical TTL tasks in the current tenant.
CDB_OB_KV_TTL_TASKS View the TTL tasks that are currently being executed in all tenants in the system tenant.
CDB_OB_KV_TTL_TASK_HISTORY View the historical TTL tasks in all tenants in the system tenant.

The following table describes the fields in the DBA_OB_KV_TTL_TASKS and DBA_OB_KV_TTL_TASK_HISTORY views:

Field Description
TABLE_NAME The table name.
TABLE_ID The table ID.
TABLET_ID Tablet ID
TASK_ID The task ID, which starts from 1.
START_TIME The start time of the task.
END_TIME The end time of the task.
TRIGGER_TYPE The trigger type of the task, which includes PERIODIC and USER.
STATUS The status of the task. For more information, see the task status definitions.
TTL_DEL_CNT The number of records deleted from related tables.
MAX_VERSION_DEL_CNT The number of records deleted from HBase tables based on the MaxVersions feature, which is 0 in the relational table model.
SCAN_CNT The number of records scanned by the task.
RET_CODE The return code of the task.

The following table describes the fields of the CDB_OB_KV_TTL_TASKS and CDB_OB_KV_TTL_TASK_HISTORY views:

Note

Compared with DBA views, the CDB views contain an additional tenant ID field.

Field Description
TENANT_ID The tenant ID.
TABLE_NAME The table name.
TABLE_ID The table ID.
TABLET_ID The tablet ID.
TASK_ID The task ID, which starts from 1.
START_TIME The start time of the task.
END_TIME The end time of the task.
TRIGGER_TYPE The task trigger type, which can be either PERIODIC or USER.
STATUS The status of the task. For more information, see the task status definitions.
TTL_DEL_CNT The number of records deleted from related tables.
MAX_VERSION_DEL_CNT The number of records deleted from HBase tables based on the MaxVersions feature, which is 0 in the relational table model.
SCAN_CNT The number of records scanned by the task.
RET_CODE The return code of the task.

Visibility of expired data

If you use the OBKV-Table interface to perform an operation on a record that has expired, the operation is performed as if the record does not exist. For example, if a record with the primary key k1 has expired in a table, you can insert a new record with the primary key k1 without encountering a primary key conflict.

Operation Expired Return code Affected rows Other
insert N -5024
OB_ERR_PRIMARY_KEY_DUPLICATE
0
insert Y 0
OB_SUCCESS
1
delete N 0
OB_SUCCESS
1
delete Y 0
OB_SUCCESS
0
update N 0
OB_SUCCESS
1
update Y 0
OB_SUCCESS
0
replace N 0
OB_SUCCESS
2
replace Y 0
OB_SUCCESS
1
insert_or_update N 0
OB_SUCCESS
1
insert_or_update Y 0
OB_SUCCESS
1
increment N 0
OB_SUCCESS
1
increment Y 0
OB_SUCCESS
1 Except for the increment column, other columns are written with default values.
append N 0
OB_SUCCESS
1
append Y 0
OB_SUCCESS
1 Except for the append column, other columns are written with default values.
query/get N 0
OB_SUCCESS
/ Data is successfully read.
query/get Y 0
OB_SUCCESS
/ No data is read.

Note

SQL interfaces do not currently filter out expired data. Expired data is visible until it is deleted by the expired data cleanup task.

Task status

For each tenant's TTL task, it is split into a tablet-level task for each TTL table during actual execution. The tenant tasks and their tablet tasks are recorded in the task table. You can view the tenant tasks and their tablet tasks in the task view. The historical partition tasks are moved to the history table. You can view the historical tasks in the history view.

Notice

For tenant-level tasks, TABLE_NAME is NULL, and TABLET_ID and TABLE_ID are -1.

  • In the task views DBA_OB_KV_TTL_TASKS and CDB_OB_KV_TTL_TASKS, there are two types of tasks: tenant-level tasks and tablet-level tasks. They correspond to two different statuses.
  • In the history task views DBA_OB_KV_TTL_TASK_HISTORY and CDB_OB_KV_TTL_TASK_HISTORY, historical tenant-level tasks and tablet-level tasks are saved.

Tablet task status:

Task status Description
PREPARED The task is in the prepared state and has not started to delete expired data.
RUNNING The task is in the running state and is deleting expired data in the partition.
PENDING The task is in the pending state and can be resumed by a command.
CANCELED The task is in the canceled state and cannot be resumed.
FINISHED The task is in the finished state and is waiting to be moved to the history table.
INVALID The task is in an invalid state.

Tenant task status:

Task status Description
RS_TRIGGERING The tenant's TTL task is being executed.
RS_SUSPENDING The tenant's TTL task has been suspended.
RS_CANCELING The tenant's TTL task has been canceled.
RS_MOVING All TTL tasks are being moved to the history table.
INVALID The task is in an invalid state.

Set the retention period for historical TTL tasks

To avoid unnecessary storage space waste caused by the accumulation of historical TTL task records, you can set the retention period for historical TTL task records of a tenant by using the kv_ttl_history_recycle_interval parameter. The default retention period is 7 days.

ALTER SYSTEM SET kv_ttl_history_recycle_interval= '30d'; -- Set the retention period for historical TTL task records to 30 days.

Example: Scheduled deletion of expired data

  1. Create a TTL table and specify that each record expires 10 seconds after the c timestamp.

    CREATE TABLE ttl_table(a VARCHAR(1024) PRIMARY KEY,
      b VARCHAR(1024),
      c TIMESTAMP)
      TTL(c + INTERVAL 10 SECOND) PARTITION BY KEY(a) PARTITIONS 3;
    
  2. Insert data.

    you can use OBKV-Table/SQL, or use the OBKV-Table API to insert data. For more information, see Client Introduction and Usage Instructions.

    INSERT INTO ttl_table VALUES("k1", "hello obkv", now());
    INSERT INTO ttl_table VALUES("k2", "hello obkv", now());
    INSERT INTO ttl_table VALUES("k3", "hello obkv", now());
    
  3. Wait for 10 seconds, and then execute the following commands to trigger the TTL task:

    ALTER SYSTEM SET enable_kv_ttl= true; -- Enable the TTL task.
    ALTER SYSTEM TRIGGER TTL; -- Trigger a TTL task starting from midnight (00:00).
    
  4. Check the execution status of the task.

    obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASKS;
    

    The following result is returned:

    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+
    | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME                 | END_TIME                   | TRIGGER_TYPE | STATUS        | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE   |
    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+
    | ttl_table  |   500002 |    200003 |       1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER         | FINISHED      |           2 |                   0 |        2 | OB_SUCCESS |
    | ttl_table  |   500002 |    200002 |       1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER         | FINISHED      |           1 |                   0 |        1 | OB_SUCCESS |
    | ttl_table  |   500002 |    200001 |       1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER         | FINISHED      |           0 |                   0 |        0 | OB_SUCCESS |
    | NULL       |       -1 |        -1 |       1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:28.675583 | USER         | RS_TRIGGERING |           0 |                   0 |        0 | OB_SUCCESS |
    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+
    4 rows in set (0.043 sec)
    
  5. After the task is executed, the task is moved to the history table. You can query the OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORY table to view the task execution history.

    obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORY;
    

    The following result is returned:

    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+
    | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME                 | END_TIME                   | TRIGGER_TYPE | STATUS   | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE   |
    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+
    | ttl_table  |   500002 |    200003 |       1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER         | FINISHED |           2 |                   0 |        2 | OB_SUCCESS |
    | ttl_table  |   500002 |    200002 |       1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER         | FINISHED |           1 |                   0 |        1 | OB_SUCCESS |
    | ttl_table  |   500002 |    200001 |       1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER         | FINISHED |           0 |                   0 |        0 | OB_SUCCESS |
    | NULL       |       -1 |        -1 |       1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:42.526853 | USER         | FINISHED |           0 |                   0 |        0 | OB_SUCCESS |
    +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+
    4 rows in set (0.058 sec)
    
  6. After the task is executed, all expired data is deleted.

    obclient> SELECT * FROM ttl_table;
    Empty set (0.060 sec)
    

Contact Us