OceanBase Database deletes expired data based on the time to live (TTL). This topic describes how to delete expired data by running a command or scheduled task.
Background information
OceanBase Database allows you to delete expired data. You can define an expiration strategy for relational tables. You can also specify the TTL of data or the maximum number of versions for each column family in an HBase table. After you define an expiration strategy, you can run commands or configure scheduled tasks to trigger deletion tasks, thereby reducing the storage usage. In addition, OceanBase Database filters out expired data in TableAPI and HBaseAPI, to mask the visibility of expired data.
Considerations
OceanBase Database uses the log-structured merge-tree (LSM-tree) architecture, where deletion is also a type of appending operation. Even if expired data is deleted based on the TTL, the actual used storage space may increase instead of decreasing. The release of storage space depends on major compactions. Therefore, we recommend that you perform a major freeze after expired data is deleted based on the TTL. For example, you can set major_freeze_duty_time to be later than kv_ttl_duty_duration.
The TTL-based expired data deletion task runs in the background with a low priority. Therefore, it cannot ensure that expired data is immediately deleted. During task execution, if the memory usage, indicated by the MEMSTORE_USED parameter, of the MemStore exceeds the threshold specified for the FREEZE_TRIGGER parameter, the current task will be suspended until the memory usage of the MemStore decreases to a value lower than the threshold, to avoid affecting the system performance. You can query the value of the FREEZE_TRIGGER parameter from the gv$ob_memstore view. In addition, to avoid affecting the system recovery process, the task will skip a tenant that is undergoing normal physical restore until the tenant is restored.
Definitions of TTL tables
TTL supports two data models: relational table model and HBase table model. Definitions of TTL in the two models are different.
Relational table
Create a TTL table named
t1and set the validity period for each row of data tob + INTERVAL 12 HOUR.create table t1 ( a int primary key, b timestamp default current_timestamp on update current_timestamp) TTL (b + INTERVAL 12 HOUR);Change the validity period for table
t1tob + INTERVAL 1 DAY.alter table t1 TTL (b + INTERVAL 1 DAY);Remove the TTL attribute from table
t1.alter table t1 remove TTL;
The TTL attribute of a table is defined by COLUMN_NAME + INTERVAL NUM TTL_UNIT.
COLUMN_NAMEis a defined attribute of the table, which is of the TIMESTAMP or DATETIME type.+ INTERVALis a fixed part.NUMspecifies the validity period, which is an integer.TTL_UNITspecifies the unit of the validity period. Valid values includeSECOND,MINUTE,HOUR,DAY,MONTH, andYEAR.
HBase table
Create an HBase table named
t1$cf1and definekv_attributesas{"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}, which indicates that the validity period for each row of data is 3600s and the maximum number of versions for each row of data is 3.create table t1$cf1 ( K varbinary(1024), Q varbinary(256), T bigint, V varbinary(1048576) NOT NULL, primary key(K, Q, T)) kv_attributes ='{"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}' partition by key(K) partitions 97Remove the TTL attribute from table
t1$cf1and change the maximum number of versions to 4.alter table t1$cf1 kv_attributes ='{"Hbase": {"MaxVersions": 4}}';Remove the expiration attribute from table
t1$cf1.alter table t1$cf1 kv_attributes ='{"Hbase": {}}';
DOP of TTL-based deletion tasks
You can set the number of TTL worker threads for a tenant by using the ttl_thread_score parameter. The default value is 2.
For example, you can change the degree of parallelism (DOP) of TTL-based deletion tasks to 10:
alter system set ttl_thread_score = 10;
Trigger tasks
You can run commands or configure scheduled tasks to trigger TTL-based deletion tasks.
You must enable TTL-based deletion of a tenant before you trigger the task.
ALTER SYSTEM SET enable_kv_ttl= true; // The default value is `false`.
Run commands
Log on to the sys tenant and run the following command to trigger the TTL-based deletion tasks of all tenants:
ALTER SYSTEM trigger ttl;Configure scheduled tasks
Log on to the tenant for which a TTL-based deletion task is to be executed and run the following command to set the time for triggering the task every day:
alter system set kv_ttl_duty_duration = '[22:00:00, 24:00:00]'; // The default value is [00:00:00, 24:00:00].
Log on to the sys tenant and run the following commands to suspend, resume, or cancel a TTL-based deletion task:
ALTER SYSTEM suspend ttl; // Suspend the ongoing TTL-based deletion task.
ALTER SYSTEM resume ttl; // Resume the suspended TTL-based deletion task.
ALTER SYSTEM cancel ttl; // Cancel the ongoing TTL-based deletion task.
The TTL commands are valid only in user tenants or the sys tenant. If you run a TTL command in a user tenant, it applies only to TTL-based deletion tasks of the current user tenant. If you run a TTL command in the sys tenant, it applies to TTL-based deletion tasks of all user tenants.
The following table describes the affected scopes of the trigger commands. The affected scopes of the suspend, resume, and cancel commands are the same.
| Tenant | Command | Affected scope |
|---|---|---|
| sys tenant | alter system trigger ttl |
All user tenants |
| sys tenant | alter system trigger ttl tenant = all |
All user tenants |
| sys tenant | alter system trigger ttl tenant = all_user |
All user tenants |
| sys tenant | alter system trigger ttl tenant = obkv |
obkv tenants |
| sys tenant | alter system trigger ttl tenant = obkv1, obkv2 |
User tenants obkv1 and obkv2 |
| User tenant | alter system trigger ttl |
Current tenant |
View the task status
You can view the status and related information of the current and historical TTL-based deletion tasks from related views.
| View | Description |
|---|---|
| DBA_OB_KV_TTL_TASKS | Allows you to view the ongoing TTL-based deletion task of the current tenant. |
| DBA_OB_KV_TTL_TASK_HISTORY | Allows you to view the historical TTL-based deletion tasks of the current tenant. |
| CDB_OB_KV_TTL_TASKS | Allows you to view the ongoing TTL-based deletion tasks of all tenants in the sys tenant. |
| CDB_OB_KV_TTL_TASK_HISTORY | Allows you to view the historical TTL-based deletion tasks of all tenants in the sys tenant. |
The following table describes the fields in the DBA_OB_KV_TTL_TASKS and DBA_OB_KV_TTL_TASK_HISTORY views:
| Field | Description |
|---|---|
| TABLE_NAME | The name of the table. |
| TABLE_ID | The ID of the table. |
| TABLET_ID | The ID of the tablet. |
| TASK_ID | The task ID, which starts from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The triggering type of the task. Valid values include PERIODIC and USER. |
| STATUS | The current status of the task. For more information, see the following section. |
| TTL_DEL_CNT | The number of records deleted from a relational table, or the number of records deleted from an HBase table based on TTL. |
| MAX_VERSION_DEL_CNT | The number of records deleted from an HBase table based on the maximum number of versions. |
| SCAN_CNT | The number of scanned records for the task. |
| RET_CODE | The return code of the task. |
The following table describes the fields in the CDB_OB_KV_TTL_TASKS and CDB_OB_KV_TTL_TASK_HISTORY views.
| Field | Description |
|---|---|
| TENANT_ID | The ID of the tenant. |
| TABLE_NAME | The name of the table. |
| TABLE_ID | The ID of the table. |
| TABLET_ID | The ID of the tablet. |
| TASK_ID | The task ID, which starts from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The triggering type of the task. Valid values include PERIODIC and USER. |
| STATUS | The current status of the task. For more information, see the following section. |
| TTL_DEL_CNT | The number of records deleted from a relational table, or the number of records deleted from an HBase table based on TTL. |
| MAX_VERSION_DEL_CNT | The number of records deleted from an HBase table based on the maximum number of versions. |
| SCAN_CNT | The number of scanned records for the task. |
| RET_CODE | The return code of the task. |
Note
The CDB views contain the tenant ID, which is not contained in the DBA views.
Visibility of expired data
If you perform a TableAPI or HBaseAPI operation on a record that has expired, the result is the same as that in the case where the record does not exist. For example, if a record with the primary key k1 in a relational table has expired, you can insert another record with the primary key k1 without a primary key conflict.
| Operation | Expired | Return code | Affected rows | Remarks |
|---|---|---|---|---|
| insert | N | -5024 OB_ERR_PRIMARY_KEY_DUPLICATE |
0 | |
| insert | Y | 0 OB_SUCCESS |
1 | |
| delete | N | 0 OB_SUCCESS |
1 | |
| delete | Y | 0 OB_SUCCESS |
0 | |
| update | N | 0 OB_SUCCESS |
1 | |
| update | Y | 0 OB_SUCCESS |
0 | |
| replace | N | 0 OB_SUCCESS |
2 | |
| replace | Y | 0 OB_SUCCESS |
1 | |
| insert_or_update | N | 0 OB_SUCCESS |
1 | |
| insert_or_update | Y | 0 OB_SUCCESS |
1 | |
| increment | N | 0 OB_SUCCESS |
1 | |
| increment | Y | 0 OB_SUCCESS |
1 | Columns other than the incremented column are set to default values. |
| append | N | 0 OB_SUCCESS |
1 | |
| append | Y | 0 OB_SUCCESS |
1 | Columns other than the appended column are set to default values. |
| query/get | N | 0 OB_SUCCESS |
None | Data is read. |
| query/get | Y | 0 OB_SUCCESS |
None | No data is read. |
Note
The SQL API does not mask expired data. The expired data is visible until it is cleared by deletion tasks.
Task status
The TTL-based deletion task of a tenant is split into tablet-level tasks for a TTL table during execution. Ongoing tenant tasks and their tablet tasks are recorded in the task table. You can query the tasks from the task view. Historical partition-level tasks are moved to the history table. You can query the tasks from the history view.
Notice
TABLE_NAME of a tenant-level task is NULL, and TABLET_ID and TABLE_ID of a tenant-level task are both -1.
- The
DBA_OB_KV_TTL_TASKSandCDB_OB_KV_TTL_TASKSviews both record tenant-level tasks and tablet-level tasks, which respectively correspond to two different types of status definitions. - The
DBA_OB_KV_TTL_TASK_HISTORYandCDB_OB_KV_TTL_TASK_HISTORYviews both record historical tenant-level tasks and tablet-level tasks.
The following table lists the status of tablet-level tasks:
| Task status | Description |
|---|---|
| PREPARED | The prepared state, in which the task has not started to delete expired data. |
| RUNNING | The running state, in which the task is deleting expired data of the corresponding partition. |
| PENDING | The pending state. You can resume a pending task by using commands. |
| CANCELED | The canceled state. A task in the canceled state cannot be executed. |
| FINISHED | The finished state. A task in the finished state will be moved to the historical task table. |
| INVALID | The invalid state. |
The following table lists the status of tenant-level tasks:
| Task status | Description |
|---|---|
| RS_TRIGGERING | The TTL-based deletion task of the tenant is being executed. |
| RS_SUSPENDING | The TTL-based deletion task of the tenant is suspended. |
| RS_CANCELING | The TTL-based deletion task of the tenant is canceled. |
| RS_MOVING | All TTL-based deletion tasks are being moved to the history table. |
| INVALID | The invalid state. |
Set the period for clearing historical tasks
To prevent too many historical TTL-based deletion tasks from occupying storage space, you can specify the kv_ttl_history_recycle_interval parameter to define the retention period for historical TTL-based deletion tasks of a tenant. The default retention period is 7 days.
ALTER SYSTEM SET kv_ttl_history_recycle_interval= '30d'; // Specify to automatically clear historical TTL-based deletion tasks that have been retained for more than 30 days.
Examples: Delete expired data periodically
Example 1: Delete expired data from a relational table
Create a TTL-based relational table and set the validity period for each record to c + 10s.
create table ttl_table(a varchar(1024) primary key, b varchar(1024), c timestamp) TTL(c + INTERVAL 10 SECOND) partition by key(a) partitions 3;Insert data through an TableAPI or SQL operation. For more information about how to insert data through an TableAPI operation, see Introduction to the client and instructions for use.
insert into ttl_table values("k1", "hello obkv", now()); insert into ttl_table values("k2", "hello obkv", now()); insert into ttl_table values("k3", "hello obkv", now());Wait for 10s and run the following commands to trigger the TTL-based deletion task:
ALTER SYSTEM SET enable_kv_ttl= true; // Enable the TTL-based deletion task. ALTER SYSTEM TRIGGER TTL; // Trigger the TTL-based deletion task at 00:00.View the task execution status.
obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASKS;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | ttl_table | 500002 | 200003 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER | FINISHED | 2 | 0 | 2 | OB_SUCCESS | | ttl_table | 500002 | 200002 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER | FINISHED | 1 | 0 | 1 | OB_SUCCESS | | ttl_table | 500002 | 200001 | 1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | NULL | -1 | -1 | 1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:28.675583 | USER | RS_TRIGGERING | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ 4 rows in set (0.043 sec)After the task is completed, it is moved to the history table. You can query the task from the
OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORYview.obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORY;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | ttl_table | 500002 | 200003 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER | FINISHED | 2 | 0 | 2 | OB_SUCCESS | | ttl_table | 500002 | 200002 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER | FINISHED | 1 | 0 | 1 | OB_SUCCESS | | ttl_table | 500002 | 200001 | 1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | NULL | -1 | -1 | 1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:42.526853 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ 4 rows in set (0.058 sec)After the task is completed, all expired data is deleted.
obclient> SELECT * FROM ttl_table; Empty set (0.060 sec)
Example 2: Delete expired data from an HBase table
Create an HBase table.
create table t1$cf1 ( K varbinary(1024), Q varbinary(256), T bigint, V varbinary(1048576) NOT NULL, primary key(K, Q, T)) kv_attributes ='{"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}' partition by key(K) partitions 3;Insert data into the table by simulating HBaseAPI. For more information about the data insertion process, see Usage instructions of the HBaseAPI client.
INSERT INTO t1$cf1 VALUES ("row1", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row1", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row2", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row2", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row3", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row3", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row4", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row4", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del");Set the daily task execution time and enable scheduled execution for the tenant.
ALTER SYSTEM SET enable_kv_ttl= true; // Enable the TTL-based deletion task. ALTER SYSTEM SET kv_ttl_duty_duration = '[00:00:00, 23:59:59]'; // Trigger the TTL-based deletion task at 00:00.View the task execution status.
View the current task (the execution result depends on the task execution status).
obclient> select * from oceanbase.dba_ob_kv_ttl_tasks;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | NULL | -1 | -1 | 1 | 2023-09-28 11:35:02.695292 | 2023-09-28 11:35:17.720627 | PERIODIC | RS_TRIGGERING | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200001 | 1 | 2023-09-28 11:35:06.612708 | 2023-09-28 11:35:06.612708 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200002 | 1 | 2023-09-28 11:35:06.612711 | 2023-09-28 11:35:06.612711 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200003 | 1 | 2023-09-28 11:35:06.612712 | 2023-09-28 11:35:06.612712 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ 4 rows in setView the historical tasks (the execution result depends on the task execution status).
obclient> select * from oceanbase.dba_ob_kv_ttl_task_history;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | NULL | -1 | -1 | 1 | 2023-09-28 11:35:02.695292 | 2023-09-28 11:35:17.720627 | PERIODIC | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200001 | 1 | 2023-09-28 11:35:06.612708 | 2023-09-28 11:35:11.634029 | PERIODIC | FINISHED | 1 | 0 | 2 | OB_SUCCESS | | t1$cf1 | 500002 | 200002 | 1 | 2023-09-28 11:35:06.612711 | 2023-09-28 11:35:11.632463 | PERIODIC | FINISHED | 1 | 0 | 2 | OB_SUCCESS | | t1$cf1 | 500002 | 200003 | 1 | 2023-09-28 11:35:06.612712 | 2023-09-28 11:35:11.627921 | PERIODIC | FINISHED | 2 | 0 | 4 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ 4 rows in set
View the data in the current HBase table.
obclient> select * from t1$cf1;The return result is as follows:
+------+-----+----------------+--------+ | K | Q | T | V | +------+-----+----------------+--------+ | row3 | cq1 | -1695872081000 | no del | | row4 | cq1 | -1695872081000 | no del | | row1 | cq1 | -1695872081000 | no del | | row2 | cq1 | -1695872081000 | no del | +------+-----+----------------+--------+ 4 rows in set