OBKV-Table supports the deletion of expired data based on the time to live (TTL). This topic describes how to delete expired data by using commands or periodic tasks.
Background information
OBKV-Table allows you to set a expiration policy for relational tables, and then delete data that has expired using commands or periodic tasks. This helps free up storage space that would otherwise be taken up by old, irrelevant information. In addition, OBKV-Table automatically filters out this expired data to prevent them from being visible.
Use the expired data deletion feature
OceanBase Database uses an LSM-Tree architecture (where deletion is also an append-write operation), so even if expired data is deleted using a TTL task, the actual storage space may not decrease and could even increase. The final release of storage space still relies on major compaction. Therefore, we recommend that you use it in conjunction with major freeze. After the TTL task completes the deletion, perform a major freeze. For example, this can be achieved by setting major_freeze_duty_time after kv_ttl_duty_duration.
The expiration deletion TTL task is a low-priority background task and currently cannot guarantee the immediate deletion of expired data. To avoid significantly impacting system performance, if the MemStore memory MEMSTORE_USED exceeds the threshold FREEZE_TRIGGER (which can be queried through the gv$ob_memstore view) during the execution of the task, the current task will be paused until the memory drops below the threshold. Additionally, to avoid affecting the system's recovery process, for tenants undergoing normal physical recovery, the TTL task will also pause the execution of the TTL task for that tenant until the tenant completes the recovery.
Definition of TTL tables
Create a TTL table
t1, where each row in the table expires afterb + INTERVAL 12 HOUR.CREATE TABLE t1 ( a INT PRIMARY KEY, b TIMESTAMP DEFAULT current_timestamp ON UPDATE current_timestamp) TTL (b + INTERVAL 12 HOUR);Modify the TTL of table
t1tob + INTERVAL 1 DAY.ALTER TABLE t1 TTL (b + INTERVAL 1 DAY);Remove the TTL attribute from table
t1.ALTER TABLE t1 REMOVE TTL;
The TTL attribute of a table is specified in the format COLUMN_NAME + INTERVAL NUM TTL_UNIT, where:
COLUMN_NAMEis the name of a column defined in the table. The column must be of theTIMESTAMPorDATETIMEdata type.+ INTERVALis a fixed part.NUMis an integer that indicates the number of TTL units.TTL_UNITcan beSECOND,MINUTE,HOUR,DAY,MONTH, orYEAR.
Parallelism of TTL tasks
You can configure the number of TTL worker threads using the tenant-level parameter ttl_thread_score. The default value is 2.
The example shows how to modify the parallelism of TTL tasks to 10:
ALTER SYSTEM SET ttl_thread_score = 10;
Trigger a TTL task
You can trigger a TTL task in two ways: by using a control command or by setting up a periodic task.
Before triggering a TTL task, make sure that the tenant's TTL task switch is enabled.
ALTER SYSTEM SET enable_kv_ttl= true; -- The default value is false.
Trigger a TTL task by using a control command:
Log on to the system tenant and run the following command to trigger a TTL task across all tenants:
ALTER SYSTEM trigger TTL;Trigger a TTL task by setting a periodic task:
Log on to the desired tenant and execute the following command to set the time range for TTL task triggers:
ALTER SYSTEM SET kv_ttl_duty_duration = '[22:00:00, 24:00:00]'; // The default value is "", indicating that periodic TTL tasks are not triggered.
You can log on to the system tenant and run the following system commands to suspend, resume, or abort an ongoing TTL task.
ALTER SYSTEM SUSPEND TTL; -- Suspends the current running TTL task.
ALTER SYSTEM RESUME TTL; -- Resumes the suspended TTL task.
ALTER SYSTEM CANCEL TTL; -- Cancels the running TTL task.
This TTL command can only be executed in the context of either the user tenant or the system tenant; executing it in the user tenant only allows managing that specific tenant's TTL tasks, whereas executing it in the system tenant permits managing all users' TTL tasks.
Below is an example using the TRIGGER command. The behavior for other commands such as SUSPEND, RESUME, and CANCEL is consistent with the tenant behavior:
| Tenant | Command | Semantics |
|---|---|---|
| System tenant | ALTER SYSTEM SUSPEND TTL |
All user tenants |
| System tenant | ALTER SYSTEM TRIGGER TTL TENANT = all |
All user tenants |
| System tenant | ALTER SYSTEM TRIGGER TTL TENANT = all_user |
All user tenants |
| System tenant | ALTER SYSTEM TRIGGER TTL TENANT = obkv |
OBKV tenant |
| System tenant | ALTER SYSTEM TRIGGER TTL TENANT = obkv1, obkv2 |
User tenants obkv1 and obkv2 |
| User tenant | ALTER SYSTEM TRIGGER TTL |
The current tenant |
View task status
You can use views to view the status and related information of current and historical TTL tasks. The following views are related to TTL tasks:
| View | Description |
|---|---|
| DBA_OB_KV_TTL_TASKS | View the TTL tasks that are currently being executed in the current tenant. |
| DBA_OB_KV_TTL_TASK_HISTORY | View the historical TTL tasks in the current tenant. |
| CDB_OB_KV_TTL_TASKS | View the TTL tasks that are currently being executed in all tenants in the system tenant. |
| CDB_OB_KV_TTL_TASK_HISTORY | View the historical TTL tasks in all tenants in the system tenant. |
The following table describes the fields in the DBA_OB_KV_TTL_TASKS and DBA_OB_KV_TTL_TASK_HISTORY views:
| Field | Description |
|---|---|
| TABLE_NAME | The table name. |
| TABLE_ID | The table ID. |
| TABLET_ID | Tablet ID |
| TASK_ID | The task ID, which starts from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The trigger type of the task, which includes PERIODIC and USER. |
| STATUS | The status of the task. For more information, see the task status definitions. |
| TTL_DEL_CNT | The number of records deleted from related tables. |
| MAX_VERSION_DEL_CNT | The number of records deleted from HBase tables based on the MaxVersions feature, which is 0 in the relational table model. |
| SCAN_CNT | The number of records scanned by the task. |
| RET_CODE | The return code of the task. |
The following table describes the fields of the CDB_OB_KV_TTL_TASKS and CDB_OB_KV_TTL_TASK_HISTORY views:
Note
Compared with DBA views, the CDB views contain an additional tenant ID field.
| Field | Description |
|---|---|
| TENANT_ID | The tenant ID. |
| TABLE_NAME | The table name. |
| TABLE_ID | The table ID. |
| TABLET_ID | The tablet ID. |
| TASK_ID | The task ID, which starts from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The task trigger type, which can be either PERIODIC or USER. |
| STATUS | The status of the task. For more information, see the task status definitions. |
| TTL_DEL_CNT | The number of records deleted from related tables. |
| MAX_VERSION_DEL_CNT | The number of records deleted from HBase tables based on the MaxVersions feature, which is 0 in the relational table model. |
| SCAN_CNT | The number of records scanned by the task. |
| RET_CODE | The return code of the task. |
Visibility of expired data
If you use the OBKV-Table interface to perform an operation on a record that has expired, the operation is performed as if the record does not exist. For example, if a record with the primary key k1 has expired in a table, you can insert a new record with the primary key k1 without encountering a primary key conflict.
| Operation | Expired | Return code | Affected rows | Other |
|---|---|---|---|---|
| insert | N | -5024 OB_ERR_PRIMARY_KEY_DUPLICATE |
0 | |
| insert | Y | 0 OB_SUCCESS |
1 | |
| delete | N | 0 OB_SUCCESS |
1 | |
| delete | Y | 0 OB_SUCCESS |
0 | |
| update | N | 0 OB_SUCCESS |
1 | |
| update | Y | 0 OB_SUCCESS |
0 | |
| replace | N | 0 OB_SUCCESS |
2 | |
| replace | Y | 0 OB_SUCCESS |
1 | |
| insert_or_update | N | 0 OB_SUCCESS |
1 | |
| insert_or_update | Y | 0 OB_SUCCESS |
1 | |
| increment | N | 0 OB_SUCCESS |
1 | |
| increment | Y | 0 OB_SUCCESS |
1 | Except for the increment column, other columns are written with default values. |
| append | N | 0 OB_SUCCESS |
1 | |
| append | Y | 0 OB_SUCCESS |
1 | Except for the append column, other columns are written with default values. |
| query/get | N | 0 OB_SUCCESS |
/ | Data is successfully read. |
| query/get | Y | 0 OB_SUCCESS |
/ | No data is read. |
Note
SQL interfaces do not currently filter out expired data. Expired data is visible until it is deleted by the expired data cleanup task.
Task status
For each tenant's TTL task, it is split into a tablet-level task for each TTL table during actual execution. The tenant tasks and their tablet tasks are recorded in the task table. You can view the tenant tasks and their tablet tasks in the task view. The historical partition tasks are moved to the history table. You can view the historical tasks in the history view.
Notice
For tenant-level tasks, TABLE_NAME is NULL, and TABLET_ID and TABLE_ID are -1.
- In the task views
DBA_OB_KV_TTL_TASKSandCDB_OB_KV_TTL_TASKS, there are two types of tasks: tenant-level tasks and tablet-level tasks. They correspond to two different statuses. - In the history task views
DBA_OB_KV_TTL_TASK_HISTORYandCDB_OB_KV_TTL_TASK_HISTORY, historical tenant-level tasks and tablet-level tasks are saved.
Tablet task status:
| Task status | Description |
|---|---|
| PREPARED | The task is in the prepared state and has not started to delete expired data. |
| RUNNING | The task is in the running state and is deleting expired data in the partition. |
| PENDING | The task is in the pending state and can be resumed by a command. |
| CANCELED | The task is in the canceled state and cannot be resumed. |
| FINISHED | The task is in the finished state and is waiting to be moved to the history table. |
| INVALID | The task is in an invalid state. |
Tenant task status:
| Task status | Description |
|---|---|
| RS_TRIGGERING | The tenant's TTL task is being executed. |
| RS_SUSPENDING | The tenant's TTL task has been suspended. |
| RS_CANCELING | The tenant's TTL task has been canceled. |
| RS_MOVING | All TTL tasks are being moved to the history table. |
| INVALID | The task is in an invalid state. |
Set the retention period for historical TTL tasks
To avoid unnecessary storage space waste caused by the accumulation of historical TTL task records, you can set the retention period for historical TTL task records of a tenant by using the kv_ttl_history_recycle_interval parameter. The default retention period is 7 days.
ALTER SYSTEM SET kv_ttl_history_recycle_interval= '30d'; -- Set the retention period for historical TTL task records to 30 days.
Example: Scheduled deletion of expired data
Create a TTL table and specify that each record expires 10 seconds after the
ctimestamp.CREATE TABLE ttl_table(a VARCHAR(1024) PRIMARY KEY, b VARCHAR(1024), c TIMESTAMP) TTL(c + INTERVAL 10 SECOND) PARTITION BY KEY(a) PARTITIONS 3;Insert data.
you can use OBKV-Table/SQL, or use the OBKV-Table API to insert data. For more information, see Client Introduction and Usage Instructions.
INSERT INTO ttl_table VALUES("k1", "hello obkv", now()); INSERT INTO ttl_table VALUES("k2", "hello obkv", now()); INSERT INTO ttl_table VALUES("k3", "hello obkv", now());Wait for 10 seconds, and then execute the following commands to trigger the TTL task:
ALTER SYSTEM SET enable_kv_ttl= true; -- Enable the TTL task. ALTER SYSTEM TRIGGER TTL; -- Trigger a TTL task starting from midnight (00:00).Check the execution status of the task.
obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASKS;The following result is returned:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | ttl_table | 500002 | 200003 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER | FINISHED | 2 | 0 | 2 | OB_SUCCESS | | ttl_table | 500002 | 200002 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER | FINISHED | 1 | 0 | 1 | OB_SUCCESS | | ttl_table | 500002 | 200001 | 1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | NULL | -1 | -1 | 1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:28.675583 | USER | RS_TRIGGERING | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ 4 rows in set (0.043 sec)After the task is executed, the task is moved to the history table. You can query the
OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORYtable to view the task execution history.obclient> SELECT * FROM OCEANBASE.DBA_OB_KV_TTL_TASK_HISTORY;The following result is returned:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | ttl_table | 500002 | 200003 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.315848 | USER | FINISHED | 2 | 0 | 2 | OB_SUCCESS | | ttl_table | 500002 | 200002 | 1 | 2023-09-27 23:31:30.300276 | 2023-09-27 23:31:35.320258 | USER | FINISHED | 1 | 0 | 1 | OB_SUCCESS | | ttl_table | 500002 | 200001 | 1 | 2023-09-27 23:31:30.300271 | 2023-09-27 23:31:35.321879 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | NULL | -1 | -1 | 1 | 2023-09-27 23:31:28.675583 | 2023-09-27 23:31:42.526853 | USER | FINISHED | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ 4 rows in set (0.058 sec)After the task is executed, all expired data is deleted.
obclient> SELECT * FROM ttl_table; Empty set (0.060 sec)