OBKV-HBase provides the time-to-live (TTL) feature for you to delete expired data. This topic describes how to run a command or configure a scheduled task to delete expired data.
Background information
OBKV-HBase allows you to delete expired data. For an HBase table, you can specify the TTL (TimeToLive) and maximum number of versions (MaxVersions) for each column family. After you define an expiration strategy, you can run a command or configure a scheduled task to trigger a deletion task, thereby reducing the storage usage. In addition, OceanBase Database filters out expired data in OBKV-HBase so that the data will be invisible.
Notice
At present, OBKV-HBase supports only table-level TTL.
Considerations
OceanBase Database uses the log-structured merge-tree (LSM-tree) architecture, where deletion is also a type of appending operation. Even if expired data is deleted based on the TTL, the actual used storage space may increase instead of decreasing. The release of storage space depends on major compactions. Therefore, we recommend that you perform a major freeze after expired data is deleted based on the TTL. For example, you can set major_freeze_duty_time to be later than kv_ttl_duty_duration.
A TTL-based expired data deletion task runs in the background with a low priority. Therefore, it cannot ensure that expired data is immediately deleted. During task execution, if the memory usage, indicated by the MEMSTORE_USED parameter, of the MemStore exceeds the threshold specified by the FREEZE_TRIGGER parameter, the current task will be suspended until the memory usage of the MemStore decreases to a value lower than the threshold, to avoid affecting the system performance. You can query the value of the FREEZE_TRIGGER parameter from the gv$ob_memstore view. In addition, to avoid affecting the system recovery process, the task will skip a tenant that is undergoing normal physical restore until the tenant is restored.
Definitions of TTL tables
Create an HBase table named t1$cf1 and define kv_attributes as {"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}, which indicates that the TTL for each row of data is 3600s and the maximum number of versions for each row of data is 3.
create table t1$cf1 (
K varbinary(1024),
Q varbinary(256),
T bigint,
V varbinary(1048576) NOT NULL,
primary key(K, Q, T))
kv_attributes ='{"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}'
partition by key(K) partitions 97
Remove the TimeToLive attribute from table t1$cf1 and change the MaxVersions attribute to 4.
alter table t1$cf1 kv_attributes ='{"Hbase": {"MaxVersions": 4}}';
Remove the TimeToLive and MaxVersions attributes from table t1$cf1.
alter table t1$cf1 kv_attributes ='{"Hbase": {}}';
DOP of TTL-based deletion tasks
You can set the number of TTL worker threads for a tenant by using the ttl_thread_score parameter. The default value is 2.
For example, you can change the degree of parallelism (DOP) of TTL-based deletion tasks to 10:
alter system set ttl_thread_score = 10;
Trigger a task
You can run a command or configure a scheduled task to trigger a TTL-based deletion task.
You must enable TTL-based deletion for a tenant before you trigger a task.
ALTER SYSTEM SET enable_kv_ttl= true; // The default value is false.
Run a command
Log on to the sys tenant and run the following command to trigger a TTL-based deletion task of all tenants:
ALTER SYSTEM trigger ttl;Configure a scheduled task
Log on to the tenant for which a TTL-based deletion task is to be executed and run the following command to set the time for triggering the task every day:
alter system set kv_ttl_duty_duration = '[22:00:00, 24:00:00]'; // The default value is an empty string (""), which specifies not to trigger a scheduled TTL-based deletion task.
Log on to the sys tenant and run the following commands to suspend, resume, or cancel a TTL-based deletion task:
ALTER SYSTEM suspend ttl; // Suspend the ongoing TTL-based deletion task.
ALTER SYSTEM resume ttl; // Resume the suspended TTL-based deletion task.
ALTER SYSTEM cancel ttl; // Cancel the ongoing TTL-based deletion task.
The TTL commands are valid only in user tenants or the sys tenant. If you run a TTL command in a user tenant, it applies only to TTL-based deletion tasks of the current user tenant. If you run a TTL command in the sys tenant, it applies to TTL-based deletion tasks of all user tenants.
The following table describes the affected scopes of the trigger command. The affected scopes of the suspend, resume, and cancel commands are the same.
| Tenant | Command | Semantics |
|---|---|---|
| sys tenant | alter system trigger ttl |
All user tenants |
| sys tenant | alter system trigger ttl tenant = all |
All user tenants |
| sys tenant | alter system trigger ttl tenant = all_user |
All user tenants |
| sys tenant | alter system trigger ttl tenant = obkv |
obkv tenants |
| sys tenant | alter system trigger ttl tenant = obkv1, obkv2 |
User tenants obkv1 and obkv2 |
| User tenant | alter system trigger ttl |
Current tenant |
View the task status
You can query the status and related information of the current and historical TTL-based deletion tasks from related views.
| View | Description |
|---|---|
| DBA_OB_KV_TTL_TASKS | Allows you to view the ongoing TTL-based deletion task of the current tenant. |
| DBA_OB_KV_TTL_TASK_HISTORY | Allows you to view the historical TTL-based deletion tasks of the current tenant. |
| CDB_OB_KV_TTL_TASKS | Allows you to view the ongoing TTL-based deletion tasks of all tenants in the sys tenant. |
| CDB_OB_KV_TTL_TASK_HISTORY | Allows you to view the historical TTL-based deletion tasks of all tenants in the sys tenant. |
The following table describes the columns in the DBA_OB_KV_TTL_TASKS and DBA_OB_KV_TTL_TASK_HISTORY views.
| Column | Description |
|---|---|
| TABLE_NAME | The name of the table. |
| TABLE_ID | The ID of the table. |
| TABLET_ID | The ID of the tablet. |
| TASK_ID | The ID of the task, starting from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The triggering type of the task. Valid values: PERIODIC and USER. |
| STATUS | The current status of the task. |
| TTL_DEL_CNT | The number of records deleted from a relational table, or the number of records deleted from an HBase table based on TimeToLive. |
| MAX_VERSION_DEL_CNT | The number of records deleted from an HBase table based on MaxVersions. |
| SCAN_CNT | The number of records scanned for the task. |
| RET_CODE | The return code of the task. |
The following table describes the columns in the CDB_OB_KV_TTL_TASKS and CDB_OB_KV_TTL_TASK_HISTORY views.
Note
The CDB views contain the tenant ID, which is not contained in the DBA views.
| Column | Description |
|---|---|
| TENANT_ID | The ID of the tenant. |
| TABLE_NAME | The name of the table. |
| TABLE_ID | The ID of the table. |
| TABLET_ID | The ID of the tablet. |
| TASK_ID | The ID of the task, starting from 1. |
| START_TIME | The start time of the task. |
| END_TIME | The end time of the task. |
| TRIGGER_TYPE | The triggering type of the task. Valid values: PERIODIC and USER. |
| STATUS | The current status of the task. |
| TTL_DEL_CNT | The number of records deleted from a relational table, or the number of records deleted from an HBase table based on TimeToLive. |
| MAX_VERSION_DEL_CNT | The number of records deleted from an HBase table based on MaxVersions. |
| SCAN_CNT | The number of records scanned for the task. |
| RET_CODE | The return code of the task. |
Visibility of expired data
If you call an API of OBKV-Table or OBKV-HBase to operate a data record that has expired, the result is the same as that in the case where the record does not exist. For example, if a record with the primary key k1 in a relational table has expired, you can insert another record with the primary key k1 without a primary key conflict.
| Operation | Expired | Return code | Affected rows | Others |
|---|---|---|---|---|
| insert | N | -5024 OB_ERR_PRIMARY_KEY_DUPLICATE |
0 | |
| insert | Y | 0 OB_SUCCESS |
1 | |
| delete | N | 0 OB_SUCCESS |
1 | |
| delete | Y | 0 OB_SUCCESS |
0 | |
| update | N | 0 OB_SUCCESS |
1 | |
| update | Y | 0 OB_SUCCESS |
0 | |
| replace | N | 0 OB_SUCCESS |
2 | |
| replace | Y | 0 OB_SUCCESS |
1 | |
| insert_or_update | N | 0 OB_SUCCESS |
1 | |
| insert_or_update | Y | 0 OB_SUCCESS |
1 | |
| increment | N | 0 OB_SUCCESS |
1 | |
| increment | Y | 0 OB_SUCCESS |
1 | Columns other than the incremented column are set to default values. |
| append | N | 0 OB_SUCCESS |
1 | |
| append | Y | 0 OB_SUCCESS |
1 | Columns other than the appended column are set to default values. |
| query/get | N | 0 OB_SUCCESS |
/ | Data is read. |
| query/get | Y | 0 OB_SUCCESS |
/ | No data is read. |
Note
The SQL API does not mask expired data. The expired data is visible until it is cleared by deletion tasks.
Task status
The TTL-based deletion task of a tenant is split into tablet-level tasks for a TTL table during execution. Ongoing tenant tasks and their tablet tasks are recorded in the task table. You can query the tasks from the task view. Historical partition-level tasks are moved to the history table. You can query the tasks from the history view.
Notice
TABLE_NAME of a tenant-level task is NULL, and TABLET_ID and TABLE_ID of a tenant-level task are both -1.
- The
DBA_OB_KV_TTL_TASKSandCDB_OB_KV_TTL_TASKSviews both record tenant-level tasks and tablet-level tasks, which respectively correspond to two different types of status definitions. - The
DBA_OB_KV_TTL_TASK_HISTORYandCDB_OB_KV_TTL_TASK_HISTORYviews both record historical tenant-level tasks and tablet-level tasks.
The following table describes the possible states of tablet-level tasks.
| State | Description |
|---|---|
| PREPARED | The prepared state, in which the task has not started to delete expired data. |
| RUNNING | The running state, in which the task is deleting expired data of the corresponding partition. |
| PENDING | The pending state. You can resume a pending task by using commands. |
| CANCELED | The canceled state. A task in the canceled state cannot be executed. |
| FINISHED | The finished state. A task in the finished state will be moved to the historical task table. |
| INVALID | The invalid state. |
The following table describes the possible states of tenant-level tasks.
| State | Description |
|---|---|
| RS_TRIGGERING | The TTL-based deletion task of the tenant is being executed. |
| RS_SUSPENDING | The TTL-based deletion task of the tenant is suspended. |
| RS_CANCELING | The TTL-based deletion task of the tenant is canceled. |
| RS_MOVING | All TTL-based deletion tasks are being moved to the history table. |
| INVALID | The invalid state. |
Set the period for clearing historical tasks
To prevent too many historical TTL-based deletion tasks from occupying storage space, you can specify the kv_ttl_history_recycle_interval parameter to define the retention period for historical TTL-based deletion tasks of a tenant. The default retention period is 7 days.
ALTER SYSTEM SET kv_ttl_history_recycle_interval= '30d'; // Specify to automatically clear historical TTL-based deletion tasks that have been retained for more than 30 days.
Example: Delete expired data periodically
Create an HBase table.
create table t1$cf1 ( K varbinary(1024), Q varbinary(256), T bigint, V varbinary(1048576) NOT NULL, primary key(K, Q, T)) kv_attributes ='{"Hbase": {"TimeToLive": 3600, "MaxVersions": 3}}' partition by key(K) partitions 3;Insert data into the table as in OBKV-HBase.
For more information about how to insert data in OBKV-HBase, see Connect to a cluster by using the OBKV-HBase client.
INSERT INTO t1$cf1 VALUES ("row1", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row1", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row2", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row2", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row3", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row3", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del"), ("row4", "cq1", -truncate(time_to_usec(date_sub(now(), interval +1 hour))/1000, 0), "del"), ("row4", "cq1", -truncate(time_to_usec(now())/1000, 0), "no del");Set the daily task execution time and enable scheduled execution for the tenant.
ALTER SYSTEM SET enable_kv_ttl= true; // Enable TTL-based deletion. ALTER SYSTEM SET kv_ttl_duty_duration = '[00:00:00, 23:59:59]'; // Trigger the TTL-based deletion task at 00:00.View the task execution status.
View the current task (the execution result depends on the task execution status).
obclient> select * from oceanbase.dba_ob_kv_ttl_tasks;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ | NULL | -1 | -1 | 1 | 2023-09-28 11:35:02.695292 | 2023-09-28 11:35:17.720627 | PERIODIC | RS_TRIGGERING | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200001 | 1 | 2023-09-28 11:35:06.612708 | 2023-09-28 11:35:06.612708 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200002 | 1 | 2023-09-28 11:35:06.612711 | 2023-09-28 11:35:06.612711 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200003 | 1 | 2023-09-28 11:35:06.612712 | 2023-09-28 11:35:06.612712 | PERIODIC | PREPARED | 0 | 0 | 0 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+---------------+-------------+---------------------+----------+------------+ 4 rows in setView the historical tasks (the execution result depends on the task execution status).
obclient> select * from oceanbase.dba_ob_kv_ttl_task_history;The return result is as follows:
+------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | TABLE_NAME | TABLE_ID | TABLET_ID | TASK_ID | START_TIME | END_TIME | TRIGGER_TYPE | STATUS | TTL_DEL_CNT | MAX_VERSION_DEL_CNT | SCAN_CNT | RET_CODE | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ | NULL | -1 | -1 | 1 | 2023-09-28 11:35:02.695292 | 2023-09-28 11:35:17.720627 | PERIODIC | FINISHED | 0 | 0 | 0 | OB_SUCCESS | | t1$cf1 | 500002 | 200001 | 1 | 2023-09-28 11:35:06.612708 | 2023-09-28 11:35:11.634029 | PERIODIC | FINISHED | 1 | 0 | 2 | OB_SUCCESS | | t1$cf1 | 500002 | 200002 | 1 | 2023-09-28 11:35:06.612711 | 2023-09-28 11:35:11.632463 | PERIODIC | FINISHED | 1 | 0 | 2 | OB_SUCCESS | | t1$cf1 | 500002 | 200003 | 1 | 2023-09-28 11:35:06.612712 | 2023-09-28 11:35:11.627921 | PERIODIC | FINISHED | 2 | 0 | 4 | OB_SUCCESS | +------------+----------+-----------+---------+----------------------------+----------------------------+--------------+----------+-------------+---------------------+----------+------------+ 4 rows in set
View the data in the current HBase table.
obclient> select * from t1$cf1;The return result is as follows:
+------+-----+----------------+--------+ | K | Q | T | V | +------+-----+----------------+--------+ | row3 | cq1 | -1695872081000 | no del | | row4 | cq1 | -1695872081000 | no del | | row1 | cq1 | -1695872081000 | no del | | row2 | cq1 | -1695872081000 | no del | +------+-----+----------------+--------+ 4 rows in set