Data skipping is an optimization technique that performs calculations at the storage layer to skip unnecessary I/O operations. A skip index is a sparse index structure that stores pre-aggregated data to enable data skipping, thereby improving query efficiency. In OceanBase Database, skip indexes are generated in two ways: baseline data generation and incremental data generation.
Generate baseline data for Skip Index
Skip Index extends the metadata stored in the index tree by adding column-level metadata fields. It aggregates and stores the maximum and minimum values, null count, and sum of specified column data within the range corresponding to each index node. During expression evaluation, it dynamically prunes data using the aggregated data from the index, reducing scan overhead. The essence of pre-aggregation is to move computation from the query execution phase to the data writing phase by storing precomputed results, thereby improving query efficiency. However, this approach requires additional computation during compaction and consumes storage space. Skip Index is stored in baseline data. If the data in the corresponding range is updated, the pre-aggregated data becomes invalid. Frequent random updates can render Skip Index ineffective, reducing optimization benefits. Skip Index is a column attribute that can be viewed using the DESC table_name or SHOW CREATE TABLE table_name commands.
DDL behavior of Skip Index
The maintenance of Skip Index data is completed on the baseline data during major compactions. All DDL actions for updating aggregated data currently rely on major compactions. This means that Skip Index can be partially effective. For example, if a Skip Index is created for a specific column, it will be effective for data written after each major compaction. After a full major compaction, the Skip Index will be effective for all data in that column.
Skip Index is a column attribute that can be applied using online DDL operations.
The Skip Index attribute of a column is limited by the data type and characteristics of the column. Columns with cascading relationships inherit corresponding aggregate attributes, such as indexed columns.
When adding the Skip Index attribute to a column, if the system detects that the size of the Skip Index for a single table may exceed the maximum supported storage limit, an error will be reported. Using Skip Index is an optimization strategy that trades space for time. Therefore, when deciding to add the Skip Index attribute to a specific column, ensure that this operation has a meaningful impact on query performance to avoid wasting storage resources.
Limitations of Skip Index
Skip Index cannot be created for columns of JSON or spatial data types.
Skip Index of the
SUMtype cannot be created for non-numeric columns. Numeric types include integers, fixed-point numbers, and floating-point numbers (bit-value types are not supported).Skip Index cannot be created for generated columns.
Identify Skip Index
Note
- Rowstore tables do not have any type of Skip Index by default. Columnstore tables have a
MIN_MAXtype of Skip Index by default. - For the Skip Index attributes created by default, the Skip Index attribute will not be displayed when you run the
DESC table_nameorSHOW CREATE TABLE table_namecommand to view the column attributes of the table. - Columnstore tables have a
MIN_MAXtype of Skip Index by default, but do not have aSUMtype of Skip Index by default. This is becauseSUMtype of Skip Index may have some impact on the performance of direct load and major compaction. If creating aSUMtype of Skip Index can optimize query performance, you can explicitly create aSUMtype of Skip Index to accelerate queries. If not, we recommend that you delete theSUMtype of Skip Index.
Specify the SKIP_INDEX(skip_index_option) option to identify the Skip Index attribute of a column. The value of skip_index_option is as follows:
MIN_MAX: This is the most commonly used aggregate data type in Skip Index. It stores the maximum value, minimum value, and null count of the indexed column at the index node level. This type of data can accelerate the filtering andMIN/MAXaggregation operations pushed down.SUM: This is used to accelerate theSUMaggregation operations pushed down for numeric types.MIN_MAX, SUM: This specifies the Skip Index type that uses bothMIN_MAXandSUMaggregate data types.
For more information about changing the Skip Index attribute, see Change a table.
Examples
Specify the Skip Index attribute of a column when you create a table.
CREATE TABLE test_skidx(
col1 NUMBER SKIP_INDEX(MIN_MAX, SUM),
col2 FLOAT SKIP_INDEX(MIN_MAX),
col3 VARCHAR2(1024) SKIP_INDEX(MIN_MAX),
col4 CHAR(10)
);
Incremental data generation Skip Index
OceanBase Database allows you to specify that incremental SSTables are generated based on the baseline behavior when you create a table, to improve the query performance of incremental SSTables.
Generation strategy for Skip Index aggregation information
- If the update model is
partial_update, Skip Index aggregation information is not generated. Skip Index aggregation information is generated only for tables with thedelete_insertorappend_onlyupdate model. - Due to the limited storage space for Skip Index aggregation information, which is fixed at 1024 bytes, not all columns can have Skip Index aggregation information generated. Skip Index aggregation information is not generated for
virtualcolumns,jsoncolumns,geo typecolumns, andoutrow lobcolumns.
Note
Since SUM type aggregation information is rarely used and occupies a large amount of space, the system automatically generates only MIN_MAX and NULL_COUNT Skip Index aggregation information for columns specified with Skip Index in the schema.
Syntax
To specify that incremental SSTables are generated based on the baseline behavior when you create a table, add the SKIP_INDEX_LEVEL table option after the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name column_definition
SKIP_INDEX_LEVEL [=] {1 | 0};
The value of SKIP_INDEX_LEVEL can be as follows:
0: Skip Index aggregation information is generated for all columnar baselines of the table, and for rowstore baselines based on the schema. Skip Index aggregation information is not generated for incremental SSTables.
1: In addition to 0, incremental SSTables of tables in the
DELETE_INSERTorAPPEND_ONLYmode generate Skip Index aggregation information based on the baseline behavior.- For columnar baselines: Skip Index aggregation information is generated for all columnar baselines, and for incremental SSTables based on the Skip Index aggregation information generation strategy.
- For rowstore baselines: Skip Index aggregation information is generated for rowstore baselines based on the schema, and for incremental SSTables based on the schema.
- For hybrid columnar and rowstore baselines: Skip Index aggregation information is generated for columnar baselines, and for incremental SSTables based on the Skip Index aggregation information generation strategy.
If you do not specify the SKIP_INDEX_LEVEL parameter when you create a table, the system determines the default value of SKIP_INDEX_LEVEL based on the value of the tenant-level parameter default_skip_index_level. For more information, see default_skip_index_level.
You can modify SKIP_INDEX_LEVEL by using the ALTER TABLE statement. The SQL statement is as follows:
ALTER TABLE table_name SET SKIP_INDEX_LEVEL [=] {1 | 0};
After you modify SKIP_INDEX_LEVEL, the generation and deletion of Skip Index aggregation information for incremental SSTables of the table take effect during the next major compaction or minor compaction.
Examples
Create a table named
test_skidx_lev1. Incremental SSTables of this table can generate Skip Index aggregation information.obclient> CREATE TABLE test_skidx_lev1(col1 NUMBER) SKIP_INDEX_LEVEL = 1;Create a table named
test_skidx_lev2. Incremental SSTables of this table cannot generate Skip Index aggregation information.obclient> CREATE TABLE test_skidx_lev2(col1 NUMBER) SKIP_INDEX_LEVEL = 0;Create a table named
test_skidx_lev3without specifyingSKIP_INDEX_LEVEL.obclient> CREATE TABLE test_skidx_lev3(col1 NUMBER);Modify the
SKIP_INDEX_LEVELvalue of thetest_skidx_lev2table to 1.obclient> ALTER TABLE test_skidx_lev2 SET SKIP_INDEX_LEVEL = 1;
