Data skipping is an optimization method that performs calculations at the storage layer to try to skip unnecessary I/O operations. A skip index is a sparse index structure that provides data skipping capabilities by storing pre-aggregated data, thereby improving query efficiency. OceanBase Database supports two types of skip indexes: baseline data skip indexes and incremental data skip indexes.
Generate Skip Index for baseline data
Skip Index extends the metadata stored in the index tree by adding column-level metadata fields. For the data range covered by each index node, it aggregates and stores the maximum value, minimum value, null count, and sum of the specified columns. During expression evaluation, the system dynamically prunes data based on the aggregated data stored in the index, which reduces scan overhead.
The essence of pre-aggregation is to move computation from the query execution phase to the data writing phase. The system stores precomputed results to improve query efficiency, at the cost of additional computation during compaction and additional storage space.
Skip Index is stored in baseline data. If data in the range covered by a piece of pre-aggregated data is updated, that pre-aggregated data becomes invalid. Frequent random updates can therefore invalidate the Skip Index and reduce the optimization effect.
Skip Index is a column attribute. You can use the DESC table_name or SHOW CREATE TABLE table_name command to view the column attributes of a table.
DDL operations for Skip Index
Skip Index data is maintained on the baseline data during major compactions, and all DDL operations that update aggregated data currently rely on major compactions. As a result, a Skip Index can be partially effective. For example, after you create a Skip Index on a column, the Skip Index takes effect only for data that is written to the baseline by a subsequent major compaction. After a full major compaction, the Skip Index takes effect for all data in the column.
Skip Index is a column attribute that can be applied through online DDL operations.
The number of Skip Indexes that a column can have is limited by the data type and characteristics of the column. Columns with cascading relationships, such as indexed columns, inherit the corresponding aggregation attributes.
If the Skip Index data of a single table would exceed the maximum supported storage size, an error is returned. Skip Index is an optimization strategy that trades space for time. Before you add a Skip Index to a column, make sure that it has a meaningful impact on query performance so that storage resources are not wasted.
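The online DDL path and its dependence on major compaction can be sketched as follows. This is an illustration, not a definitive procedure. The table sales_demo and its columns are hypothetical. The MODIFY COLUMN form is assumed to follow the syntax described in the Modify a table topic, and ALTER SYSTEM MAJOR FREEZE is assumed to be available to the current tenant. Verify both against your OceanBase Database version.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE sales_demo(
  id INT PRIMARY KEY,
  amount DECIMAL(10, 2),
  note VARCHAR(64)
);

-- Assumed syntax: add MIN_MAX and SUM Skip Index attributes to an existing column.
ALTER TABLE sales_demo MODIFY COLUMN amount DECIMAL(10, 2) SKIP_INDEX(MIN_MAX, SUM);

-- Assumed command: manually initiate a major compaction so that the
-- aggregated data is built on the baseline data.
ALTER SYSTEM MAJOR FREEZE;
```

Until a full major compaction has completed, the new Skip Index may cover only part of the data in the column.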
Limitations on using Skip Index
- Skip Index cannot be created for columns of the JSON or spatial data type.
- Skip Index of the SUM type cannot be created for non-numeric columns. Numeric columns include integer, decimal, and floating-point columns (Bit-Value columns are not supported).
- Skip Index cannot be created for generated columns.
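The limitations above can be illustrated with a few statements. This is a sketch with hypothetical table names. Per the limitations, the first two statements are expected to be rejected. The exact error messages depend on the version and are not shown here.

```sql
-- Expected to be rejected: SUM requires a numeric column, and VARCHAR is not numeric.
CREATE TABLE skidx_bad_sum(
  c1 VARCHAR(32) SKIP_INDEX(SUM)
);

-- Expected to be rejected: JSON columns do not support Skip Index.
CREATE TABLE skidx_bad_json(
  c1 JSON SKIP_INDEX(MIN_MAX)
);

-- Expected to succeed: MIN_MAX on a string column, MIN_MAX and SUM on an integer column.
CREATE TABLE skidx_ok(
  c1 VARCHAR(32) SKIP_INDEX(MIN_MAX),
  c2 BIGINT SKIP_INDEX(MIN_MAX, SUM)
);
```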
Specify Skip Index
Note
- Rowstore tables do not have any Skip Index by default. Columnstore tables have a MIN_MAX type of Skip Index by default.
- Skip Index attributes that are created by default are not displayed when you run the DESC table_name or SHOW CREATE TABLE table_name command to view the column attributes of a table.
- Columnstore tables do not have a SUM type of Skip Index by default, because a SUM type of Skip Index may affect the performance of direct load and major compaction tasks. If a SUM type of Skip Index can optimize query performance, you can explicitly create one to accelerate queries. If it cannot, we recommend that you delete it.
You can use the SKIP_INDEX(skip_index_option) option to specify the Skip Index attribute of a column. Valid values of skip_index_option are as follows:
- MIN_MAX: The most commonly used aggregated data type in Skip Index. It stores the maximum value, minimum value, and null count of the indexed column at the index node level. This type of data can accelerate the pushdown of filters and MIN/MAX aggregations.
- SUM: Accelerates the pushdown of SUM aggregations for numeric columns.
- MIN_MAX, SUM: The Skip Index type that combines both the MIN_MAX and SUM aggregated data types.
For more information about how to modify the Skip Index attribute, see Modify a table.
Example
Specify the Skip Index attribute of a column when you create a table.
CREATE TABLE test_skidx(
  col1 INT SKIP_INDEX(MIN_MAX, SUM),
  col2 FLOAT SKIP_INDEX(MIN_MAX),
  col3 VARCHAR(1024) SKIP_INDEX(MIN_MAX),
  col4 CHAR(10)
);
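As a follow-up sketch, the statements below view the attributes of test_skidx and show the kinds of queries that each Skip Index type is intended to accelerate. The filter value is arbitrary. Whether a given query actually uses the aggregated data depends on the execution plan and on how much of the data has been through a major compaction, so treat the comments as expectations rather than guarantees.

```sql
-- View the explicitly specified Skip Index attributes of the columns.
DESC test_skidx;
SHOW CREATE TABLE test_skidx;

-- A range filter on col2 may be pruned by using the MIN_MAX data of col2.
SELECT COUNT(*) FROM test_skidx WHERE col2 > 100.0;

-- MIN/MAX aggregations on col3 may be pushed down by using the MIN_MAX data of col3.
SELECT MIN(col3), MAX(col3) FROM test_skidx;

-- A SUM aggregation on col1 may be pushed down by using the SUM data of col1.
SELECT SUM(col1) FROM test_skidx;
```

col4 has no Skip Index attribute, so on a rowstore table a filter on col4 is not expected to benefit from data skipping.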
Generate Skip Index for incremental data
When you create a table, you can specify that Skip Index aggregate information is generated for incremental SSTables in the same way as for baseline SSTables. This improves query performance on incremental SSTables.
Generation strategy for Skip Index aggregate information
- If the update model of the table is partial_update, Skip Index aggregate information is not generated. Skip Index aggregate information is generated only for tables with the delete_insert or append_only update model.
- Because the storage space for Skip Index aggregate information is fixed at 1024 bytes, Skip Index aggregate information cannot be generated for all columns. Skip Index aggregate information is not generated for virtual columns, json columns, geo type columns, and outrow lob columns.
Note
Since SUM aggregate information is rarely used and occupies a large amount of space, the system automatically generates only MIN_MAX and NULL_COUNT Skip Index aggregate information for columns that are not explicitly specified to have Skip Indexes in the schema.
Syntax
To specify that Skip Index aggregate information is generated for incremental SSTables in the same way as baseline SSTables when creating a table, add the SKIP_INDEX_LEVEL table option to the CREATE TABLE statement. The SQL statement is as follows:
CREATE TABLE table_name column_definition
SKIP_INDEX_LEVEL [=] {1 | 0};
The value of SKIP_INDEX_LEVEL can be set as follows:
- 0: Skip Index aggregate information is generated for all columnar baseline SSTables and for row-based baseline SSTables based on the schema. Skip Index aggregate information is not generated for incremental SSTables.
- 1: In addition to the settings for 0, incremental SSTables of tables in the DELETE_INSERT or APPEND_ONLY mode generate Skip Index aggregate information based on the behavior of the baseline SSTables:
  - For columnar SSTables: Skip Index aggregate information is generated for all baseline SSTables. Incremental SSTables generate Skip Index aggregate information based on the generation strategy for Skip Index aggregate information.
  - For row-based SSTables: Skip Index aggregate information is generated for baseline SSTables based on the schema. Incremental SSTables also generate Skip Index aggregate information based on the schema.
  - For hybrid columnar and row-based SSTables: Skip Index aggregate information is generated for columnar baseline SSTables. Incremental SSTables generate Skip Index aggregate information based on the generation strategy for Skip Index aggregate information.
If you do not specify the SKIP_INDEX_LEVEL parameter when creating a table, the system determines the default value of SKIP_INDEX_LEVEL based on the value of the tenant-level parameter default_skip_index_level. For more information, see default_skip_index_level.
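To see which default applies, you can check the tenant-level parameter before creating a table. The sketch below assumes that the current user is allowed to run SHOW PARAMETERS in the tenant. The table skidx_default_demo is hypothetical.

```sql
-- Check the tenant-level default for SKIP_INDEX_LEVEL.
SHOW PARAMETERS LIKE 'default_skip_index_level';

-- Hypothetical table created without SKIP_INDEX_LEVEL; it takes the tenant default.
CREATE TABLE skidx_default_demo(col1 INT);

-- Inspect the resulting table definition. Whether SKIP_INDEX_LEVEL is
-- displayed here may depend on the version.
SHOW CREATE TABLE skidx_default_demo;
```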
You can use the ALTER TABLE statement to modify SKIP_INDEX_LEVEL. The SQL statement is as follows:
ALTER TABLE table_name SET SKIP_INDEX_LEVEL [=] {1 | 0};
After you modify SKIP_INDEX_LEVEL, the generation and deletion of Skip Index aggregate information for incremental SSTables of the table take effect during the next major compaction or minor compaction.
Examples
- Create a table named test_skidx_lev1. This table allows the generation of Skip Index aggregate information for incremental SSTables.
  obclient> CREATE TABLE test_skidx_lev1(col1 INT) SKIP_INDEX_LEVEL = 1;
- Create a table named test_skidx_lev2. This table does not allow the generation of Skip Index aggregate information for incremental SSTables.
  obclient> CREATE TABLE test_skidx_lev2(col1 INT) SKIP_INDEX_LEVEL = 0;
- Create a table named test_skidx_lev3 without specifying SKIP_INDEX_LEVEL.
  obclient> CREATE TABLE test_skidx_lev3(col1 INT);
- Modify the SKIP_INDEX_LEVEL value of the test_skidx_lev2 table to 1.
  obclient> ALTER TABLE test_skidx_lev2 SET SKIP_INDEX_LEVEL = 1;
