Data skipping is an optimization method that calculates data at the storage layer to skip unnecessary I/O. A skip index is a sparse index structure that provides the data skipping capability by storing pre-aggregated data, aiming to enhance the query efficiency. A skip index extends the metadata stored in the index tree to add column-level metadata fields for aggregating and storing the maximum value, minimum value, null count, and sum of the specified column data in the range corresponding to the index node. The aggregated data on the index is then used to dynamically prune the data during the calculation of pushed-down expressions, thereby reducing scanning overheads.
Note
The essence of pre-aggregation is to move calculation in the query execution phase ahead to the data writing phase. The pre-calculated results are stored to improve the query efficiency. This method requires extra calculation in the compaction task, and pre-aggregated data consumes storage space. Skip indexes are stored in the baseline data. Data updates in the pre-aggregation range can invalidate the pre-aggregated data. Therefore, frequent random updates can make skip indexes invalid and undermine the optimization effect.
Skip indexes are a column attribute. In a columnstore table, OceanBase Database creates skip indexes of the MIN_MAX and SUM types by default for columns whose type meets the skip index requirements. An explicit setting of the skip index attribute takes effect mainly on rowstore columns, and is currently invalid for columnstore columns. In addition, when you query the column attributes of a table by using the DESC table_name or SHOW CREATE TABLE table_name statement, the skip index attribute is not displayed for a columnstore table, and only the explicitly set skip index attribute is displayed.
Skip index DDL behavior
The maintenance of skip index data is completed on the baseline data during the major compaction. All DDL operations for updating aggregated data depend on the progressive major compaction. That is, a skip index can be partially effective. For example, when a skip index is created on a column, each time a major compaction is completed, the skip index takes effect on the newly written data. After a full major compaction is completed and all data is rewritten, the skip index takes effect on all data in this column.
Skip indexes are a column attribute that can be applied by online DDL operations.
The skip index attribute of a column is restricted by the data type and characteristics of the column. A column with a cascading relationship, such as an indexed column, can inherit the corresponding aggregation attribute.
When you add the skip index attribute to a column, if the skip index size of the table may exceed the maximum storage size, the system will report an error. Using skip indexes is an optimization strategy that trades storage space for query performance. Therefore, when you attempt to add the skip index attribute to a column, make sure that your operation can improve the query performance, so as not to waste storage resources.
By default, the system creates a skip index that stores aggregated data of the
MIN_MAXandSUMtypes for columnstore columns whose type meets the requirements.
Skip index limitations
You cannot create a skip index for a JSON column or a spatial column.
You cannot create a skip index of the
SUMtype for a non-numeric column. Numeric types include integer types, fixed-point types, and floating-point types. The bit value type is not supported.You cannot create a skip index for a generated column.
Skip index identification method
You can use SKIP_INDEX(skip_index_option) to specify the skip index attribute for a column. Values of skip_index_option are as follows:
MIN_MAX: the most common skip index type. A skip index of this type stores the maximum value, minimum value, and null count of the indexed column at the index node granularity. This type of skip indexes can accelerate the pushdown of filters andMIN/MAXaggregation.SUM: the skip index type that is used to accelerate the pushdown ofSUMaggregation for numeric values.MIN_MAX, SUM: the skip index type that uses bothMIN_MAXandSUMaggregation.
For information about how to modify the skip index attribute, see Modify a table.
Examples
Create a table and specify the skip index attribute for columns.
CREATE TABLE test_skidx(
col1 NUMBER SKIP_INDEX(MIN_MAX, SUM),
col2 FLOAT SKIP_INDEX(MIN_MAX),
col3 VARCHAR2(1024) SKIP_INDEX(MIN_MAX),
col4 CHAR(10)
);