Data skipping is an optimization method that performs calculations at the storage layer to avoid unnecessary I/O. A skip index is a sparse index structure that provides data skipping by storing pre-aggregated data, with the aim of improving query efficiency. A skip index extends the metadata stored in the index tree with column-level metadata fields. For the range of rows covered by each index node, these fields store the maximum value, minimum value, number of null values, and sum of the specified column. When pushed-down expressions are evaluated, this aggregated data is used to prune data dynamically, which reduces scanning overhead. A skip index is a column attribute. You can use the DESC table_name or SHOW CREATE TABLE table_name statement to query the column attributes of a table.
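The pruning idea can be illustrated with a small model. This is a conceptual sketch, not OceanBase's implementation: it assumes each index node covers a fixed number of rows, and `NodeAgg`, `build_node_aggs`, and `nodes_to_scan` are hypothetical names. Each node keeps the minimum, maximum, null count, and sum of its rows, and a range filter skips any node whose [min, max] interval cannot overlap the filter.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class NodeAgg:
    """Hypothetical pre-aggregated metadata for the rows under one index node."""
    min_val: Optional[float]  # None when every row in the node is null
    max_val: Optional[float]
    null_count: int
    total: float              # sum of the non-null values


def build_node_aggs(values: Sequence[Optional[float]], rows_per_node: int) -> List[NodeAgg]:
    """Pre-aggregate a column in fixed-size chunks (done at write/compaction time)."""
    aggs = []
    for start in range(0, len(values), rows_per_node):
        chunk = values[start:start + rows_per_node]
        non_null = [v for v in chunk if v is not None]
        aggs.append(NodeAgg(
            min_val=min(non_null) if non_null else None,
            max_val=max(non_null) if non_null else None,
            null_count=len(chunk) - len(non_null),
            total=sum(non_null),
        ))
    return aggs


def nodes_to_scan(aggs: List[NodeAgg], lo: float, hi: float) -> List[int]:
    """Return the indexes of nodes that may hold rows with lo <= value <= hi."""
    keep = []
    for i, agg in enumerate(aggs):
        if agg.min_val is None:
            continue  # all-null node: a range predicate can never match
        if agg.max_val < lo or agg.min_val > hi:
            continue  # whole node pruned without reading its rows
        keep.append(i)
    return keep
```

For example, with the values `[1, 2, 3, None, 10, 11, 12, 13, 20, None, None, None]` split into nodes of four rows, the filter `11 <= value <= 15` only needs to read the second node.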
Note
Pre-aggregation moves computation from the query execution phase forward to the data writing phase and stores the precomputed results to improve query efficiency. This approach requires extra computation in compaction tasks, and the pre-aggregated data consumes storage space. Skip indexes are stored in the baseline data, so updating data within a pre-aggregated range can invalidate the pre-aggregated data. Frequent random updates can therefore invalidate skip indexes and undermine the optimization.
DDL behaviors of skip indexes
Skip index data is maintained on the baseline data during major compactions. All DDL operations that update aggregated data depend on progressive major compactions, which means a skip index can be partially effective. For example, after a skip index is created on a column, each completed major compaction makes the skip index effective for the data rewritten by that compaction. After a full major compaction has rewritten all data, the skip index takes effect on all data in the column.
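The partial effectiveness described above can be modeled as follows. This is a hypothetical sketch: `Block`, `major_compact`, and `blocks_read` are illustrative names, and each call to `major_compact` stands in for one progressive major compaction that rewrites only part of the baseline. Blocks that have not been rewritten yet carry no aggregates, so they can never be pruned.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    rows: List[int]  # assumed non-empty
    # (min, max) of the block, present only after a major compaction has rewritten
    # the block with the skip index enabled; None means "not covered yet".
    minmax: Optional[Tuple[int, int]] = None


def major_compact(blocks: List[Block], count: int) -> None:
    """Hypothetical progressive compaction: rewrite up to `count` uncovered blocks."""
    done = 0
    for b in blocks:
        if done == count:
            break
        if b.minmax is None:
            b.minmax = (min(b.rows), max(b.rows))
            done += 1


def blocks_read(blocks: List[Block], lo: int, hi: int) -> int:
    """Count the blocks that must be read for the filter lo <= v <= hi."""
    n = 0
    for b in blocks:
        if b.minmax is not None:
            mn, mx = b.minmax
            if mx < lo or mn > hi:
                continue  # covered block pruned by its aggregates
        n += 1  # uncovered blocks always have to be read
    return n
```

With four blocks and a filter that matches only the second block, all four are read before any compaction, three after the first compaction covers two blocks, and only one after a second compaction covers the rest.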
Skip indexes are a column attribute, so they can be modified through online DDL operations.
The skip index attribute that a column supports is restricted by the column's data type and characteristics. A column with a cascading relationship, such as an indexed column, can inherit the corresponding aggregation attribute.
If adding the skip index attribute to a column could cause the skip index size of the table to exceed the maximum storage size, the system reports an error. Skip indexes trade storage space for query performance. Before you add the skip index attribute to a column, make sure that it actually improves query performance; otherwise, it only wastes storage resources.
Limitations on skip indexes
- You cannot create a skip index for a JSON column or a spatial column.
- You cannot create a skip index of the SUM type for a non-numeric column. Numeric data types include integer types, fixed-point types, and floating-point types. The bit value type is not supported.
- You cannot create a skip index for a generated column.
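The rules above can be expressed as a simple pre-check. This is a hypothetical sketch: `skip_index_error` is an illustrative helper, and the type sets below are assumed, incomplete examples rather than the authoritative lists; the database itself performs the real validation.

```python
from typing import Optional, Set

# Assumed, illustrative type subsets; consult the OceanBase data type docs for the full lists.
NUMERIC_TYPES = {
    "TINYINT", "SMALLINT", "MEDIUMINT", "INT", "INTEGER", "BIGINT",  # integer types
    "DECIMAL", "NUMERIC", "NUMBER",                                 # fixed-point types
    "FLOAT", "DOUBLE", "BINARY_FLOAT", "BINARY_DOUBLE",             # floating-point types
}  # BIT is intentionally absent: the bit value type is not supported for SUM
UNSUPPORTED_TYPES = {"JSON", "GEOMETRY", "POINT", "LINESTRING", "POLYGON"}
VALID_OPTIONS = {"MIN_MAX", "SUM"}


def skip_index_error(col_type: str, options: Set[str], generated: bool = False) -> Optional[str]:
    """Return an error message if SKIP_INDEX(options) is not allowed, else None."""
    base = col_type.upper().split("(")[0].strip()  # "VARCHAR2(1024)" -> "VARCHAR2"
    if not options or not options <= VALID_OPTIONS:
        return f"invalid SKIP_INDEX options: {sorted(options)}"
    if generated:
        return "a skip index cannot be created for a generated column"
    if base in UNSUPPORTED_TYPES:
        return f"a skip index cannot be created for a {base} column"
    if "SUM" in options and base not in NUMERIC_TYPES:
        return f"a SUM skip index requires a numeric column, got {base}"
    return None
```

Under these assumptions, `skip_index_error("NUMBER", {"MIN_MAX", "SUM"})` returns `None`, while `skip_index_error("VARCHAR2(1024)", {"SUM"})` returns an error message.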
Specify skip indexes
Note
- By default, no skip index is created for a rowstore table, whereas skip indexes of the MIN_MAX type are created for a columnstore table.
- When you use the DESC table_name or SHOW CREATE TABLE table_name statement to query the column attributes of a table, the skip index attributes created by default are not displayed.
- A skip index of the SUM type is not created by default for a columnstore table, because it degrades the performance of direct load and major compaction tasks. If a skip index of the SUM type improves query performance, you can create one to accelerate queries. If it does not, we recommend that you drop it.
- In OceanBase Database V4.3.0 and V4.3.1, a skip index of the SUM type is created for a columnstore table by default. After an upgrade to V4.3.2, such an index may become invalid. If the performance of SUM aggregation queries on a columnstore table deteriorates, you can create a skip index of the SUM type to accelerate these queries.
You can use SKIP_INDEX(skip_index_option) to specify the skip index attribute for a column. Valid values of skip_index_option are as follows:
- MIN_MAX: stores the maximum value, minimum value, and number of null values of the indexed column at the index node granularity. This is the most common skip index type. It accelerates the pushdown of filters and MIN/MAX aggregation.
- SUM: accelerates the pushdown of SUM aggregation for numeric values.
- MIN_MAX, SUM: uses both MIN_MAX and SUM aggregation.
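To show how MIN_MAX and SUM metadata can work together during aggregation pushdown, here is a hypothetical Python model of SUM(v) with a range filter on the same column. A node whose value range lies entirely inside the filter contributes its stored sum directly, a node whose range lies entirely outside is skipped, and only partially overlapping nodes are scanned row by row. `Node`, `make_node`, `filtered_sum`, and `column_min_max` are illustrative names, not OceanBase APIs, and the real executor may apply different rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    rows: List[Optional[int]]  # raw values, read only when the node cannot be skipped
    min_val: Optional[int]     # MIN_MAX metadata (None if all rows are null)
    max_val: Optional[int]
    null_count: int
    total: int                 # SUM metadata: sum of the non-null values


def make_node(rows: List[Optional[int]]) -> Node:
    vals = [v for v in rows if v is not None]
    return Node(rows, min(vals) if vals else None, max(vals) if vals else None,
                len(rows) - len(vals), sum(vals))


def filtered_sum(nodes: List[Node], lo: int, hi: int) -> Tuple[int, int]:
    """Return (SUM(v) for lo <= v <= hi, number of nodes whose rows were scanned)."""
    total, scanned = 0, 0
    for n in nodes:
        if n.min_val is None or n.max_val < lo or n.min_val > hi:
            continue              # MIN_MAX: no row in this node can qualify
        if lo <= n.min_val and n.max_val <= hi:
            total += n.total      # SUM: every non-null row qualifies, no scan needed
            continue
        scanned += 1              # partial overlap: fall back to scanning rows
        total += sum(v for v in n.rows if v is not None and lo <= v <= hi)
    return total, scanned


def column_min_max(nodes: List[Node]) -> Tuple[Optional[int], Optional[int]]:
    """MIN(v) and MAX(v) answered from node metadata alone."""
    mins = [n.min_val for n in nodes if n.min_val is not None]
    maxs = [n.max_val for n in nodes if n.max_val is not None]
    return (min(mins) if mins else None, max(maxs) if maxs else None)
```

For nodes built from `[1, 2, 3]`, `[5, None, 7]`, and `[8, 9, 20]`, `filtered_sum(nodes, 2, 9)` returns `(34, 2)`: the middle node is answered from its stored sum, and only the two boundary nodes are scanned.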
For information about how to modify the skip index attribute, see Modify a table.
Example
Create a table and specify the skip index attribute for its columns.
```sql
CREATE TABLE test_skidx(
  col1 NUMBER SKIP_INDEX(MIN_MAX, SUM),
  col2 FLOAT SKIP_INDEX(MIN_MAX),
  col3 VARCHAR2(1024) SKIP_INDEX(MIN_MAX),
  col4 CHAR(10)
);
```