Purpose
This statement is used to create an index. An index is a structure created on a table that sorts the values of one or more columns in the table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.
OceanBase Database supports unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the user-specified columns, the index table automatically includes the primary key columns of the primary table (or the hidden primary key if the table has no primary key). Storing additional columns in the index provides the database optimizer with more options.
Limitations and considerations
- If all indexed columns are deleted from a table, the indexes created on the table become invalid.
- Full-text indexes can be created only on
CHAR,VARCHAR, andTEXTcolumns. - Search indexes can be created on JSON, single-level array, and various scalar type columns. For more information, see Data types supported by search indexes.
- Only local (
LOCAL) full-text or search indexes can be created in the current version. - The
UNIQUEkeyword cannot be specified when you create a full-text index. - When you create a full-text index that involves multiple columns, the columns must have the same character set.
- The
WITH PARSER {SPACE | NGRAM | BENG | IK}clause can be specified only when you create a full-text index.
Privilege requirements
To create an index by executing the CREATE INDEX statement, the current user must have the INDEX privilege on the corresponding object. For more information about the privileges of OceanBase Database, see Privilege classification in MySQL mode.
Syntax
General index syntax
CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
[USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
[WITH PARSER tokenizer_option]
[PARSER_PROPERTIES[=](parser_properties_list)]
[FTS_INDEX_TYPE [=] {MATCH | PHRASE_MATCH}]
[index_option...] [partition_option] [index_column_group_option]
index_type:
SPATIAL
| UNIQUE
| FULLTEXT
| SEARCH -- For specific syntax, see the syntax for search indexes (Search Index) below.
sort_column_key:
column_name [(integer)] [ASC] [ID id]
| (index_expr) [ASC] [ID id]
tokenizer_option:
SPACE
| NGRAM
| BENG
| IK
| NGRAM2
parser_properties_list:
parser_properties, [parser_properties]
parser_properties:
min_token_size = int_value
| max_token_size = int_value
| ngram_token_size = int_value
| ik_mode = 'char_value'
| min_ngram_size = int_value
| max_ngram_size = int_value
index_option:
GLOBAL
| LOCAL
| BLOCK_SIZE integer
| COMMENT STRING_VALUE
| STORING (column_name [, column_name...])
| WITH_ROWID
| WITH PARSER STRING_VALUE
| index_using_algorithm
| visibility_option
| DATA_TABLE_ID data_table_id
| INDEX_TABLE_ID index_table_id
| VIRTUAL_COLUMN_ID virtual_column_id
| MAX_USED_PART_ID used_part_id
| STORAGE_CACHE_POLICY(storage_cache_policy_option)
index_column_group_option:
WITH COLUMN GROUP([all columns, ]each column)
storage_cache_policy_option:
GLOBAL = {"hot" | "auto" | "cold" | "none"}
| timeline_strategy_list
timeline_strategy_list:
BOUNDARY_COLUMN = column_name
| BOUNDARY_COLUMN_UNIT = {"s" | "ms"}
| HOT_RETENTION = intnum retention_time_unit
| MIXED_RETENTION = intnum retention_time_unit
| GRANULARITY = {"partition" | "block"}
retention_time_unit:
YEAR
| MONTH
| WEEK
| DAY
| HOUR
| MINUTE
Search index (Search Index) syntax
When index_type is specified as SEARCH, you can use the following syntax to create a search index:
CREATE SEARCH INDEX index_name ON table_name (
search_index_column
[, search_index_column ...]
) [LOCAL];
search_index_column:
column_name [WITH (search_col_option_list)]
search_col_option_list:
[INCLUDE_PATHS = ('path1'[, 'path2']...)]
[| EXCLUDE_PATHS = ('path1'[, 'path2']...)]
[| INCLUDE_TYPES = (type1[, type2]...)]
In addition to CREATE INDEX, you can also create a search index by using the CREATE TABLE or ALTER TABLE statement.
In the preceding two methods, the search_index_column and WITH configurations have the same semantics as those in CREATE SEARCH INDEX. For more information, see CREATE TABLE and ALTER TABLE.
Parameters
Common index parameters
| Parameter | Description |
|---|---|
| hint_options | The hint options. This parameter is optional. Currently, only the parallel option is supported. For example, /*+ parallel(10) */. |
| index_type | The type of the index. This parameter is optional. For more information about the index types, see index_type. |
| IF NOT EXISTS | If specified, the index will not be created if it already exists. If the index already exists and this option is not specified, an error will be returned. |
| index_name | The name of the index to be created. |
| USING BTREE | The index type. This parameter is optional. For more information, see USING BTREE. |
| table_name | The name of the table on which the index is to be created. You can specify the table name directly or use the schema_name.table_name format to specify the table name and the name of the database to which the table belongs. |
| sort_column_key | The key of the sort column. You can specify multiple sort columns. Separate the sort columns with commas (,). For more information, see sort_column_key. |
| WITH PARSER tokenizer_option | The tokenizer for the full-text index. This parameter is optional. For more information, see tokenizer_option. |
| PARSER_PROPERTIES[=](parser_properties_list) | The properties of the tokenizer. This parameter is optional. For more information, see parser_properties. |
| FTS_INDEX_TYPE [=] {MATCH | PHRASE_MATCH} | The type of the full-text index. This parameter is optional. MATCH specifies the term-matching index, which supports matching and scoring based on the BM25 algorithm. The stored content includes tokens, document IDs, term frequencies, and document lengths. PHRASE_MATCH specifies the phrase-matching index, which is a superset of the MATCH index and supports phrase queries more efficiently. The stored content also includes the position lists of the terms. This parameter takes effect only when used with a FULLTEXT index. |
| index_option | The index options. You can specify multiple index options. Separate the index options with spaces. For more information, see index_option. |
| partition_option | The options for creating index partitions. You can specify hash partitions, key partitions, range partitions, list partitions, or external table partitions. |
| index_column_group_option | The index options. For more information, see index_column_group_option. |
| STORAGE_CACHE_POLICY | The cache policy for index hotspots in shared storage mode. For more information, see storage_cache_policy . |
| LOCAL | Indicates that the index is a local index. This parameter is optional. |
Search index parameters
| Parameter | Description | Value range |
|---|---|---|
| search_index_column | The definition of the search index column. You can specify one or more columns. For each column, you can specify the column-level path and type options by using WITH (...). |
|
| INCLUDE_PATHS / EXCLUDE_PATHS | The column-level path whitelist or blacklist. You can specify only one of these two parameters. This parameter is used to control the JSON paths that are included in the index. By default, all paths are included. The paths can be configured in the following two ways:
|
|
| INCLUDE_TYPES | The set of data types to be included in the index. The value of this parameter depends on the supported data types in the actual version. For example, INCLUDE_TYPES = (JSON_STRING) specifies that only the JSON_STRING data type is included in the index. By default, all data types are included in the index. |
JSON_STRING, JSON_NUMBER |
Data types supported by search indexes
A search index can be created on the following data types:
| Category | Supported data types |
|---|---|
| Integer types | TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, and their UNSIGNED variants |
| Floating-point and fixed-point types | FLOAT, DOUBLE, DECIMAL, and NUMBER |
| String types | VARCHAR and CHAR |
| Text types | TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
| Binary types | BINARY and VARBINARY |
| Binary large objects | TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB |
| Date and time types | DATE, TIME, DATETIME, TIMESTAMP, and YEAR |
| JSON types | JSON |
| Array types | ARRAY(element_type) — Only single-level arrays are supported, such as ARRAY(INT) and ARRAY(VARCHAR(256)) |
The BIT, ENUM, and SET data types are not supported for search indexes.
index_type
SPATIAL: specifies a spatial index.UNIQUE: specifies a unique index.FULLTEXT: specifies a full-text index.SEARCH: specifies a search index.
For more information about creating indexes of different types, see Create an index.
sort_column_key
column_name [(integer)] [ASC] [ID id]: specifies a column name as the sort key.column_name: specifies the column name to be sorted.integer: an optional parameter that specifies the maximum length of the sort key.ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named
index3on thet3table and specify that the index is sorted in ascending order on thec1column.CREATE INDEX index3 ON t3 (c1 ASC);(index_expr) [ASC] [ID id]: specifies an index expression as the sort key. An index expression can be defined by using an expression or a function. It can contain the following options:(index_expr): specifies the index expression, which can be a Boolean expression, such asc1=c1. OceanBase Database does not support creating function indexes on generated columns. For more information about the expressions supported by function indexes, see Supported system functions for function indexes.ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named
index4on thet4table, specifyc1+c2as the index expression, and specify ascending order.CREATE INDEX index4 ON t4 ((c1 + c2) ASC);
You can specify multiple sort columns when creating an index. Separate multiple sort columns with commas (,). For example, create an index named index5 on the t5 table and specify c1 and c2+c3 as the sort keys.
CREATE INDEX index5 ON t5 (c1, (c2+c3));
tokenizer_option
SPACE: the default value. Specifies that the text is split by spaces. You can specify the following properties:Property Value Range min_token_size [1, 16] max_token_size [10, 84] NGRAM: specifies the N-Gram (Chinese) tokenizer. You can specify the following properties:Property Value Range ngram_token_size [1, 10] NGRAM2: specifies the tokenizer that splits the text into continuous characters ranging frommin_ngram_sizetomax_ngram_size. You can specify the following properties:Property Value Range min_ngram_size [1, 16] max_ngram_size [1, 16] BENG: specifies the Beng (basic English) tokenizer. You can specify the following properties:Property Value Range min_token_size [1, 16] max_token_size [10, 84] IK: specifies the IK (Chinese) tokenizer. Only theutf-8character set is supported. You can specify the following properties:Property Value Range ik_mode smartmax_word
You can use the TOKENIZE function to view the tokenization result of the text in the specified tokenizer and JSON format parameters.
parser_properties
min_token_size: specifies the minimum token length. The default value is 3, and the value ranges from 1 to 16.max_token_size: specifies the maximum token length. The default value is 84, and the value ranges from 10 to 84.ngram_token_size: specifies the token length for theNGRAMtokenizer. This parameter is valid only for theNGRAMtokenizer. The default value is 2, and the value ranges from 1 to 10.ik_mode: specifies the tokenization mode for theIKtokenizer. Valid values:smart: the default value. The dictionary is used to improve the accuracy of tokenization. The boundaries of the dictionary words are prioritized, which may reduce unnecessary expansions.max_word: the dictionary is used to identify the words, but it does not affect the maximum expansion of tokenization. Even if the dictionary is defined, themax_wordmode still tries to split the text into more words.
Here is an example:
Create a table named
tbl1.CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);Create a full-text index on the
col2column of thetbl1table. Specify theIKtokenizer and set the tokenizer properties by using thePARSER_PROPERTIESclause.CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
index_option
GLOBAL: specifies that the index is a global index.LOCAL: specifies that the index is a local index.BLOCK_SIZE integer: specifies the size of the index block, that is, the number of bytes in each index block.COMMENT STRING_VALUE: specifies a comment for the index.STORING (column_name [, column_name...]): specifies the columns to be stored in the index. Separate multiple columns with commas (,).WITH_ROWID: specifies that the index contains the row identifier.WITH PARSER STRING_VALUE: specifies the parser required for the index.index_using_algorithm: specifies the algorithm used for the index. You can specifyUSING BTREEorUSING HASH. However, you cannot create a multivalued index by usingUSING HASH.visibility_option: specifies the visibility of the index.DATA_TABLE_ID data_table_id: specifies the ID of the data table to which the index belongs.INDEX_TABLE_ID index_table_id: specifies the ID of the index table.VIRTUAL_COLUMN_ID virtual_column_id: specifies the ID of the virtual column.MAX_USED_PART_ID used_part_id: specifies the maximum partition ID used by the index.
index_column_group_option
WITH COLUMN GROUP(all columns, each column): specifies that the index is a rowstore-redundant index.WITH COLUMN GROUP(each column): specifies that the index is a columnstore index.
storage_cache_policy
Notice
You can specify the cold and hot storage strategy (STORAGE_CACHE_POLICY) when you create an index only in shared storage mode.
You can configure the cold and hot storage strategy for an index table.
For a local index, you can specify the
STORAGE_CACHE_POLICYat the table level. If you do not specify theSTORAGE_CACHE_POLICYat the table level, the default value isnone, which indicates that the cold and hot storage strategy for the index partitions follows that of the data table partitions.For a global index, you can specify the
STORAGE_CACHE_POLICYat the table level. If the index table is partitioned, you can also specify theSTORAGE_CACHE_POLICYat the partition level. If you do not specify theSTORAGE_CACHE_POLICYat the table level, the default value isnone.
The storage_cache_policy_option parameter defines different attributes in the key-value format. The semantics of each attribute are as follows:
GLOBAL = {"hot" | "auto" | "cold" | "none"}: specifies the cold and hot storage strategy for the index data. Valid values:hot: specifies the index as a hot index. If the cache disk space is sufficient, all data in the index table is cached to the local cloud disk.auto: specifies that the system automatically identifies the hot data in the index table.cold: specifies the data in the index table as cold data. The data is not cached to the local cache disk. For data that has been persisted to the cache, the system actively triggers cache eviction. Subsequent access to the table data can be stored in the macroblock memory cache, without being persisted to the macroblock cache and microblock cache.none: the default value. The strategy for the index follows that of the primary tableSTORAGE_CACHE_POLICY.
timeline_strategy_list: specifies the list of time-axis strategy parameters. Parameters are separated by commas. The time-axis strategy determines whether partition or macroblock data is hot or cold based on time. The system automatically adjusts the data on the local cache disk based on the configured strategy.BOUNDARY_COLUMN = column_name: specifies the column used to determine the hot data. Supported data types are integer types (BIGINTorINT, in Unix timestamp format) and time types (TIMESTAMP,DATE,DATETIME, orYEAR). IfBOUNDARY_COLUMNis an integer type, the partitioning method must be Range or Range Columns. IfBOUNDARY_COLUMNis a time type, the partitioning method must be Range Columns.BOUNDARY_COLUMN_UNIT = {"s" | "us" | "ms"}: specifies the time unit for theBOUNDARY_COLUMNparameter. This parameter is valid only whenBOUNDARY_COLUMNis an integer type. Valid values:- When the partitioning column is
INT,BOUNDARY_COLUMN_UNITcan only bes. - When the partitioning column is
BIGINT,BOUNDARY_COLUMN_UNITcan besorms.
- When the partitioning column is
HOT_RETENTION = intnum retention_time_unit: specifies the time range for hot data. You can specify the time unit asMINUTE,HOUR,DAY,WEEK,MONTH, orYEAR.MIXED_RETENTION = intnum retention_time_unit:- If you set
MIXED_RETENTION, data beyondHOT_RETENTION + MIXED_RETENTIONis consideredcolddata. - If you do not set
MIXED_RETENTION, data beyondHOT_RETENTIONis consideredautodata. - If you do not set
HOT_RETENTION, data in the range [0, MIXED_RETENTION] is consideredautodata, and data beyondMIXED_RETENTIONis consideredcolddata.
- If you set
GRANULARITY = {"partition" | "block"}: optional. Specifies the granularity for the cold and hot storage strategy. Valid values:partition: specifies that the cache strategy is applied at the partition level.block: specifies that the cache strategy is applied at the macroblock level.Note
If you do not partition the table and want to manually specify a period of time as hot data, you can set
block. For more information about the macroblock-level cold and hot storage strategy, see Create a macroblock-level cold and hot storage strategy.
Examples
Example 1
Run the following SQL statement to create a columnstore index for a table.
Run the following SQL statement to create a table named
test_tbl1.CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));Run the following SQL statement to create a columnstore index named
idx1_test_tbl1for thetest_tbl1table, and specify thecol1column as the indexed column.CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);
Example 2
In shared storage mode, run the following SQL statement to create a cold-hot storage strategy for an index table.
When you create an index table, specify the data of the index table
idx1_tbl1as hot data.CREATE INDEX idx1_tbl1 ON tbl1 (c1) LOCAL STORAGE_CACHE_POLICY (GLOBAL = "hot");Create a table named
tbl2and specify the data of the index tableidx2on thetbl2table as hot data.CREATE TABLE tbl2 (col1 INT, col2 INT, INDEX idx2 ((col1 + 1)) STORAGE_CACHE_POLICY (GLOBAL = "hot"), UNIQUE KEY ((col1 + col2)));When you create a global index, configure the cache strategy for the global index partitions and the time axis strategy.
CREATE INDEX idx_global ON tbl1 (c1) GLOBAL PARTITION BY RANGE (c1) (PARTITION p1 VALUES LESS THAN (100), PARTITION p2 VALUES LESS THAN (200), PARTITION p3 VALUES LESS THAN MAXVALUE);
Example 3
Run the following SQL statement to create a search index.
Create a table named
tbl.CREATE TABLE tbl (col1 JSON, col2 DECIMAL(10,2), col3 JSON, col4 JSON, col5 JSON, col6 varchar(512));Create a search index named
search_idx.CREATE SEARCH INDEX search_idx ON tbl( -- path must be replaced with the actual path. col1 WITH (INCLUDE_PATHS = ('$.path3', '$.path4')), col2 WITH (EXCLUDE_PATHS = ('$.path1', '$.path3')), col3 WITH (INCLUDE_TYPES = (JSON_STRING)), col4 );
