Purpose
This statement is used to create an index. An index is a structure created on a table that sorts the values of one or more columns in the database table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.
OceanBase Database currently supports unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the columns specified by the user, the index table will automatically include the primary key columns of the primary table (or the hidden primary key columns if no primary key exists). Storing additional columns in the index provides the database optimizer with more options.
Limitations and considerations
- If all indexed columns are deleted from a table, the indexes created on the table become invalid.
- Full-text indexes can be created only on columns of the
CHAR,VARCHAR, andTEXTdata types. - Only local (
LOCAL) full-text indexes can be created in the current version. - The
UNIQUEkeyword cannot be specified when creating a full-text index. - When creating a full-text index that involves multiple columns, ensure that the columns have the same character set.
- The
WITH PARSER {SPACE | NGRAM | BENG | IK}option can be specified only when creating a full-text index.
Privilege requirements
To execute the CREATE INDEX statement, the current user must have at least the INDEX privilege on the corresponding object. For more information about the privileges of OceanBase Database, see Privilege types in MySQL-compatible mode.
Syntax
CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
[USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
[WITH PARSER tokenizer_option]
[PARSER_PROPERTIES[=](parser_properties_list)]
[index_option...] [partition_option] [index_column_group_option]
index_type:
SPATIAL
| UNIQUE
| FULLTEXT
sort_column_key:
column_name [(integer)] [ASC] [ID id]
| (index_expr) [ASC] [ID id]
tokenizer_option:
SPACE
| NGRAM
| BENG
| IK
| NGRAM2
parser_properties_list:
parser_properties, [parser_properties]
parser_properties:
min_token_size = int_value
| max_token_size = int_value
| ngram_token_size = int_value
| ik_mode = 'char_value'
| min_ngram_size = int_value
| max_ngram_size = int_value
index_option:
GLOBAL
| LOCAL
| BLOCK_SIZE integer
| COMMENT STRING_VALUE
| STORING (column_name [, column_name...])
| WITH_ROWID
| WITH PARSER STRING_VALUE
| index_using_algorithm
| visibility_option
| DATA_TABLE_ID data_table_id
| INDEX_TABLE_ID index_table_id
| VIRTUAL_COLUMN_ID virtual_column_id
| MAX_USED_PART_ID used_part_id
| STORAGE_CACHE_POLICY(storage_cache_policy_option)
index_column_group_option:
WITH COLUMN GROUP([all columns, ]each column)
storage_cache_policy_option:
GLOBAL = {"hot"|"auto"|"none"}
| timeline_strategy_list
timeline_strategy_list:
BOUNDARY_COLUMN = column_name
[, BOUNDARY_COLUMN_UNIT = {"s"|"ms"}]
[, HOT_RETENTION = intnum retention_time_unit]
retention_time_unit:
YEAR
| MONTH
| WEEK
| DAY
| HOUR
| MINUTE
Parameters
| Parameter | Description |
|---|---|
| hint_options | The hint options. This parameter is optional. Currently, only the parallel option is supported. For example, /*+ parallel(10) */. |
| index_type | The type of the index. This parameter is optional. For more information, see index_type. |
| IF NOT EXISTS | Indicates whether to skip index creation if the index already exists. If the index exists and the IF NOT EXISTS option is not specified, an error is returned. |
| index_name | The name of the index to be created. |
| USING BTREE | The index structure to be used. This parameter is optional. For more information, see USING BTREE. |
| table_name | The name of the table on which to create the index. You can directly specify the table name or use the schema_name.table_name format to specify the table name and the name of the database to which the table belongs. |
| sort_column_key | The key of the sort column. You can specify multiple sort columns when you create an index. Separate the sort columns with commas (,). For more information, see sort_column_key. |
| WITH PARSER tokenizer_option | The tokenizer to be used for a full-text index. This parameter is optional. For more information, see tokenizer_option. |
| PARSER_PROPERTIES[=](parser_properties_list) | The properties of the tokenizer. This parameter is optional. For more information, see parser_properties. |
| index_option | The index options. You can specify multiple index options when you create an index. Separate the index options with spaces. For more information, see index_option. |
| index_column_group_option | The index options. For more information, see index_column_group_option. |
| STORAGE_CACHE_POLICY | The cache strategy for index hotspots in shared storage mode. For more information, see storage_cache_policy . |
index_type
SPATIAL: specifies the creation of a spatial index.UNIQUE: specifies the creation of a unique index.FULLTEXT: specifies the creation of a full-text index.
For more information about creating indexes, see Create an index.
sort_column_key
column_name [(integer)] [ASC] [ID id]: specifies a column name as a sort key.column_name: the name of the column to be sorted.integer: an optional parameter that specifies the maximum length of the sort key.ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named
index3on thec1column of thet3table, and sort the index in ascending order.CREATE INDEX index3 ON t3 (c1 ASC);(index_expr) [ASC] [ID id]: specifies an index expression as a sort key. An index expression can be defined using an expression or a function. It can include the following options:(index_expr): the index expression, which can be a boolean expression, such asc1=c1. OceanBase Database does not support creating function indexes on generated columns. For information about the expressions supported by function indexes, see Supported system functions for function indexes.ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named
index4on thet4table, specifyc1+c2as the index expression, and sort the index in ascending order.CREATE INDEX index4 ON t4 ((c1 + c2) ASC);
You can specify multiple sort columns when creating an index. Separate multiple sort columns with commas (,). For example, create an index named index5 on the t5 table, and specify c1 and c2+c3 as the sort keys.
CREATE INDEX index5 ON t5 (c1, (c2+c3));
tokenizer_option
SPACE: the default value. Specifies the use of space-based tokenization. You can specify the following parameters:Parameter Value range min_token_size [1, 16] max_token_size [10, 84] NGRAM: specifies the use of N-gram-based tokenization for Chinese text. You can specify the following parameters:Parameter Value range ngram_token_size [1, 10] NGRAM2: specifies the use of N-gram-based tokenization for text. You can specify the following parameters:Parameter Value range min_ngram_size [1, 16] max_ngram_size [1, 16] BENG: specifies the use of basic English-based tokenization. You can specify the following parameters:Parameter Value range min_token_size [1, 16] max_token_size [10, 84] IK: specifies the use of IK-based tokenization for Chinese text. Only theutf-8character set is supported. You can specify the following parameter:Parameter Value range ik_mode smartmax_word
You can use the TOKENIZE function to view the tokenization results of text based on the specified tokenizer and JSON-formatted parameters.
parser_properties
min_token_size: the minimum token length. Default value: 3. Value range: 1 to 16.max_token_size: the maximum token length. Default value: 84. Value range: 10 to 84.ngram_token_size: the token length for theNGRAMtokenizer. This parameter is valid only for theNGRAMtokenizer. Default value: 2. Value range: 1 to 10.ik_mode: the tokenization mode for theIKtokenizer. Valid values:smart: the default value. The dictionary is used to improve tokenization accuracy. The boundaries of dictionary words are prioritized, which may reduce unnecessary extensions.max_word: the dictionary is used to identify words, but it does not affect the maximum extension of tokenization. Even if the dictionary defines words, themax_wordmode will still attempt to split the text into more words.
Here is an example:
Create a table named
tbl1.CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);Create a full-text index on the
col2column of thetbl1table, specify theIKtokenizer, and set the tokenizer properties usingPARSER_PROPERTIES.CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
index_option
GLOBAL: specifies to create a global index.LOCAL: specifies to create a local index.BLOCK_SIZE integer: specifies the size of an index block, that is, the number of bytes in each index block.COMMENT STRING_VALUE: specifies a comment for the index.STORING (column_name [, column_name...]): specifies the columns to be stored in the index. Separate multiple columns with commas (,).WITH_ROWID: specifies to create an index that contains row identifiers.WITH PARSER STRING_VALUE: specifies the parser required for the index.index_using_algorithm: specifies the algorithm used for the index. You can specify USING BTREE or USING HASH. However, you cannot create a multivalued index by using the USING HASH option.visibility_option: specifies the visibility of the index.DATA_TABLE_ID data_table_id: specifies the data table ID to which the index belongs.INDEX_TABLE_ID index_table_id: specifies the index table ID.VIRTUAL_COLUMN_ID virtual_column_id: specifies the virtual column ID.MAX_USED_PART_ID used_part_id: specifies the maximum partition ID used by the index.
index_column_group_option
WITH COLUMN GROUP(all columns, each column): specifies to create a rowstore columnstore redundant index.WITH COLUMN GROUP(each column): specifies to create a columnstore index.
storage_cache_policy
Notice
You can specify the hot cache strategy (STORAGE_CACHE_POLICY) when you create an index only in shared storage mode.
You can configure the hot cache strategy for an index table.
For a local index, you can specify the table-level
STORAGE_CACHE_POLICY. If you do not specify the table-levelSTORAGE_CACHE_POLICY, the default value isnone, which indicates that the hot cache strategy of the local index partitions follows that of the data table partitions.For a global index, you can specify the table-level
STORAGE_CACHE_POLICYand the partition-levelSTORAGE_CACHE_POLICYof the index table if the index table is partitioned. If you do not specify the table-levelSTORAGE_CACHE_POLICY, the default value isnone.
The following example creates a local index idx1_tbl1 on the c1 column of the tbl1 table in the Key-Value format. The storage_cache_policy_option specifies the hot cache strategy of the index. The semantics of each attribute are as follows:
GLOBAL = {"hot" | "auto" | "none"}: specifies whether the data of the index is hot data. Valid values:hot: specifies that the index is stored in hot storage. If the cache disk space is sufficient, all data in the index table will be cached to the local cloud disk.auto: specifies that the system automatically identifies hot data in the index table.none: the default value. This value indicates that the hot cache strategy of the index follows that of the primary tableSTORAGE_CACHE_POLICY.
timeline_strategy_list: specifies the time axis strategy parameters. Separate the parameters with commas (,). The hot cache time axis strategy supports a mechanism to determine whether partition data is hot data based on time. The system automatically adjusts the partition data on the local cache disk based on the configured strategy. This parameter is supported only for global indexes.BOUNDARY_COLUMN = column_name: specifies the column used to determine hot data. Supported types include integers (BIGINTorINT, in the Unix timestamp format) and time types (TIMESTAMP,DATE,DATETIME, orYEAR). IfBOUNDARY_COLUMNis an integer, the table can be partitioned by range or range columns. IfBOUNDARY_COLUMNis a time type, the table can be partitioned only by range columns.BOUNDARY_COLUMN_UNIT = {"s" | "us" | "ms"}: specifies the time unit of theBOUNDARY_COLUMNparameter. This parameter is supported only whenBOUNDARY_COLUMNis an integer. Valid values:- If the partition column is of the
INTtype,BOUNDARY_COLUMN_UNITcan only bes. - If the partition column is of the
BIGINTtype,BOUNDARY_COLUMN_UNITcan besorms.
- If the partition column is of the
HOT_RETENTION = intnum retention_time_unit: specifies the time range for hot data. You can specify the time range inMINUTE,HOUR,DAY,WEEK,MONTH, orYEAR.
Examples
Example 1
Run the following SQL statements to create a columnstore index for a table.
Run the following SQL statement to create a table named
test_tbl1.CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));Run the following SQL statement to create a columnstore index named
idx1_test_tbl1for thetest_tbl1table, and specify thecol1column as the reference column.CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);
Example 2
In shared storage mode, run the following SQL statements to configure a hot data caching strategy for an index table.
Create an index table named
idx1_tbl1and specify its data as hot data.CREATE INDEX idx1_tbl1 ON tbl1 (c1) LOCAL STORAGE_CACHE_POLICY (GLOBAL = "hot");Create a table named
tbl2, and configure the data of the index table namedidx2on thetbl2table as hot data.CREATE TABLE tbl2 (col1 INT, col2 INT, INDEX idx2 ((col1 + 1)) STORAGE_CACHE_POLICY (GLOBAL = "hot"), UNIQUE KEY ((col1 + col2)));Configure the caching strategy for global index partitions and the time axis strategy when you create a global index.
CREATE INDEX idx_global ON tbl1 (c1) GLOBAL PARTITION BY RANGE (c1) (PARTITION p1 VALUES LESS THAN (100), PARTITION p2 VALUES LESS THAN (200), PARTITION p3 VALUES LESS THAN MAXVALUE);