CREATE INDEX|V4.6.0|OceanBase Database| docs|Distributed Database

CREATE INDEX

Last Updated：2026-05-07 11:26:25 Updated

Purpose

This statement is used to create an index. An index is a structure created on a table that sorts the values of one or more columns in the table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.

OceanBase Database supports unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the user-specified columns, the index table automatically includes the primary key columns of the primary table (or the hidden primary key if the table has no primary key). Storing additional columns in the index provides the database optimizer with more options.

Limitations and considerations

If all indexed columns are deleted from a table, the indexes created on the table become invalid.
Full-text indexes can be created only on CHAR, VARCHAR, and TEXT columns.
Search indexes can be created on JSON, single-level array, and various scalar type columns. For more information, see Data types supported by search indexes.
Only local (LOCAL) full-text or search indexes can be created in the current version.
The UNIQUE keyword cannot be specified when you create a full-text index.
When you create a full-text index that involves multiple columns, the columns must have the same character set.
The WITH PARSER {SPACE | NGRAM | BENG | IK} clause can be specified only when you create a full-text index.

Privilege requirements

To create an index by executing the CREATE INDEX statement, the current user must have the INDEX privilege on the corresponding object. For more information about the privileges of OceanBase Database, see Privilege classification in MySQL mode.

Syntax

General index syntax

CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
    [USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
    [WITH PARSER tokenizer_option]
    [PARSER_PROPERTIES[=](parser_properties_list)]
    [FTS_INDEX_TYPE [=] {MATCH | PHRASE_MATCH}]
    [index_option...] [partition_option] [index_column_group_option]

index_type:
    SPATIAL
    | UNIQUE
    | FULLTEXT
    | SEARCH -- For specific syntax, see the syntax for search indexes (Search Index) below.

sort_column_key:
      column_name [(integer)] [ASC] [ID id]
    | (index_expr) [ASC] [ID id]

tokenizer_option:
    SPACE
    | NGRAM
    | BENG
    | IK
    | NGRAM2

parser_properties_list:
    parser_properties, [parser_properties]

parser_properties:
    min_token_size = int_value
    | max_token_size = int_value
    | ngram_token_size = int_value
    | ik_mode = 'char_value'
    | min_ngram_size = int_value
    | max_ngram_size = int_value

index_option:
      GLOBAL
    | LOCAL
    | BLOCK_SIZE integer
    | COMMENT STRING_VALUE
    | STORING (column_name [, column_name...])
    | WITH_ROWID
    | WITH PARSER STRING_VALUE
    | index_using_algorithm
    | visibility_option
    | DATA_TABLE_ID data_table_id
    | INDEX_TABLE_ID index_table_id
    | VIRTUAL_COLUMN_ID virtual_column_id
    | MAX_USED_PART_ID used_part_id
    | STORAGE_CACHE_POLICY(storage_cache_policy_option)

index_column_group_option:
    WITH COLUMN GROUP([all columns, ]each column)

storage_cache_policy_option:
    GLOBAL = {"hot" | "auto" | "cold" | "none"}
    | timeline_strategy_list

timeline_strategy_list:
    BOUNDARY_COLUMN = column_name
    | BOUNDARY_COLUMN_UNIT = {"s" | "ms"}
    | HOT_RETENTION = intnum retention_time_unit
    | MIXED_RETENTION = intnum retention_time_unit
    | GRANULARITY = {"partition" | "block"}

retention_time_unit:
    YEAR
    | MONTH
    | WEEK
    | DAY
    | HOUR
    | MINUTE

Search index (Search Index) syntax

When index_type is specified as SEARCH, you can use the following syntax to create a search index:

CREATE SEARCH INDEX index_name ON table_name (
    search_index_column
    [, search_index_column ...]
) [LOCAL];

search_index_column:
    column_name [WITH (search_col_option_list)]

search_col_option_list:
    [INCLUDE_PATHS = ('path1'[, 'path2']...)]
    [| EXCLUDE_PATHS = ('path1'[, 'path2']...)]
    [| INCLUDE_TYPES = (type1[, type2]...)]

In addition to CREATE INDEX, you can also create a search index by using the CREATE TABLE or ALTER TABLE statement.

In the preceding two methods, the search_index_column and WITH configurations have the same semantics as those in CREATE SEARCH INDEX. For more information, see CREATE TABLE and ALTER TABLE.

Parameters

Common index parameters

Parameter	Description
hint_options	The hint options. This parameter is optional. Currently, only the parallel option is supported. For example, `/+ parallel(10) /`.
index_type	The type of the index. This parameter is optional. For more information about the index types, see index_type.
IF NOT EXISTS	If specified, the index will not be created if it already exists. If the index already exists and this option is not specified, an error will be returned.
index_name	The name of the index to be created.
USING BTREE	The index type. This parameter is optional. For more information, see USING BTREE.
table_name	The name of the table on which the index is to be created. You can specify the table name directly or use the `schema_name.table_name` format to specify the table name and the name of the database to which the table belongs.
sort_column_key	The key of the sort column. You can specify multiple sort columns. Separate the sort columns with commas (,). For more information, see sort_column_key.
WITH PARSER tokenizer_option	The tokenizer for the full-text index. This parameter is optional. For more information, see tokenizer_option.
PARSER_PROPERTIES[=](parser_properties_list)	The properties of the tokenizer. This parameter is optional. For more information, see parser_properties.
FTS_INDEX_TYPE [=] {MATCH \| PHRASE_MATCH}	The type of the full-text index. This parameter is optional. `MATCH` specifies the term-matching index, which supports matching and scoring based on the BM25 algorithm. The stored content includes tokens, document IDs, term frequencies, and document lengths. `PHRASE_MATCH` specifies the phrase-matching index, which is a superset of the `MATCH` index and supports phrase queries more efficiently. The stored content also includes the position lists of the terms. This parameter takes effect only when used with a `FULLTEXT` index.
index_option	The index options. You can specify multiple index options. Separate the index options with spaces. For more information, see index_option.
partition_option	The options for creating index partitions. You can specify hash partitions, key partitions, range partitions, list partitions, or external table partitions.
index_column_group_option	The index options. For more information, see index_column_group_option.
STORAGE_CACHE_POLICY	The cache policy for index hotspots in shared storage mode. For more information, see storage_cache_policy .
LOCAL	Indicates that the index is a local index. This parameter is optional.

Search index parameters

Parameter	Description	Value range
search_index_column	The definition of the search index column. You can specify one or more columns. For each column, you can specify the column-level path and type options by using `WITH (...)`.
INCLUDE_PATHS / EXCLUDE_PATHS	The column-level path whitelist or blacklist. You can specify only one of these two parameters. This parameter is used to control the JSON paths that are included in the index. By default, all paths are included. The paths can be configured in the following two ways: Simple deterministic paths. For example, `$.key` and `$.a`. In this case, only scalar and array data under `$.key` and `$.a` are indexed, and data under `$.a.b` is not indexed. Prefix-matching paths. For example, `$.a.`. In this case, JSON data under `$.a` is indexed. For example, data under `$.a.b` is indexed. `$.` matches all paths.
INCLUDE_TYPES	The set of data types to be included in the index. The value of this parameter depends on the supported data types in the actual version. For example, `INCLUDE_TYPES = (JSON_STRING)` specifies that only the JSON_STRING data type is included in the index. By default, all data types are included in the index.	JSON_STRING, JSON_NUMBER

Data types supported by search indexes

A search index can be created on the following data types:

Category	Supported data types
Integer types	TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, and their UNSIGNED variants
Floating-point and fixed-point types	FLOAT, DOUBLE, DECIMAL, and NUMBER
String types	VARCHAR and CHAR
Text types	TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT
Binary types	BINARY and VARBINARY
Binary large objects	TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB
Date and time types	DATE, TIME, DATETIME, TIMESTAMP, and YEAR
JSON types	JSON
Array types	ARRAY(element_type) — Only single-level arrays are supported, such as ARRAY(INT) and ARRAY(VARCHAR(256))

The BIT, ENUM, and SET data types are not supported for search indexes.

index_type

SPATIAL: specifies a spatial index.
UNIQUE: specifies a unique index.
FULLTEXT: specifies a full-text index.
SEARCH: specifies a search index.

For more information about creating indexes of different types, see Create an index.

sort_column_key

column_name [(integer)] [ASC] [ID id]: specifies a column name as the sort key.
- column_name: specifies the column name to be sorted.
- integer: an optional parameter that specifies the maximum length of the sort key.
- ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.
- ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named index3 on the t3 table and specify that the index is sorted in ascending order on the c1 column.
```
CREATE INDEX index3 ON t3 (c1 ASC);
```
(index_expr) [ASC] [ID id]: specifies an index expression as the sort key. An index expression can be defined by using an expression or a function. It can contain the following options:
- (index_expr): specifies the index expression, which can be a Boolean expression, such as c1=c1. OceanBase Database does not support creating function indexes on generated columns. For more information about the expressions supported by function indexes, see Supported system functions for function indexes.
- ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.
- ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named index4 on the t4 table, specify c1+c2 as the index expression, and specify ascending order.
```
CREATE INDEX index4 ON t4 ((c1 + c2) ASC);
```

You can specify multiple sort columns when creating an index. Separate multiple sort columns with commas (,). For example, create an index named index5 on the t5 table and specify c1 and c2+c3 as the sort keys.

CREATE INDEX index5 ON t5 (c1, (c2+c3));

tokenizer_option

SPACE: the default value. Specifies that the text is split by spaces. You can specify the following properties:

Property

Value Range

min_token_size [1, 16]

max_token_size [10, 84]
NGRAM: specifies the N-Gram (Chinese) tokenizer. You can specify the following properties:

Property

Value Range

ngram_token_size [1, 10]
NGRAM2: specifies the tokenizer that splits the text into continuous characters ranging from min_ngram_size to max_ngram_size. You can specify the following properties:

Property

Value Range

min_ngram_size [1, 16]

max_ngram_size [1, 16]
BENG: specifies the Beng (basic English) tokenizer. You can specify the following properties:

Property

Value Range

min_token_size [1, 16]

max_token_size [10, 84]
IK: specifies the IK (Chinese) tokenizer. Only the utf-8 character set is supported. You can specify the following properties:
Property

Value Range
ik_mode

smart

max_word

Property	Value Range
min_token_size	[1, 16]
max_token_size	[10, 84]

Property	Value Range
ngram_token_size	[1, 10]

Property	Value Range
min_ngram_size	[1, 16]
max_ngram_size	[1, 16]

Property	Value Range
min_token_size	[1, 16]
max_token_size	[10, 84]

Property	Value Range
ik_mode	`smart` `max_word`

You can use the TOKENIZE function to view the tokenization result of the text in the specified tokenizer and JSON format parameters.

parser_properties

min_token_size: specifies the minimum token length. The default value is 3, and the value ranges from 1 to 16.
max_token_size: specifies the maximum token length. The default value is 84, and the value ranges from 10 to 84.
ngram_token_size: specifies the token length for the NGRAM tokenizer. This parameter is valid only for the NGRAM tokenizer. The default value is 2, and the value ranges from 1 to 10.
ik_mode: specifies the tokenization mode for the IK tokenizer. Valid values:
- smart: the default value. The dictionary is used to improve the accuracy of tokenization. The boundaries of the dictionary words are prioritized, which may reduce unnecessary expansions.
- max_word: the dictionary is used to identify the words, but it does not affect the maximum expansion of tokenization. Even if the dictionary is defined, the max_word mode still tries to split the text into more words.

Here is an example:

Create a table named tbl1.

CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);

Create a full-text index on the col2 column of the tbl1 table. Specify the IK tokenizer and set the tokenizer properties by using the PARSER_PROPERTIES clause.
```
CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
```

index_option

GLOBAL: specifies that the index is a global index.
LOCAL: specifies that the index is a local index.
BLOCK_SIZE integer: specifies the size of the index block, that is, the number of bytes in each index block.
COMMENT STRING_VALUE: specifies a comment for the index.
STORING (column_name [, column_name...]): specifies the columns to be stored in the index. Separate multiple columns with commas (,).
WITH_ROWID: specifies that the index contains the row identifier.
WITH PARSER STRING_VALUE: specifies the parser required for the index.
index_using_algorithm: specifies the algorithm used for the index. You can specify USING BTREE or USING HASH. However, you cannot create a multivalued index by using USING HASH.
visibility_option: specifies the visibility of the index.
DATA_TABLE_ID data_table_id: specifies the ID of the data table to which the index belongs.
INDEX_TABLE_ID index_table_id: specifies the ID of the index table.
VIRTUAL_COLUMN_ID virtual_column_id: specifies the ID of the virtual column.
MAX_USED_PART_ID used_part_id: specifies the maximum partition ID used by the index.

index_column_group_option

WITH COLUMN GROUP(all columns, each column): specifies that the index is a rowstore-redundant index.
WITH COLUMN GROUP(each column): specifies that the index is a columnstore index.

storage_cache_policy

Notice

You can specify the cold and hot storage strategy (STORAGE_CACHE_POLICY) when you create an index only in shared storage mode.

You can configure the cold and hot storage strategy for an index table.

For a local index, you can specify the STORAGE_CACHE_POLICY at the table level. If you do not specify the STORAGE_CACHE_POLICY at the table level, the default value is none, which indicates that the cold and hot storage strategy for the index partitions follows that of the data table partitions.
For a global index, you can specify the STORAGE_CACHE_POLICY at the table level. If the index table is partitioned, you can also specify the STORAGE_CACHE_POLICY at the partition level. If you do not specify the STORAGE_CACHE_POLICY at the table level, the default value is none.

The storage_cache_policy_option parameter defines different attributes in the key-value format. The semantics of each attribute are as follows:

GLOBAL = {"hot" | "auto" | "cold" | "none"}: specifies the cold and hot storage strategy for the index data. Valid values:
- hot: specifies the index as a hot index. If the cache disk space is sufficient, all data in the index table is cached to the local cloud disk.
- auto: specifies that the system automatically identifies the hot data in the index table.
- cold: specifies the data in the index table as cold data. The data is not cached to the local cache disk. For data that has been persisted to the cache, the system actively triggers cache eviction. Subsequent access to the table data can be stored in the macroblock memory cache, without being persisted to the macroblock cache and microblock cache.
- none: the default value. The strategy for the index follows that of the primary table STORAGE_CACHE_POLICY.
timeline_strategy_list: specifies the list of time-axis strategy parameters. Parameters are separated by commas. The time-axis strategy determines whether partition or macroblock data is hot or cold based on time. The system automatically adjusts the data on the local cache disk based on the configured strategy.
- BOUNDARY_COLUMN = column_name: specifies the column used to determine the hot data. Supported data types are integer types (BIGINT or INT, in Unix timestamp format) and time types (TIMESTAMP, DATE, DATETIME, or YEAR). If BOUNDARY_COLUMN is an integer type, the partitioning method must be Range or Range Columns. If BOUNDARY_COLUMN is a time type, the partitioning method must be Range Columns.
- BOUNDARY_COLUMN_UNIT = {"s" | "us" | "ms"}: specifies the time unit for the BOUNDARY_COLUMN parameter. This parameter is valid only when BOUNDARY_COLUMN is an integer type. Valid values:
  - When the partitioning column is INT, BOUNDARY_COLUMN_UNIT can only be s.
  - When the partitioning column is BIGINT, BOUNDARY_COLUMN_UNIT can be s or ms.
- HOT_RETENTION = intnum retention_time_unit: specifies the time range for hot data. You can specify the time unit as MINUTE, HOUR, DAY, WEEK, MONTH, or YEAR.
- MIXED_RETENTION = intnum retention_time_unit:
  - If you set MIXED_RETENTION, data beyond HOT_RETENTION + MIXED_RETENTION is considered cold data.
  - If you do not set MIXED_RETENTION, data beyond HOT_RETENTION is considered auto data.
  - If you do not set HOT_RETENTION, data in the range [0, MIXED_RETENTION] is considered auto data, and data beyond MIXED_RETENTION is considered cold data.
- GRANULARITY = {"partition" | "block"}: optional. Specifies the granularity for the cold and hot storage strategy. Valid values:
  - partition: specifies that the cache strategy is applied at the partition level.
  - block: specifies that the cache strategy is applied at the macroblock level.
    
    Note
    
    If you do not partition the table and want to manually specify a period of time as hot data, you can set block. For more information about the macroblock-level cold and hot storage strategy, see Create a macroblock-level cold and hot storage strategy.

Examples

Example 1

Run the following SQL statement to create a columnstore index for a table.

Run the following SQL statement to create a table named test_tbl1.
```
CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));
```
Run the following SQL statement to create a columnstore index named idx1_test_tbl1 for the test_tbl1 table, and specify the col1 column as the indexed column.
```
CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);
```

Example 2

In shared storage mode, run the following SQL statement to create a cold-hot storage strategy for an index table.

When you create an index table, specify the data of the index table idx1_tbl1 as hot data.

CREATE INDEX idx1_tbl1 ON tbl1 (c1) LOCAL 
  STORAGE_CACHE_POLICY (GLOBAL = "hot");

Create a table named tbl2 and specify the data of the index table idx2 on the tbl2 table as hot data.

CREATE TABLE tbl2 (col1 INT, col2 INT,
  INDEX idx2 ((col1 + 1)) STORAGE_CACHE_POLICY (GLOBAL = "hot"), UNIQUE KEY ((col1 + col2)));

When you create a global index, configure the cache strategy for the global index partitions and the time axis strategy.

CREATE INDEX idx_global ON tbl1 (c1) GLOBAL
  PARTITION BY RANGE (c1)
  (PARTITION p1 VALUES LESS THAN (100),
   PARTITION p2 VALUES LESS THAN (200),
   PARTITION p3 VALUES LESS THAN MAXVALUE);

Example 3

Run the following SQL statement to create a search index.

Create a table named tbl.

CREATE TABLE tbl (col1 JSON, col2 DECIMAL(10,2), col3 JSON, col4 JSON, col5 JSON, col6 varchar(512));

Create a search index named search_idx.

CREATE SEARCH INDEX search_idx ON tbl(
  -- path must be replaced with the actual path.
  col1 WITH (INCLUDE_PATHS = ('$.path3', '$.path4')),
  col2 WITH (EXCLUDE_PATHS = ('$.path1', '$.path3')),
  col3 WITH (INCLUDE_TYPES = (JSON_STRING)),
  col4
);

OceanBase

Customer Stories

Documentation