CREATE INDEX|V4.4.2|OceanBase Database| docs|Distributed Database

CREATE INDEX

Last Updated：2026-04-02 06:23:57 Updated

Purpose

This statement is used to create an index. An index is a structure created on a table that sorts the values of one or more columns in the database table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.

OceanBase Database currently supports unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the columns specified by the user, the index table will automatically include the primary key columns of the primary table (or the hidden primary key columns if no primary key exists). Storing additional columns in the index provides the database optimizer with more options.

Limitations and considerations

If all indexed columns are deleted from a table, the indexes created on the table become invalid.
Full-text indexes can be created only on columns of the CHAR, VARCHAR, and TEXT data types.
Only local (LOCAL) full-text indexes can be created in the current version.
The UNIQUE keyword cannot be specified when creating a full-text index.
When creating a full-text index that involves multiple columns, ensure that the columns have the same character set.
The WITH PARSER {SPACE | NGRAM | BENG | IK} option can be specified only when creating a full-text index.

Privilege requirements

To execute the CREATE INDEX statement, the current user must have at least the INDEX privilege on the corresponding object. For more information about the privileges of OceanBase Database, see Privilege types in MySQL-compatible mode.

Syntax

CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
    [USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
    [WITH PARSER tokenizer_option]
    [PARSER_PROPERTIES[=](parser_properties_list)]
    [index_option...] [partition_option] [index_column_group_option]

index_type:
    SPATIAL
    | UNIQUE
    | FULLTEXT

sort_column_key:
      column_name [(integer)] [ASC] [ID id]
    | (index_expr) [ASC] [ID id]

tokenizer_option:
    SPACE
    | NGRAM
    | BENG
    | IK
    | NGRAM2

parser_properties_list:
    parser_properties, [parser_properties]

parser_properties:
    min_token_size = int_value
    | max_token_size = int_value
    | ngram_token_size = int_value
    | ik_mode = 'char_value'
    | min_ngram_size = int_value
    | max_ngram_size = int_value

index_option:
      GLOBAL
    | LOCAL
    | BLOCK_SIZE integer
    | COMMENT STRING_VALUE
    | STORING (column_name [, column_name...])
    | WITH_ROWID
    | WITH PARSER STRING_VALUE
    | index_using_algorithm
    | visibility_option
    | DATA_TABLE_ID data_table_id
    | INDEX_TABLE_ID index_table_id
    | VIRTUAL_COLUMN_ID virtual_column_id
    | MAX_USED_PART_ID used_part_id
    | STORAGE_CACHE_POLICY(storage_cache_policy_option)

index_column_group_option:
    WITH COLUMN GROUP([all columns, ]each column)

storage_cache_policy_option:
    GLOBAL = {"hot"|"auto"|"none"}
    | timeline_strategy_list

timeline_strategy_list:
    BOUNDARY_COLUMN = column_name
    [, BOUNDARY_COLUMN_UNIT = {"s"|"ms"}]
    [, HOT_RETENTION = intnum retention_time_unit]


retention_time_unit:
    YEAR
    | MONTH
    | WEEK
    | DAY
    | HOUR
    | MINUTE

Parameters

Parameter	Description
hint_options	The hint options. This parameter is optional. Currently, only the `parallel` option is supported. For example, `/+ parallel(10) /`.
index_type	The type of the index. This parameter is optional. For more information, see index_type.
IF NOT EXISTS	Indicates whether to skip index creation if the index already exists. If the index exists and the `IF NOT EXISTS` option is not specified, an error is returned.
index_name	The name of the index to be created.
USING BTREE	The index structure to be used. This parameter is optional. For more information, see USING BTREE.
table_name	The name of the table on which to create the index. You can directly specify the table name or use the `schema_name.table_name` format to specify the table name and the name of the database to which the table belongs.
sort_column_key	The key of the sort column. You can specify multiple sort columns when you create an index. Separate the sort columns with commas (`,`). For more information, see sort_column_key.
WITH PARSER tokenizer_option	The tokenizer to be used for a full-text index. This parameter is optional. For more information, see tokenizer_option.
PARSER_PROPERTIES[=](parser_properties_list)	The properties of the tokenizer. This parameter is optional. For more information, see parser_properties.
index_option	The index options. You can specify multiple index options when you create an index. Separate the index options with spaces. For more information, see index_option.
index_column_group_option	The index options. For more information, see index_column_group_option.
STORAGE_CACHE_POLICY	The cache strategy for index hotspots in shared storage mode. For more information, see storage_cache_policy .

index_type

SPATIAL: specifies the creation of a spatial index.
UNIQUE: specifies the creation of a unique index.
FULLTEXT: specifies the creation of a full-text index.

For more information about creating indexes, see Create an index.

sort_column_key

column_name [(integer)] [ASC] [ID id]: specifies a column name as a sort key.
- column_name: the name of the column to be sorted.
- integer: an optional parameter that specifies the maximum length of the sort key.
- ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.
- ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named index3 on the c1 column of the t3 table, and sort the index in ascending order.
```
CREATE INDEX index3 ON t3 (c1 ASC);
```
(index_expr) [ASC] [ID id]: specifies an index expression as a sort key. An index expression can be defined using an expression or a function. It can include the following options:
- (index_expr): the index expression, which can be a boolean expression, such as c1=c1. OceanBase Database does not support creating function indexes on generated columns. For information about the expressions supported by function indexes, see Supported system functions for function indexes.
- ASC: an optional parameter that specifies ascending order. Descending order (DESC) is not supported.
- ID id: an optional parameter that specifies the ID of the sort key.
For example, create an index named index4 on the t4 table, specify c1+c2 as the index expression, and sort the index in ascending order.
```
CREATE INDEX index4 ON t4 ((c1 + c2) ASC);
```

You can specify multiple sort columns when creating an index. Separate multiple sort columns with commas (,). For example, create an index named index5 on the t5 table, and specify c1 and c2+c3 as the sort keys.

CREATE INDEX index5 ON t5 (c1, (c2+c3));

tokenizer_option

SPACE: the default value. Specifies the use of space-based tokenization. You can specify the following parameters:

Parameter

Value range

min_token_size [1, 16]

max_token_size [10, 84]
NGRAM: specifies the use of N-gram-based tokenization for Chinese text. You can specify the following parameters:

Parameter

Value range

ngram_token_size [1, 10]
NGRAM2: specifies the use of N-gram-based tokenization for text. You can specify the following parameters:

Parameter

Value range

min_ngram_size [1, 16]

max_ngram_size [1, 16]
BENG: specifies the use of basic English-based tokenization. You can specify the following parameters:

Parameter

Value range

min_token_size [1, 16]

max_token_size [10, 84]
IK: specifies the use of IK-based tokenization for Chinese text. Only the utf-8 character set is supported. You can specify the following parameter:
Parameter

Value range
ik_mode

smart

max_word

Parameter	Value range
min_token_size	[1, 16]
max_token_size	[10, 84]

Parameter	Value range
ngram_token_size	[1, 10]

Parameter	Value range
min_ngram_size	[1, 16]
max_ngram_size	[1, 16]

Parameter	Value range
min_token_size	[1, 16]
max_token_size	[10, 84]

Parameter	Value range
ik_mode	`smart` `max_word`

You can use the TOKENIZE function to view the tokenization results of text based on the specified tokenizer and JSON-formatted parameters.

parser_properties

min_token_size: the minimum token length. Default value: 3. Value range: 1 to 16.
max_token_size: the maximum token length. Default value: 84. Value range: 10 to 84.
ngram_token_size: the token length for the NGRAM tokenizer. This parameter is valid only for the NGRAM tokenizer. Default value: 2. Value range: 1 to 10.
ik_mode: the tokenization mode for the IK tokenizer. Valid values:
- smart: the default value. The dictionary is used to improve tokenization accuracy. The boundaries of dictionary words are prioritized, which may reduce unnecessary extensions.
- max_word: the dictionary is used to identify words, but it does not affect the maximum extension of tokenization. Even if the dictionary defines words, the max_word mode will still attempt to split the text into more words.

Here is an example:

Create a table named tbl1.

CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);

Create a full-text index on the col2 column of the tbl1 table, specify the IK tokenizer, and set the tokenizer properties using PARSER_PROPERTIES.
```
CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
```

index_option

GLOBAL: specifies to create a global index.
LOCAL: specifies to create a local index.
BLOCK_SIZE integer: specifies the size of an index block, that is, the number of bytes in each index block.
COMMENT STRING_VALUE: specifies a comment for the index.
STORING (column_name [, column_name...]): specifies the columns to be stored in the index. Separate multiple columns with commas (,).
WITH_ROWID: specifies to create an index that contains row identifiers.
WITH PARSER STRING_VALUE: specifies the parser required for the index.
index_using_algorithm: specifies the algorithm used for the index. You can specify USING BTREE or USING HASH. However, you cannot create a multivalued index by using the USING HASH option.
visibility_option: specifies the visibility of the index.
DATA_TABLE_ID data_table_id: specifies the data table ID to which the index belongs.
INDEX_TABLE_ID index_table_id: specifies the index table ID.
VIRTUAL_COLUMN_ID virtual_column_id: specifies the virtual column ID.
MAX_USED_PART_ID used_part_id: specifies the maximum partition ID used by the index.

index_column_group_option

WITH COLUMN GROUP(all columns, each column): specifies to create a rowstore columnstore redundant index.
WITH COLUMN GROUP(each column): specifies to create a columnstore index.

storage_cache_policy

Notice

You can specify the hot cache strategy (STORAGE_CACHE_POLICY) when you create an index only in shared storage mode.

You can configure the hot cache strategy for an index table.

For a local index, you can specify the table-level STORAGE_CACHE_POLICY. If you do not specify the table-level STORAGE_CACHE_POLICY, the default value is none, which indicates that the hot cache strategy of the local index partitions follows that of the data table partitions.
For a global index, you can specify the table-level STORAGE_CACHE_POLICY and the partition-level STORAGE_CACHE_POLICY of the index table if the index table is partitioned. If you do not specify the table-level STORAGE_CACHE_POLICY, the default value is none.

The following example creates a local index idx1_tbl1 on the c1 column of the tbl1 table in the Key-Value format. The storage_cache_policy_option specifies the hot cache strategy of the index. The semantics of each attribute are as follows:

GLOBAL = {"hot" | "auto" | "none"}: specifies whether the data of the index is hot data. Valid values:
- hot: specifies that the index is stored in hot storage. If the cache disk space is sufficient, all data in the index table will be cached to the local cloud disk.
- auto: specifies that the system automatically identifies hot data in the index table.
- none: the default value. This value indicates that the hot cache strategy of the index follows that of the primary table STORAGE_CACHE_POLICY.
timeline_strategy_list: specifies the time axis strategy parameters. Separate the parameters with commas (,). The hot cache time axis strategy supports a mechanism to determine whether partition data is hot data based on time. The system automatically adjusts the partition data on the local cache disk based on the configured strategy. This parameter is supported only for global indexes.
- BOUNDARY_COLUMN = column_name: specifies the column used to determine hot data. Supported types include integers (BIGINT or INT, in the Unix timestamp format) and time types (TIMESTAMP, DATE, DATETIME, or YEAR). If BOUNDARY_COLUMN is an integer, the table can be partitioned by range or range columns. If BOUNDARY_COLUMN is a time type, the table can be partitioned only by range columns.
- BOUNDARY_COLUMN_UNIT = {"s" | "us" | "ms"}: specifies the time unit of the BOUNDARY_COLUMN parameter. This parameter is supported only when BOUNDARY_COLUMN is an integer. Valid values:
  - If the partition column is of the INT type, BOUNDARY_COLUMN_UNIT can only be s.
  - If the partition column is of the BIGINT type, BOUNDARY_COLUMN_UNIT can be s or ms.
- HOT_RETENTION = intnum retention_time_unit: specifies the time range for hot data. You can specify the time range in MINUTE, HOUR, DAY, WEEK, MONTH, or YEAR.

Examples

Example 1

Run the following SQL statements to create a columnstore index for a table.

Run the following SQL statement to create a table named test_tbl1.
```
CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));
```
Run the following SQL statement to create a columnstore index named idx1_test_tbl1 for the test_tbl1 table, and specify the col1 column as the reference column.
```
CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);
```

Example 2

In shared storage mode, run the following SQL statements to configure a hot data caching strategy for an index table.

Create an index table named idx1_tbl1 and specify its data as hot data.

CREATE INDEX idx1_tbl1 ON tbl1 (c1) LOCAL 
  STORAGE_CACHE_POLICY (GLOBAL = "hot");

Create a table named tbl2, and configure the data of the index table named idx2 on the tbl2 table as hot data.

CREATE TABLE tbl2 (col1 INT, col2 INT,
  INDEX idx2 ((col1 + 1)) STORAGE_CACHE_POLICY (GLOBAL = "hot"), UNIQUE KEY ((col1 + col2)));

Configure the caching strategy for global index partitions and the time axis strategy when you create a global index.

CREATE INDEX idx_global ON tbl1 (c1) GLOBAL
  PARTITION BY RANGE (c1)
  (PARTITION p1 VALUES LESS THAN (100),
   PARTITION p2 VALUES LESS THAN (200),
   PARTITION p3 VALUES LESS THAN MAXVALUE);

OceanBase

Customer Stories

Documentation