This topic describes the search index, including its capabilities, prerequisites, supported data types and predicates, common DDL operations, and query examples.
A search index can be built on a single column or multiple columns to create a unified inverted index structure. It can establish index entries for multiple paths within a JSON column or for the entire value of a scalar column. A single index can cover multiple paths and support various predicates. Compared to creating separate function indexes for each access path or using multi-value indexes with partial array element expansion, search indexes emphasize building a single index that covers multiple paths, reducing the number of indexes and maintenance costs, and making it easier to handle the variable paths and diverse query patterns in semi-structured data.
Overview
- Typical scenarios: Used together with hybrid search or with multi-column search indexes on analytical wide tables, and with index merge queries, to improve performance.
- AI hybrid search scenarios: Full-text tokenization and keyword retrieval are handled by the full-text index, while the search index focuses on structured and semi-structured conditions within complex types (which can also be used for simple scalar conditions). The two types of indexes can coexist in the same table and be used together with vector indexes.
- AP analytical scenarios: When search indexes are built on wide tables and multiple columns, the optimizer can automatically identify the best index combinations and scan paths using index merge capabilities, improving performance.
- DDL operations: DDL operations for search indexes are the same as for regular indexes, supporting creation, deletion, renaming, and visibility changes.
Limitations
- Tenants: This feature is only available for MySQL mode tenants.
- Table type: The target table must be a
ORGANIZATION = HEAPtable. - Column-level configuration: Column-level
WITH (...)only applies to JSON columns. - JSON path length: The encoded JSON path must be no longer than 2KB. If the JSON document has very long keys or deeply nested structures, the DML operation may fail with
ERROR 1071due to path length limits. It is recommended to avoid excessively long keys in the JSON document. - Array types: Only single-level arrays are supported; nested arrays are not supported.
- Partitioned tables: Only local search indexes (
LOCAL) are supported for partitioned tables. - Index columns: Accelerated
LIKEpattern matching on index columns is not supported. - Generated columns: Search indexes cannot be created on generated columns (
GENERATED); theSTORINGclause is not supported.
How to use
Supported data types
Search indexes are primarily used to accelerate queries on scalar columns and JSON/one-level array columns. The supported data types are as follows:
| Category | Supported types |
|---|---|
| Integer types | TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, and their UNSIGNED variants |
| Floating-point/decimal types | FLOAT, DOUBLE, DECIMAL, and NUMBER |
| String types | VARCHAR and CHAR |
| Text types | TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
| Binary types | BINARY and VARBINARY |
| Binary large objects | TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB |
| Date and time types | DATE, TIME, DATETIME, TIMESTAMP, and YEAR |
| JSON type | JSON |
| Array types | ARRAY(element_type) (only one-level arrays, such as ARRAY(INT) and ARRAY(VARCHAR(256))) |
The following limitations apply:
- Search indexes cannot be created on columns of the
BIT,ENUM,SET, orMAPtype. - For PICK columns, the
json_numberfield supports integer and floating-point/decimal types, and thejson_stringfield supports string types.
Syntax and parameters
Search indexes can be created when creating a table, after the table is created, or appended to an existing table. The specific syntax and common parameters for each method are as follows:
CREATE TABLE table_name (
column_name type
[, column_name type ...]
SEARCH INDEX index_name (
column_name [WITH (option_list)]
[, column_name [WITH (option_list)] ...]
)
) ORGANIZATION = HEAP;
option_list:
[INCLUDE_PATHS = ('path1'[, 'path2']...)]
[| EXCLUDE_PATHS = ('path1'[, 'path2']...)]
[| INCLUDE_TYPES = (type1[, type2]...)]
CREATE SEARCH INDEX index_name ON table_name (
column_name [WITH (option_list)]
[, column_name [WITH (option_list)] ...]
) [LOCAL];
option_list:
[INCLUDE_PATHS = ('path1'[, 'path2']...)]
[| EXCLUDE_PATHS = ('path1'[, 'path2']...)]
[| INCLUDE_TYPES = (type1[, type2]...)]
You can also use the ALTER TABLE statement to create a search index:
ALTER TABLE table_name ADD SEARCH INDEX index_name (
column_name [WITH (option_list)]
[, column_name [WITH (option_list)] ...]
);
option_list:
[INCLUDE_PATHS = ('path1'[, 'path2']...)]
[| EXCLUDE_PATHS = ('path1'[, 'path2']...)]
[| INCLUDE_TYPES = (type1[, type2]...)]
| Parameter | Description | Value range | |------------- ----|---------|---------| | option_list ... | A list of column-level options that control the JSON paths involved in index construction for the column. | | INCLUDE_PATHS / EXCLUDE_PATHS | A column-level path whitelist or blacklist. You can specify only one of these options. These options control the JSON paths involved in index construction for the column. By default, all paths are involved in index construction. The paths can be configured in two ways:
- Simple deterministic paths. For example,
$.keyand$.a. In this case, only scalar and array data under$.keyand$.aare indexed, and data under$.a.bis not indexed. - Prefix-matching paths. For example,
$.a.*. In this case, JSON data under$.ais indexed, and data under$.a.bis also indexed.$.*matches all paths.
INCLUDE_TYPES = (JSON_STRING) indicates that only path indexes of the JSON_STRING type are constructed. By default, all types are involved in index construction. | JSON_STRING, JSON_NUMBER | | LOCAL | An optional parameter that indicates to create a local index. For partitioned tables, you must create a local search index when creating or appending a search index. |
Other operations such as deletion, renaming, and visibility changes are the same as for regular indexes. For more information, see the relevant documentation at the end of this topic.
Supported filtering queries
Note
- Whether a search index is used for optimization is determined by the optimizer and the
EXPLAINexecution plan. This topic only describes the typical range of filtering queries that can be accelerated.
Scalar
For scalar columns (numerical, string, and time), the following filtering queries can be accelerated:
- Equality:
col = value - Range:
col > / >= / < / <=andBETWEEN IN (list)IS NULL/IS NOT NULLAND/ORcombinations: If multiple columns are defined in the same search index or combined with other indexes, Index Merge may be used.
Recommendation: For high selectivity and ordered range access, retain B-tree indexes. For low selectivity, OR combinations, or coexistence of JSON and arrays, search indexes with Index Merge are usually more efficient.
JSON
After a search index is created, you can directly query the internal fields of a JSON column without full table scans.
| Category | Description |
|---|---|
| Value by path | In a JSON column, extract the specified scalar value by path, and then use the JSON_EXTRACT() function or -> operator to compare it with a value by using equality or range comparison operators (>, >=, <, <=, BETWEEN, etc.). For example, col->'$.price' > 100. |
MEMBER OF |
Determines whether a value is an element in a JSON array. The right operand can be a column or a path expression. For example, MEMBER OF(col). |
JSON_CONTAINS |
Determines whether a JSON document contains a specified scalar value or subdocument. For example, JSON_CONTAINS(c1->'$.tags', '"mobile"'). |
JSON_OVERLAPS |
Determines whether two JSON arrays have at least one common element. For example, JSON_OVERLAPS(col1, '[1, 2, 3]'). |
IN |
You can use the IN (...) clause to match the scalar value extracted by path or the entire column. For example, IN (col->'$.price', col2). |
PICK |
When the value of a JSON field may be of multiple types (such as both numeric and string values exist in the same path), you can use the PICK keyword to filter by type in queries. You can specify PICK json_number or PICK json_string after the path. You can also use PICK with comparison operators, MEMBER OF, JSON_CONTAINS, and JSON_OVERLAPS. json_number and json_string indicate numeric and string types, respectively. For more information, see the "Supported data types" section above. |
Notice
When you use the JSON_VALUE() function with the PICK keyword, the query cannot be accelerated by a search index if you also use the RETURNING, ON ERROR, or ON EMPTY clause.
Recommendation: Specify paths explicitly in queries. Use the WITH clause to narrow the index scope when a document contains many fields. Use PICK for mixed types in the same path. Use full-text indexes for keyword searches.
One-dimensional array
For ARRAY(element_type) columns:
| Function | Description |
|---|---|
ARRAY_CONTAINS(col, value) |
Whether the array contains the specified element. |
ARRAY_CONTAINS_ALL(col, array_literal) |
Whether the array contains all elements in the specified array. |
ARRAY_OVERLAPS(col, array_literal) |
Whether the array has at least one common element. |
Recommendation: Use one-dimensional arrays for tags and multi-value attributes. Select the contains, contains all, or overlaps function based on the semantics. You can also use these functions with B-tree conditions on tables for Index Merge.
Confirm whether an index is used
You can use the EXPLAIN statement to view the execution plan and confirm whether the query actually uses a search index. For example:
EXPLAIN SELECT * FROM products WHERE info->'$.price' > 100 AND category = 'electronics';
If the execution plan contains the name of a search index (such as idx_products), the query actually uses the search index.
Scenario examples
Here are some common scenario examples for reference.
Scalar column queries
The following example demonstrates how to create a search index on integer and string columns and perform various queries:
Create a sample table and search index, and insert test data
CREATE TABLE orders ( order_id INT, user_id INT, amount DOUBLE, status VARCHAR(64), SEARCH INDEX idx_orders (user_id, amount, status) ) ORGANIZATION = HEAP; INSERT INTO orders VALUES (1, 100, 99.5, 'paid'); INSERT INTO orders VALUES (2, 101, 200.0, 'pending'); INSERT INTO orders VALUES (3, 100, 50.0, 'paid'); INSERT INTO orders VALUES (4, 102, 150.0, 'shipped'); INSERT INTO orders VALUES (5, 100, 300.0, 'refund');Perform various queries
-- Equality query: Find all orders where user_id = 100 SELECT * FROM orders WHERE user_id = 100; -- Expected result: three rows with order_id = 1, 3, and 5 -- Range query: Find orders with an amount between 80 and 200 SELECT * FROM orders WHERE amount > 80 AND amount < 200; -- Expected result: two rows with order_id = 1 and 4 -- IN query: Find orders with status 'paid' or 'shipped' SELECT * FROM orders WHERE status IN ('paid', 'shipped'); -- Expected result: three rows with order_id = 1, 3, and 4 -- Combined conditions SELECT * FROM orders WHERE user_id = 100 AND status = 'paid'; -- Expected result: two rows with order_id = 1 and 3Confirm that the index is being used
EXPLAIN SELECT * FROM orders WHERE user_id = 100;The execution plan should include the search index name
idx_orders.
JSON queries
Create a sample table and search index, and insert test data
CREATE TABLE products ( id INT, -- Scalar column info JSON, -- JSON column SEARCH INDEX idx_products (info) ) ORGANIZATION = HEAP; INSERT INTO products VALUES (1, '{"name": "Phone", "price": 999, "brand": "Apple", "tags": ["electronics", "mobile"]}'), (2, '{"name": "Book", "price": 29, "brand": "O\'Reilly", "tags": ["education"]}'), (3, '{"name": "Laptop", "price": 1999, "brand": "Dell", "tags": ["electronics", "computer"]}'), (4, '{"name": "Pen", "price": 5, "brand": "Pilot", "tags": ["stationery"]}');Perform various queries
-- Path comparison: Price between 10 and 1000 SELECT * FROM products WHERE info->'$.price' BETWEEN 10 AND 1000; -- Expected result: Book and Phone -- MEMBER OF query: Find products with "electronics" in tags SELECT * FROM products WHERE '"electronics"' MEMBER OF (info->'$.tags'); -- Expected result: Phone and Laptop -- JSON_CONTAINS query: Tags include both "electronics" and "mobile" SELECT * FROM products WHERE JSON_CONTAINS(info->'$.tags', '["electronics", "mobile"]'); -- Expected result: PhoneConfirm that the index is being used
EXPLAIN SELECT * FROM products WHERE info->'$.price' BETWEEN 10 AND 1000;The execution plan should include the search index name
idx_products.
JSON filter queries
The following example demonstrates how to use the WITH clause to limit the index scope in a user profile scenario.
Create a sample table and search index, and insert test data
CREATE TABLE users ( id INT, profile JSON, SEARCH INDEX idx_profile (profile WITH (INCLUDE_PATHS = ('$.name', '$.address.*'))) ) ORGANIZATION = HEAP; INSERT INTO users VALUES (1, '{"name": "Zhang San", "age": 30, "address": {"city": "Beijing", "zip": 100000}, "debug": "xxx"}'), (2, '{"name": "Li Si", "age": 25, "address": {"city": "Shanghai", "zip": 200000}, "debug": "yyy"}');Perform various queries
-- Query that uses the index (within INCLUDE_PATHS) SELECT * FROM users WHERE JSON_EXTRACT(profile, '$.name') = '"Zhang San"'; -- Expected result: index is used SELECT * FROM users WHERE JSON_EXTRACT(profile, '$.address.city') = '"Beijing"'; -- Expected result: index is used -- Query that does not use the index (outside INCLUDE_PATHS) SELECT * FROM users WHERE JSON_EXTRACT(profile, '$.age') = 30; -- Expected result: full table scan (age path is not indexed)Confirm that the index is being used
EXPLAIN SELECT * FROM users WHERE JSON_EXTRACT(profile, '$.age') = 30;The execution plan should not include the search index name
idx_profile, indicating a full table scan is being performed.
Index merge
The following example shows how to use index merge. The search index can work with B-tree indexes and full-text indexes on a table.
Create a table with multiple indexes and insert test data.
CREATE TABLE products_v2 ( id INT PRIMARY KEY, name VARCHAR(128), category INT, price INT, tags JSON, INDEX idx_category(category), -- B-tree index INDEX idx_price(price), -- B-tree index FULLTEXT INDEX ft_name(name), -- Full-text index SEARCH INDEX idx_tags(tags) -- Search index ) ORGANIZATION HEAP; INSERT INTO products_v2 VALUES (1, 'OceanBase Database', 1, 0, '{"type": "database", "license": "open-source"}'), (2, 'MySQL Server', 1, 0, '{"type": "database", "license": "open-source"}'), (3, 'Premium Support', 2, 999, '{"type": "service"}');Run various queries.
-- category uses the B-tree index, tags uses the search index, and the results are intersected. SELECT * FROM products_v2 WHERE category = 1 AND json_extract(tags, '$.license') = '"open-source"'; -- Full-text search matches "database" or tags with type "service" SELECT * FROM products_v2 WHERE MATCH(name) AGAINST ("database") OR json_extract(tags, '$.type') = '"service"'; -- Complex combination: B-tree AND (B-tree OR search index) SELECT * FROM products_v2 WHERE category = 1 AND (price = 0 OR json_extract(tags, '$.type') = '"service"');
References
- This feature is mainly used in hybrid search scenarios. For more information about the scenarios, see Hybrid search.
- For more information about the principles and concepts, see Search index.
- For more information about the syntax and parameters, see CREATE INDEX.
- For more information about the syntax and parameters, see CREATE TABLE.
- For more information about the syntax and parameters, see ALTER TABLE.
