JSON multi-value index and full-text index practices|V4.6.0|OceanBase Database| docs|Distributed Database

JSON multi-value index and full-text index practices

Last Updated：2026-05-07 11:26:24 Updated

When you create a special index in an analytical processing (AP) database, you can follow the following best practices:

JSON multi-value indexes

Scenarios

JSON multi-value indexes are specifically designed for array fields in JSON documents. They are ideal for scenarios where queries involve multiple values or attributes, significantly improving query efficiency. Multi-value indexes are particularly useful in the following scenarios:

Many-to-many relationship queries: When two entities have a many-to-many relationship, multi-value indexes can accelerate queries. For example, an actor can appear in multiple movies, and a movie can involve multiple actors. You can store all actors involved in a movie in a JSON array and use a JSON multi-value index to optimize queries for movies featuring specific actors.
Tag and category queries: When entities have multiple tags or categories, multi-value indexes can accelerate queries. For example, a product may have multiple tag attributes. You can store these tags in a JSON array and use a JSON multi-value index to quickly query products with specific tags.

JSON multi-value indexes are commonly used to accelerate queries based on JSON arrays where the WHERE clause contains the following three predicates:
For more information about JSON multi-value indexes, see Multi-value indexes. For more information about JSON multi-value indexes, see Multi-value indexes.

Best practices

The following example shows how to use a JSON multi-value index. Assume that you have a user table that records the user ID, name, age, and hobbies. The hobbies are stored as a JSON array. A user may have multiple hobbies, which can be considered as tags. The table structure for storing user information is as follows:

create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');

In product advertising, you need to accurately target users based on their hobbies. For example, before advertising climbing gear, you need to query users who have the hobby of "hiking". The corresponding query statement is as follows:

OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name     |
+---------+----------+
|       1 | LiLei    |
|       3 | XiaoMing |
+---------+----------+

-- The first time you execute this query, it may require a full table scan, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================                                                                                                                    |
| |ID|OPERATOR       |NAME     |EST.ROWS|EST.TIME(us)|                                                                                                                    |
| ----------------------------------------------------                                                                                                                    |
| |0 |TABLE FULL SCAN|user_info|2       |3           |                                                                                                                    |
| ====================================================                                                                                                                    |
| Outputs & filters:                                                                                                                                                      |
| -------------------------------------                                                                                                                                   |
|   0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
|       access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0)                                                                                |
|       is_index_back=false, is_global_index=false, filter_before_indexback[false],                                                                                       |
|       range_key([user_info.__pk_increment]), range(MIN ; MAX)always true                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

From the query plan, you can see that the query requires a full table scan and filters each row of the JSON array. The filtering overhead of JSON is significant. When the number of rows to be filtered reaches a certain level, the query efficiency is greatly affected. In this case, you can create a JSON multi-value index on the hobbies column to significantly improve the query efficiency.

The JSON multi-value index is disabled by default. You need to enable it in the sys tenant.

alter system set _enable_add_fulltext_index_to_existing_table = true;

-- You can see that the query plan shows a full table scan. To optimize performance, we recommend that you create a JSON multi-value index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );

-- After the index is created, execute the query again. The performance is significantly improved.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==========================================================                                                                                                   |
| |ID|OPERATOR       |NAME           |EST.ROWS|EST.TIME(us)|                                                                                                   |
| ----------------------------------------------------------                                                                                                   |
| |0 |TABLE FULL SCAN|user_info(idx1)|1       |10          |                                                                                                   |
| ==========================================================                                                                                                   |
| Outputs & filters:                                                                                                                                           |
| -------------------------------------                                                                                                                        |
|   0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
|       access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0)                                         |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                             |
|       range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX)        |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)

Note that a JSON multi-value index occupies additional storage space and may affect write performance. When you modify a JSON field that contains a multi-value index (for example, you insert, update, or delete data in the field), the index is also updated. Therefore, the write overhead increases. Therefore, you need to weigh the advantages and disadvantages of JSON multi-value indexes and create them as needed.

Full-text index

Scenarios

When you need to perform fuzzy queries on a large amount of text data, the performance of a full-table scan is often insufficient when the text is large and the data volume is high. In addition, complex query scenarios such as approximate matching and relevance sorting are also difficult to support by rewriting SQL statements.

To better support these scenarios, full-text indexes are introduced. They preprocess text content and build keyword indexes to significantly improve the efficiency of full-text searches. Full-text indexes are applicable to various scenarios. The following are some specific examples:

Internal knowledge base of enterprises: Many large enterprises build their own internal knowledge base systems to store project documents, meeting records, research reports, and other materials. Using full-text indexes can help employees quickly and accurately find the information they need, thereby improving work efficiency.
Online libraries and e-book platforms: For services that provide a large number of books for users to read, full-text indexes are extremely important. Users can search for books by title, author, or even specific text segments within the books. The system can quickly locate the results based on the full-text index.
News portals and social media websites: These platforms generate a massive amount of fresh content daily, including articles, posts, and comments. Using full-text indexes allows users to filter information streams based on topics, events, or person names they care about, and obtain the most relevant content.
Legal document retrieval systems: The legal industry involves a large amount of document review work, such as contracts, court decisions, and legal provisions. An efficient full-text search engine can greatly simplify the workflow for lawyers, allowing them to quickly find precedents, referenced clauses, and relevant legal bases.
Healthcare information systems: In the medical field, doctors often need to review patients' medical records, the latest medical research papers, and other reference materials. With full-text indexes, healthcare professionals can more easily access the relevant information, enabling more accurate diagnostic decisions.

Any application involving the management and querying of a large amount of unstructured text data can consider using full-text indexes to improve search efficiency. For more information about the full-text search capabilities of OceanBase Database, see Full-text index.

Best practices

We define a table to store document data and create a full-text index for the documents. With the full-text index, you can quickly match documents containing the expected keywords and sort them by similarity from high to low.

Create a table and add a full-text index

CREATE TABLE Articles (
    id INT AUTO_INCREMENT,
    title VARCHAR(255),
    content TEXT,
    PRIMARY KEY (id),
    FULLTEXT ft1 (content) WITH PARSER SPACE
);