When creating special indexes in an analytical processing (AP) database, follow these best practices:
JSON multi-value indexes
Scenarios
A JSON multi-value index is a type of index specifically designed for array fields in JSON documents. It is ideal for scenarios where you need to query multiple values or attributes, significantly improving query efficiency. Multi-value indexes are particularly useful in the following scenarios:
Many-to-many relationship queries: When there is a many-to-many relationship between two entities, a multi-value index can accelerate queries. For example, an actor can appear in multiple movies, and a movie can involve multiple actors. You can store all the actors involved in a movie in a JSON array and use a JSON multi-value index to optimize queries for movies featuring specific actors.
Tag and category queries: When an entity has multiple tags or categories, a multi-value index can accelerate queries. For example, a product may have multiple tag attributes. You can store the tags of a product in a JSON array and quickly query products that contain specific tags.
A JSON multi-value index is commonly used to accelerate queries based on JSON arrays where the WHERE clause contains the following three predicates:
For more information about JSON multi-value indexes, see Multi-value indexes.
Best practices
The following example shows how to use a JSON multi-value index. Assume that you have a table named user_info that stores the user ID, name, age, and hobbies of a user. The hobbies of a user are stored in a JSON array. A user can have multiple hobbies, which are regarded as the tags of the user. The table structure and data are as follows:
create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');
In the advertising industry, you can use the hobbies of a user to precisely target the users for advertising. For example, before you launch an advertisement for climbing equipment, you need to query the users who have the hobby of "hiking". The query statement is as follows:
OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name |
+---------+----------+
| 1 | LiLei |
| 3 | XiaoMing |
+---------+----------+
-- The first time you execute this query, it may scan the entire table, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info|2 |3 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
| access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
From the query plan, you can see that the entire table needs to be scanned and each row of the JSON array needs to be filtered. The filtering of a JSON array is time-consuming. Therefore, when a large number of rows need to be filtered, the query efficiency is significantly affected. In this case, you can create a JSON multi-value index on the hobbies column to significantly improve the query efficiency.
The JSON multi-value index creation feature is disabled by default. You must enable the feature in the sys tenant.
alter system set _enable_add_fulltext_index_to_existing_table = true;
-- You can see from the query plan that the entire table needs to be scanned. To optimize performance, we recommend that you create a JSON multi-value index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );
-- After the index is created, execute the query again. The performance is significantly improved.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info(idx1)|1 |10 | |
| ========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
| access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)
Note that a JSON multi-value index occupies additional storage space and may affect write performance. When you modify a JSON field that contains a multi-value index (such as by inserting, updating, or deleting data), the index is also updated. This increases the write overhead. Therefore, you need to weigh the pros and cons of JSON multi-value indexes and create them as needed.
Full-text index
Scenarios
When you need to perform fuzzy search on a large amount of text data, full-table scanning to perform fuzzy query on each row of data may not meet the performance requirements when the text is large and the amount of data is large. In addition, some complex query scenarios, such as approximate matching and relevance sorting, are difficult to support by rewriting SQL statements.
To better support these scenarios, full-text indexes are introduced. A full-text index preprocesses text content and builds keyword indexes to significantly improve the efficiency of full-text search. Full-text indexes are applicable to various scenarios. The following examples illustrate specific applications:
- Internal knowledge base of enterprises: Many large enterprises build their own internal knowledge base systems to store project documents, meeting records, research reports, and other materials. Full-text indexes help employees quickly and accurately find the required information, thereby improving work efficiency.
- Online libraries and e-book platforms: For services that provide a large number of books for users to read, full-text indexes are crucial. Users can search for books by title, author name, or even specific text segments within the books. The system can quickly locate the results based on the full-text index.
- News portals and social media websites: These platforms generate a massive amount of new content daily, including articles, posts, and comments. Full-text indexes allow users to filter information streams based on topics, events, or person names of interest, enabling them to access the most relevant content.
- Legal document retrieval systems: The legal industry involves extensive document review work, such as contracts, court decisions, and legal provisions. An efficient full-text search engine can greatly simplify the workflow for lawyers, allowing them to quickly find precedents, cited provisions, and relevant legal grounds.
- Healthcare information systems: In the medical field, doctors often need to review patients' medical histories, the latest medical research papers, and other reference materials. Full-text indexes enable healthcare professionals to conveniently access relevant information, supporting more accurate diagnostic decisions.
Any application involving the management and query of large amounts of unstructured text data can benefit from the use of full-text indexes to enhance retrieval efficiency. For more information about OceanBase Database's full-text search capabilities, see Full-text index.
Best practices
We define a table to store document data and create a full-text index for the documents. With the full-text index, we can quickly match documents containing the expected keywords and sort them by similarity from high to low.
Create a table and add a full-text index
CREATE TABLE Articles (
id INT AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
PRIMARY KEY (id),
FULLTEXT ft1 (content) WITH PARSER SPACE
);
Insert sample data
INSERT INTO Articles (title, content) VALUES
('Introduction to OceanBase', 'OceanBase is an open-source relational database management system.'),
('Full-Text Search in Databases', 'Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly.'),
('Advantages of Using OceanBase', 'OceanBase offers several advantages such as high performance, reliability, and ease of use. ');
Query all data in the table
SELECT * FROM Articles;
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Execute a full-text search query
SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. | 0.5699481865284975 |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. | 0.240174672489083 |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. | 0.20072992700729927 |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
3 rows in set
View the query execution plan
EXPLAIN SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------------------- |
| |0 |SORT | |17 |145 | |
| |1 |└─TEXT RETRIEVAL SCAN|articles(ft1)|17 |138 | |
| ============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([articles.id], [articles.title], [articles.content], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| sort_keys([MATCH(articles.content) AGAINST('OceanBase database'), DESC]) |
| 1 - output([articles.id], [articles.content], [articles.title], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| access([articles.id], [articles.content], [articles.title]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(articles.content) AGAINST('OceanBase database')), |
| pushdown_match_filter(MATCH(articles.content) AGAINST('OceanBase database')) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set
You can refer to the best practices for creating special indexes in the AP scenario. Appropriate use of these indexes can significantly improve the efficiency and performance of database queries.
