When creating special indexes in an AP (analytical processing) database, you can follow these best practices:
JSON Multi-Valued Indexes
Scenarios
A JSON multi-value index is a type of index specifically designed for array fields in JSON documents. It is suitable for scenarios where queries need to search for multiple values or attributes, significantly improving query efficiency. Multi-value indexes are primarily applicable in the following scenarios:
Many-to-many relationship queries: When two entities have a many-to-many relationship, a multi-value index can accelerate queries. For example, an actor can appear in multiple movies, and a movie can involve multiple actors. You can store all the actors involved in a movie in a JSON array and use a JSON multi-value index to optimize queries for movies featuring a specific actor.
Tag and category queries: When an entity has multiple tags or categories, a multi-value index can accelerate queries. For example, a product may have multiple tag attributes. You can store these tags in a JSON array and use a JSON multi-value index to quickly query products with specific tags or multiple tags.
A JSON multi-value index is commonly used to accelerate queries based on JSON arrays where the WHERE clause contains the following three predicates:
For more information about JSON multi-value indexes, see Multi-value indexes.
Best practices
The following example shows how to use a JSON multi-value index. Assume that you have a table named user_info that records the user ID, name, age, and hobbies of users. The hobbies of a user are stored in a JSON array. A user may have multiple hobbies, which can be considered as tags of the user. The table structure for storing user information is as follows:
create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');
In the case of advertising products, you need to accurately locate users based on their hobbies. For example, before advertising outdoor equipment for hiking, you need to query which users have the hobby of "hiking". The corresponding query statement is as follows:
OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name |
+---------+----------+
| 1 | LiLei |
| 3 | XiaoMing |
+---------+----------+
-- The first time you execute this query, a full table scan may be required, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info|2 |3 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
| access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
From the query plan, you can see that the entire query requires a full table scan and filters the JSON array row by row. The filtering overhead of the JSON array is significant. When the number of rows to be filtered reaches a certain level, the query efficiency is greatly affected. In this case, you can create a JSON multi-value index on the hobbies column to significantly improve the query efficiency.
The JSON multi-value index is disabled by default. You need to enable it in the sys tenant.
alter system set _enable_add_fulltext_index_to_existing_table = true;
-- You can see that the query plan shows a full table scan. To optimize performance, it is recommended to create a JSON multi-value index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );
-- After creating the index, re-executing the query significantly improves performance.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info(idx1)|1 |10 | |
| ========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
| access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)
Note that JSON multi-value indexes consume additional storage space and may impact write performance. When modifying a JSON field that contains a multi-value index (such as inserting, updating, or deleting data), the index will also be updated, increasing write overhead. Therefore, it is important to weigh the advantages and disadvantages of JSON multi-value indexes and create them as needed.
Full-text index
Scenarios
In scenarios involving large volumes of text data that require fuzzy searches, performing a full-table scan to query each row for fuzzy matches can be inefficient, especially when the text is large and the dataset is extensive. Additionally, complex query scenarios such as approximate matching and relevance-based sorting are challenging to support through SQL rewriting alone.
To address these scenarios, full-text indexes were introduced. By preprocessing text content and establishing keyword indexes, full-text indexes significantly enhance the efficiency of full-text searches. Full-text indexes are applicable in various scenarios. Below are some specific examples:
- Internal knowledge bases: Many large enterprises build internal knowledge base systems to store project documents, meeting records, research reports, and other materials. Using full-text indexes allows employees to quickly and accurately locate the information they need, thereby improving work efficiency.
- Online libraries and e-book platforms: For services that provide a large number of books for users to read, full-text indexes are crucial. Users can search by book title, author name, or even specific text segments within the book. The system leverages full-text indexes to swiftly locate results that match the search criteria.
- News portals and social media websites: These platforms generate a massive amount of fresh content daily, including articles, posts, and comments. Full-text indexes enable users to filter information streams based on topics, events, or person names they are interested in, helping them access the most relevant content.
- Legal document retrieval systems: The legal industry involves extensive document review tasks, such as contracts, court rulings, and legal statutes. An efficient full-text search engine can greatly simplify the workflow for lawyers, allowing them to quickly find precedents, cited provisions, and relevant legal references.
- Healthcare information systems: In the medical field, doctors often need to review patient medical records, the latest medical research papers, and other reference materials. With full-text indexes, healthcare professionals can more easily access relevant information, enabling more accurate diagnostic decisions.
Any application involving the management and querying of large volumes of unstructured text data can benefit from implementing full-text indexes to enhance search efficiency. For more information about OceanBase Database's full-text search capabilities, see Full-text index.
Best practice example
We define a table to store document information and set up a full-text index for the documents. Using the full-text index, we can quickly match documents containing the desired keywords and sort them by similarity in descending order.
Create a table and add a full-text index
CREATE TABLE Articles (
id INT AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
PRIMARY KEY (id),
FULLTEXT ft1 (content) WITH PARSER SPACE
);
Insert sample data
INSERT INTO Articles (title, content) VALUES
('Introduction to OceanBase', 'OceanBase is an open-source relational database management system.'),
('Full-Text Search in Databases', 'Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly.'),
('Advantages of Using OceanBase', 'OceanBase offers several advantages such as high performance, reliability, and ease of use. ');
Query all data in the table
SELECT * FROM Articles;
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Execute a full-text search query
SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. | 0.5699481865284975 |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. | 0.240174672489083 |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. | 0.20072992700729927 |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
3 rows in set
View the query execution plan
EXPLAIN SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------------------- |
| |0 |SORT | |17 |145 | |
| |1 |└─TEXT RETRIEVAL SCAN|articles(ft1)|17 |138 | |
| ============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([articles.id], [articles.title], [articles.content], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| sort_keys([MATCH(articles.content) AGAINST('OceanBase database'), DESC]) |
| 1 - output([articles.id], [articles.content], [articles.title], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| access([articles.id], [articles.content], [articles.title]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(articles.content) AGAINST('OceanBase database')), |
| pushdown_match_filter(MATCH(articles.content) AGAINST('OceanBase database')) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set
You can refer to the best practice guide for creating special indexes in the database in the AP scenario mentioned above. Properly applying these indexes can help improve the efficiency and performance of database queries.
