In analytical processing (AP) databases, you can follow the best practices below when deciding to create special indexes:
JSON multi-valued index
Scenarios
A JSON multi-valued index is a type of index designed specifically for array fields within JSON documents. It is suitable for scenarios where you need to query multiple values or attributes and can significantly improve query efficiency. The main applicable scenarios for multi-valued indexes include:
Queries in many-to-many relationships: When there is a many-to-many relationship between two entities, a multi-valued index can accelerate queries. For example, an actor can participate in multiple movies, and a movie may feature multiple actors. You can use a JSON array to store all actors involved in a movie and leverage a JSON multi-valued index to optimize queries for movies featuring a specific actor.
Queries based on tags and categories: When an entity has multiple tags or categories, a multi-valued index can speed up queries. For example, a product may have multiple tag attributes. By storing these tags in a JSON array, you can quickly query products that contain one or more specific tags.
JSON multi-valued indexes are often used to accelerate queries based on JSON arrays where the WHERE clause includes any of the following three predicates:
For more information about JSON multi-valued indexes, see Multi-valued indexes.
Best practice examples
The following example describes an application scenario of a JSON multi-valued index. Assume you have a user information table that records user ID, name, age, and hobbies, where hobbies are stored as a JSON array. A user can have multiple hobbies, which can be considered user tags. The table structure and sample data are as follows:
create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');
In a product advertisement campaign, it is often necessary to precisely target users based on their hobbies. For example, before placing ads for mountaineering equipment, you need to find out which users have the "hiking" hobby. The corresponding query statement is as follows:
OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name |
+---------+----------+
| 1 | LiLei |
| 3 | XiaoMing |
+---------+----------+
-- The initial execution of the query statement may require a full table scan, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info|2 |3 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
| access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The preceding query plan shows that the entire table needs to be scanned and the JSON array needs to be filtered row by row. JSON filtering itself incurs considerable overhead. When the number of records to be filtered reaches a certain threshold, query efficiency can be significantly affected. In this case, you can create a JSON multi-valued index on the hobbies column to significantly improve query efficiency.
By default, the JSON multi-valued index is not enabled. You need to enable it in the sys tenant.
alter system set _enable_add_fulltext_index_to_existing_table = true;
-- You can see from the query plan that a full table scan is required. To improve performance, it is recommended that you create a JSON multi-valued index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );
-- After the index is created, the performance improves significantly upon executing the same query.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info(idx1)|1 |10 | |
| ========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
| access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)
Note that a JSON multi-valued index consumes additional storage space and may affect the write performance. When a JSON field with a multi-valued index is modified (for example, inserted, updated, or deleted), the index is also updated, which increases the write overheads. Therefore, you need to weigh the pros and cons of the JSON multi-valued index and create it as needed.
Full-text indexing
Scenarios
When full-table scan is used to perform fuzzy queries on each row of data, the performance cannot meet the requirement in scenarios involving a large amount of text data that requires fuzzy search. Additionally, some complex query scenarios, such as approximate matching and relevance ranking, are also difficult to support through SQL rewriting.
To better support these scenarios, full-text indexes are introduced. Full-text indexes improve the efficiency of full-text search by preprocessing text content to build keyword indexes. Full-text indexes are applicable to various scenarios. Some specific cases are described as follows:
- Intranet knowledge base: Many large enterprises build their own intranet knowledge bases to store project documents, meeting records, research reports, and other materials. Full-text indexes help employees quickly and accurately find the required information, improving work efficiency.
- Online libraries and e-book platforms: Full-text indexes are extremely important for services that provide a large number of books for users to read. Users can enter book names, author names, or even a piece of text from the book as keywords for search. The system can quickly locate results that meet the criteria based on full-text indexes.
- News portals and social media websites: These platforms generate a massive amount of new content every day, including articles, posts, and comments. Full-text indexes allow users to filter information streams based on topics, events, or names of people they are interested in, to obtain the most relevant content.
- Legal document retrieval systems: The legal industry involves a large amount of document review work, such as contracts, judgments, and legal provisions. An efficient full-text search engine can greatly simplify the workflow of lawyers, allowing them to quickly find precedents, cited clauses, and relevant legal basis.
- Medical health information systems: In the medical field, doctors frequently need to review patients' historical cases, the latest medical research papers, and other reference materials. Full-text indexes enable healthcare workers to conveniently access the required information, facilitating more accurate diagnostic decisions.
Full-text indexes can be considered for any application involving the management and query of a large amount of unstructured text data. For more information about the full-text search capability of OceanBase Database, see Full-text indexes.
Best practice example
We define a table to store document information and set full-text indexes on the documents. Full-text indexes allow for quick matching of documents containing the desired keywords and sorting the results by relevance from highest to lowest.
obclient> CREATE TABLE Articles (
-> id INT AUTO_INCREMENT,
-> title VARCHAR(255) ,
-> content TEXT ,
-> PRIMARY KEY (id),
-> FULLTEXT ft1 (content) WITH PARSER SPACE
-> );
Query OK, 0 rows affected (0.67 sec)
-- Insert data.
obclient> INSERT INTO Articles (title, content) VALUES
-> ('Introduction to OceanBase', 'OceanBase is an open-source relational database management system.'),
-> ('Full-Text Search in Databases', 'Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly.'),
-> ('Advantages of Using OceanBase', 'OceanBase offers several advantages such as high performance, reliability, and ease of use. ');
Query OK, 3 rows affected (0.10 sec)
Records: 3 Duplicates: 0 Warnings: 0
obclient> select * from Articles;
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.08 sec)
-- Search for matching documents.
obclient> select id,title, content,match(content) against('OceanBase database') score from Articles where match(content) against('OceanBase database');
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. | 0.5699481865284975 |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. | 0.240174672489083 |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. | 0.20072992700729927 |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
3 rows in set (0.08 sec)
-- You can run the EXPLAIN statement to view the query plan and analyze its performance.
obclient> explain select id,title, content,match(content) against('OceanBase database') score from Articles where match(content) against('OceanBase database');
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------------------- |
| |0 |SORT | |17 |145 | |
| |1 |└─TEXT RETRIEVAL SCAN|articles(ft1)|17 |138 | |
| ============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([articles.id], [articles.title], [articles.content], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| sort_keys([MATCH(articles.content) AGAINST('OceanBase database'), DESC]) |
| 1 - output([articles.id], [articles.content], [articles.title], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| access([articles.id], [articles.content], [articles.title]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(articles.content) AGAINST('OceanBase database')), |
| pushdown_match_filter(MATCH(articles.content) AGAINST('OceanBase database')) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set (0.08 sec)
You can refer to the best practices for creating special indexes in the preceding database scenario. Proper use of these indexes can help improve query efficiency and performance of the database.