When you create special indexes in an analytical processing (AP) database, you can follow the best practices below:
JSON multi-value index
Scenarios
JSON multi-value indexes are specifically designed for array fields in JSON documents. They are ideal for scenarios where queries need to handle multiple values or attributes, significantly enhancing query efficiency. Multi-value indexes are particularly useful in the following scenarios:
Many-to-many relationship queries: When two entities have a many-to-many relationship, multi-value indexes can speed up queries. For example, an actor can appear in multiple films, and a film can involve multiple actors. You can store all actors involved in a film in a JSON array and use a JSON multi-value index to optimize queries for films featuring specific actors.
Tag and category queries: When entities have multiple tags or categories, multi-value indexes can accelerate queries. For instance, a product may have multiple tag attributes. By storing these tags in a JSON array, you can quickly query products that include one or more specific tags.
JSON multi-value indexes are commonly used to accelerate queries based on JSON arrays where the WHERE clause includes the following three predicates:
For more information about JSON multi-value indexes, see Multi-value indexes.
Best practice examples
The following example shows how to use a JSON multi-value index. Assume that you have a table that records user information, including user ID, name, age, and hobbies. The hobbies are stored as a JSON array. A user may have multiple hobbies, which can be considered as tags of the user. The table structure and data are as follows:
create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');
In product advertising, you need to precisely locate users based on their hobbies. For example, before advertising climbing equipment, you need to query which users have the hobby of "hiking". The corresponding query statement is as follows:
OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name |
+---------+----------+
| 1 | LiLei |
| 3 | XiaoMing |
+---------+----------+
-- The initial execution of this query may require a full table scan, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info|2 |3 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
| access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
From the query plan above, you can see that the entire query requires a full table scan and row-by-row filtering and comparison of the JSON array. The filtering overhead of JSON itself is also significant. When the number of rows to be filtered reaches a certain level, it will greatly affect the query efficiency. In this case, you can create a JSON multi-value index on the hobbies column to significantly improve the query efficiency.
The post-creation feature of JSON multi-value indexes is disabled by default. You need to enable it in the sys tenant.
alter system set _enable_add_fulltext_index_to_existing_table = true;
-- You can see from the query plan that the entire table needs to be scanned. To optimize performance, we recommend that you create a JSON multi-value index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );
-- After the index is created, execute the query again. The performance is significantly improved.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info(idx1)|1 |10 | |
| ========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
| access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)
Note that a JSON multi-value index occupies additional storage space and may affect write performance. When you modify a JSON field that contains a multi-value index (such as by inserting, updating, or deleting data), the index is updated. This increases the write overhead. Therefore, you need to weigh the advantages and disadvantages of JSON multi-value indexes and create them as needed.
Full-text index
Scenarios
When you need to perform fuzzy queries on a large amount of text data, scanning the entire table to query each row of data may not meet the performance requirements, especially when the text is large and the data volume is high. Additionally, complex query scenarios such as approximate matching and relevance sorting are difficult to support by rewriting SQL statements.
To better support these scenarios, full-text indexes were introduced. By preprocessing the text content and establishing keyword indexes, full-text indexes significantly improve the efficiency of full-text searches. Full-text indexes are applicable to various scenarios. Below are some specific examples:
- Internal knowledge base of enterprises: Many large enterprises build their own internal knowledge base systems to store project documents, meeting records, research reports, and other materials. Using full-text indexes helps employees quickly and accurately find the required information, thereby improving work efficiency.
- Online libraries and e-book platforms: For services that provide a large number of books for users to read, full-text indexes are extremely important. Users can search by entering the book title, author name, or even a specific segment of text from the book as keywords. The system can quickly locate the results that match the criteria based on the full-text index.
- News portals and social media websites: These platforms generate a massive amount of fresh content daily, including articles, posts, and comments. By utilizing full-text indexes, users can filter information streams based on topics, events, or person names they care about, and retrieve the most relevant content.
- Legal document retrieval systems: The legal industry involves a significant amount of document review work, such as contracts, judgments, and legal provisions. An efficient full-text search engine can greatly simplify the workflow for lawyers, allowing them to quickly find precedents, cited clauses, and relevant legal grounds.
- Healthcare information systems: In the medical field, doctors often need to review patients' medical histories, the latest medical research papers, and other reference materials. With full-text indexes, healthcare professionals can conveniently access the required information, enabling more accurate diagnostic decisions.
Any application that involves managing and querying a large amount of unstructured text data can consider using full-text indexes to enhance search efficiency. For more information about OceanBase's full-text search capabilities, see Full-text index.
Best practices
We define a table to store document data and set up a full-text index for the documents. Using the full-text index, we can quickly match documents containing the desired keywords and sort them by similarity from highest to lowest.
Create a table and add a full-text index
CREATE TABLE Articles (
id INT AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
PRIMARY KEY (id),
FULLTEXT ft1 (content) WITH PARSER SPACE
);
Insert sample data
INSERT INTO Articles (title, content) VALUES
('Introduction to OceanBase', 'OceanBase is an open-source relational database management system.'),
('Full-Text Search in Databases', 'Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly.'),
('Advantages of Using OceanBase', 'OceanBase offers several advantages such as high performance, reliability, and ease of use. ');
Query all data in the table
SELECT * FROM Articles;
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Execute a full-text search query
SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. | 0.5699481865284975 |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. | 0.240174672489083 |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. | 0.20072992700729927 |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
3 rows in set
View the query execution plan
EXPLAIN SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------------------- |
| |0 |SORT | |17 |145 | |
| |1 |└─TEXT RETRIEVAL SCAN|articles(ft1)|17 |138 | |
| ============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([articles.id], [articles.title], [articles.content], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| sort_keys([MATCH(articles.content) AGAINST('OceanBase database'), DESC]) |
| 1 - output([articles.id], [articles.content], [articles.title], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| access([articles.id], [articles.content], [articles.title]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(articles.content) AGAINST('OceanBase database')), |
| pushdown_match_filter(MATCH(articles.content) AGAINST('OceanBase database')) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set
You can refer to the best practices for creating special indexes in the database for the AP scenario mentioned above. Properly applying these indexes can help improve the efficiency and performance of database queries.