When you create a special index in an analytical processing (AP) database, you can follow the following best practices:
JSON multi-value indexes
JSON multi-value indexes
Scenarios
JSON multi-value indexes are specifically designed for array fields in JSON documents. They are ideal for scenarios where queries involve multiple values or attributes, significantly improving query efficiency. Multi-value indexes are particularly useful in the following scenarios:
Many-to-many relationship queries: When two entities have a many-to-many relationship, multi-value indexes can accelerate queries. For example, an actor can appear in multiple movies, and a movie can involve multiple actors. You can store all actors involved in a movie in a JSON array and use a JSON multi-value index to optimize queries for movies featuring specific actors.
Tag and category queries: When entities have multiple tags or categories, multi-value indexes can accelerate queries. For example, a product may have multiple tag attributes. You can store these tags in a JSON array and use a JSON multi-value index to quickly query products with specific tags.
JSON multi-value indexes are commonly used to accelerate queries based on JSON arrays where the WHERE clause contains the following three predicates:
For more information about JSON multi-value indexes, see Multi-value indexes. For more information about JSON multi-value indexes, see Multi-value indexes.
Best practices
The following example shows how to use a JSON multi-value index. Assume that you have a user table that records the user ID, name, age, and hobbies. The hobbies are stored as a JSON array. A user may have multiple hobbies, which can be considered as tags. The table structure for storing user information is as follows:
create table user_info(user_id bigint, name varchar(1024), age bigint, hobbies json);
insert into user_info values(1, "LiLei", 18, '["reading", "knitting", "hiking"]');
insert into user_info values(2, "HanMeimei", 17, '["reading", "Painting", "Swimming"]');
insert into user_info values(3, "XiaoMing", 19, '["hiking", "Camping", "Swimming"]');
In product advertising, you need to accurately target users based on their hobbies. For example, before advertising climbing gear, you need to query users who have the hobby of "hiking". The corresponding query statement is as follows:
OceanBase(root@test)>select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+---------+----------+
| user_id | name |
+---------+----------+
| 1 | LiLei |
| 3 | XiaoMing |
+---------+----------+
-- The first time you execute this query, it may require a full table scan, which is inefficient.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ==================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info|2 |3 | |
| ==================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]), rowset=16 |
| access([user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
From the query plan, you can see that the query requires a full table scan and filters each row of the JSON array. The filtering overhead of JSON is significant. When the number of rows to be filtered reaches a certain level, the query efficiency is greatly affected. In this case, you can create a JSON multi-value index on the hobbies column to significantly improve the query efficiency.
The JSON multi-value index is disabled by default. You need to enable it in the sys tenant.
alter system set _enable_add_fulltext_index_to_existing_table = true;
-- You can see that the query plan shows a full table scan. To optimize performance, we recommend that you create a JSON multi-value index on the hobbies column:
CREATE INDEX idx1 ON user_info ( (CAST(hobbies->'$[*]' AS char(512) ARRAY)) );
-- After the index is created, execute the query again. The performance is significantly improved.
OceanBase(root@test)>explain select user_id, name from user_info where JSON_CONTAINS(hobbies->'$[*]', CAST('["hiking"]' AS JSON));
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ========================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------- |
| |0 |TABLE FULL SCAN|user_info(idx1)|1 |10 | |
| ========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([user_info.user_id], [user_info.name]), filter([JSON_CONTAINS(JSON_EXTRACT(user_info.hobbies, '$[*]'), cast('[\"hiking\"]', JSON(536870911)))]) |
| access([user_info.__pk_increment], [user_info.hobbies], [user_info.user_id], [user_info.name]), partitions(p0) |
| is_index_back=true, is_global_index=false, filter_before_indexback[false], |
| range_key([user_info.SYS_NC_mvi_21], [user_info.__pk_increment], [user_info.__doc_id_1733716274684183]), range(hiking,MIN,MIN ; hiking,MAX,MAX) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.005 sec)
Note that a JSON multi-value index occupies additional storage space and may affect write performance. When you modify a JSON field that contains a multi-value index (for example, you insert, update, or delete data in the field), the index is also updated. Therefore, the write overhead increases. Therefore, you need to weigh the advantages and disadvantages of JSON multi-value indexes and create them as needed.
Full-text index
Full-text index
Scenarios
When you need to perform fuzzy queries on a large amount of text data, the performance of a full-table scan is often insufficient when the text is large and the data volume is high. In addition, complex query scenarios such as approximate matching and relevance sorting are also difficult to support by rewriting SQL statements.
To better support these scenarios, full-text indexes are introduced. They preprocess text content and build keyword indexes to significantly improve the efficiency of full-text searches. Full-text indexes are applicable to various scenarios. The following are some specific examples:
- Internal knowledge base of enterprises: Many large enterprises build their own internal knowledge base systems to store project documents, meeting records, research reports, and other materials. Using full-text indexes can help employees quickly and accurately find the information they need, thereby improving work efficiency.
- Online libraries and e-book platforms: For services that provide a large number of books for users to read, full-text indexes are extremely important. Users can search for books by title, author, or even specific text segments within the books. The system can quickly locate the results based on the full-text index.
- News portals and social media websites: These platforms generate a massive amount of fresh content daily, including articles, posts, and comments. Using full-text indexes allows users to filter information streams based on topics, events, or person names they care about, and obtain the most relevant content.
- Legal document retrieval systems: The legal industry involves a large amount of document review work, such as contracts, court decisions, and legal provisions. An efficient full-text search engine can greatly simplify the workflow for lawyers, allowing them to quickly find precedents, referenced clauses, and relevant legal bases.
- Healthcare information systems: In the medical field, doctors often need to review patients' medical records, the latest medical research papers, and other reference materials. With full-text indexes, healthcare professionals can more easily access the relevant information, enabling more accurate diagnostic decisions.
Any application involving the management and querying of a large amount of unstructured text data can consider using full-text indexes to improve search efficiency. For more information about the full-text search capabilities of OceanBase Database, see Full-text index.
Best practices
Best practices
We define a table to store document data and create a full-text index for the documents. With the full-text index, you can quickly match documents containing the expected keywords and sort them by similarity from high to low.
Create a table and add a full-text index
CREATE TABLE Articles (
id INT AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
PRIMARY KEY (id),
FULLTEXT ft1 (content) WITH PARSER SPACE
);
Insert sample data
INSERT INTO Articles (title, content) VALUES
('Introduction to OceanBase', 'OceanBase is an open-source relational database management system.'),
('Full-Text Search in Databases', 'Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly.'),
('Advantages of Using OceanBase', 'OceanBase offers several advantages such as high performance, reliability, and ease of use. ');
Query all data in the table
SELECT * FROM Articles;
The execution result is as follows:
SELECT * FROM Articles;
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | title | content |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Execute a full-text search query
SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
3 rows in set
Execute a full-text search query
SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| id | title | content | score |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
| 1 | Introduction to OceanBase | OceanBase is an open-source relational database management system. | 0.5699481865284975 |
| 3 | Advantages of Using OceanBase | OceanBase offers several advantages such as high performance, reliability, and ease of use. | 0.240174672489083 |
| 2 | Full-Text Search in Databases | Full-text search allows for searching within the text of documents stored in a database. It is particularly useful for finding specific information quickly. | 0.20072992700729927 |
+----+-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+
3 rows in set
View the query execution plan
EXPLAIN SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
3 rows in set
View the query execution plan
EXPLAIN SELECT id, title, content, MATCH(content) AGAINST('OceanBase database') AS score
FROM Articles
WHERE MATCH(content) AGAINST('OceanBase database');
The execution result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| -------------------------------------------------------------- |
| |0 |SORT | |17 |145 | |
| |1 |└─TEXT RETRIEVAL SCAN|articles(ft1)|17 |138 | |
| ============================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([articles.id], [articles.title], [articles.content], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| sort_keys([MATCH(articles.content) AGAINST('OceanBase database'), DESC]) |
| 1 - output([articles.id], [articles.content], [articles.title], [MATCH(articles.content) AGAINST('OceanBase database')]), filter(nil), rowset=256 |
| access([articles.id], [articles.content], [articles.title]), partitions(p0) |
| is_index_back=true, is_global_index=false, |
| calc_relevance=true, match_expr(MATCH(articles.content) AGAINST('OceanBase database')), |
| pushdown_match_filter(MATCH(articles.content) AGAINST('OceanBase database')) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
15 rows in set
15 rows in set
You can refer to the best practices for creating special indexes in the database in the AP scenario described above. Applying these indexes appropriately can help improve the efficiency and performance of database queries.
