This topic describes the search index in OceanBase Database in MySQL mode. A search index organizes the paths and values in a column in an inverted manner. It provides index acceleration for columns of the JSON, single-level array, and various scalar types. It is commonly used for semistructured filtering in hybrid search and for multi-column indexes and index merge queries on analytical wide tables.
Concepts
A search index organizes the values (and JSON paths and values) in an indexed column into index entries that can be quickly filtered. A single index can cover multiple columns, and you can use the WITH clause to precisely control the paths and types in a JSON column. A search index is suitable for semistructured filtering and wide-table multi-column filtering. It can be combined with B-tree, full-text, and vector indexes. The optimizer can perform index merge queries by identifying the optimal combination of indexes and improving performance.
Compared with creating a function index for each access path or expanding multi-value indexes for array elements, a search index allows you to create a single index that covers multiple paths. This reduces the number of indexes and maintenance costs and facilitates handling of semistructured data with multiple paths and diverse query patterns.
Scenarios
Hybrid search
When a business table contains path filtering conditions on columns of vector, full-text, scalar, JSON, and single-level array types, you often need to combine multiple predicates in a single request. A search index provides a unified inverted index support for such semistructured conditions, reducing the combinatorial explosion of "one index per path." It is easy to combine with the hybrid search feature. This topic provides a simplified example of declaring a search index in the same table as a vector index in the "Performance optimization" section.
Multi-column indexes and index merge queries
Analytical workloads often involve wide tables with many columns and filtering dimensions. You often need to create indexes on multiple columns. In some analytical databases, you can create a nearly full-column index by maintaining indexes on all columns, so that conditions on any column can be indexed. A search index allows you to create indexes on multiple columns, aligning with the modeling habit of creating indexes on multiple columns or even all columns.
When multiple indexes are available on the same table, you can perform index merge queries. The optimizer automatically identifies the optimal combination of indexes and selects the best joint scanning method. This reduces full-table scans when multiple column predicates are present, providing a query experience similar to that of analytical products. When migrating from an analytical database or running in dual mode, you can also consider "multi-column indexes + joint scanning" as a reference for workload handling, in addition to semistructured path filtering.
Implementation details (conceptual layer)
- Generalized Inverted Index (GIN): An index that organizes paths and values in an inverted manner, supporting multiple operators and paths.
- Full-path index: An index that maintains inverted indexes for all indexable paths in a column. During queries, the paths are pushed down to the index for access, rather than relying on materialized secondary indexes for each path (specific pushdown and optimization behaviors depend on the actual version and plan).
References
- Overview and location of index types: Index overview
- Usage guide (data types, predicates, and examples): Search index
- Hybrid search (scenario-oriented): Hybrid search
