This topic describes how to store unstructured, semi-structured, and structured data in a unified way in OceanBase Database. Unified storage builds on the foundational capabilities of OceanBase Database and provides the basis for hybrid search.
How it works
OceanBase Database can store data of different modalities and supports hybrid search by converting various types of data (such as text, images, and videos) into vectors. Searches are performed by calculating the distances between these vectors. Hybrid search can be divided into two types: simple search, which is based on similarity search for a single vector, and complex search, which involves combining vector and scalar searches.
Since vector search is inherently approximate, it is necessary to employ multiple techniques in practical applications to improve accuracy. Only precise search results can deliver greater value to your business.
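The two search types can be sketched in SQL as follows. This is a minimal illustration, not a tuned query: it assumes a table t with an id column and a VECTOR(3) column vec (such as the table created later in this topic), uses l2_distance as the distance function, and picks the filter id < 100 and the limit of 5 arbitrarily. Index-backed approximate search is not covered in this sketch.

```sql
-- Simple search: rank rows by their distance to a single query vector.
-- Assumption: table t has columns id and vec VECTOR(3).
SELECT id
FROM t
ORDER BY l2_distance(vec, '[0.1, 0.2, 0.3]')
LIMIT 5;

-- Complex search: apply a scalar predicate, then rank the remaining rows
-- by vector distance. The predicate id < 100 is only a placeholder.
SELECT id
FROM t
WHERE id < 100
ORDER BY l2_distance(vec, '[0.1, 0.2, 0.3]')
LIMIT 5;
```

The only difference between the two queries is the WHERE clause, which is where scalar conditions on relational columns enter a hybrid search.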
Configure vector index memory
OceanBase vector search uses ob_vector_memory_limit_percentage to control vector index memory:
- Before V4.4.1, you must set ob_vector_memory_limit_percentage before using HNSW, HNSW_SQ, or HNSW_BQ indexes. We recommend 30 for best search performance. If you leave the default, no memory is allocated for vector indexes and index creation fails. IVF and IVF_PQ indexes do not require resident memory, so you can ignore this parameter for them. Example:
  ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
- From V4.4.1, vector search is on by default. The default 0 means adaptive mode: the system sets the memory ratio for vector index data in the tenant automatically.
  - If tenant memory is 8 GB or less, the effective value is 40.
  - If tenant memory is greater than 8 GB, the effective value is 50.
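Before creating an HNSW-family index, it can help to check what the parameter is currently set to. The statement below is a small sketch that assumes you are connected to the tenant in question with sufficient privileges to view parameters. On V4.4.1 and later, a displayed value of 0 would correspond to the adaptive mode described above.

```sql
-- Show the current setting of the vector index memory parameter.
SHOW PARAMETERS LIKE 'ob_vector_memory_limit_percentage';
```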
Create a vector column
The example below shows a table that stores vector data, spatial data, and relational data together. The vector column uses the VECTOR type; you must specify its dimension at creation time (up to 16,000). The spatial column uses the GEOMETRY type:
CREATE TABLE t (
    -- Store relational data (structured data).
    id INT PRIMARY KEY,
    -- Store spatial data (semi-structured data).
    g GEOMETRY,
    -- Store vector data (unstructured data).
    vec VECTOR(3)
);
Insert vector data with INSERT
Once you have created a table with a VECTOR column, you can insert vectors with the INSERT statement. The vector dimension must match the column definition or an error is returned. This keeps data consistent and queries efficient. Vectors are written as arrays of floats; each dimension must be a valid float. Example:
INSERT INTO t (id, g, vec) VALUES (
    -- Insert structured data.
    1,
    -- Insert semi-structured data.
    ST_GeomFromText('POINT(1 1)'),
    -- Insert unstructured data.
    '[0.1, 0.2, 0.3]'
);
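To see the dimension check in practice, the sketch below first reads back the row inserted above, then attempts an insert whose vector has only two dimensions. It assumes the table t and the row with id = 1 from the preceding examples. The values for the second row are made up for illustration, and the exact error text returned may vary by version, so none is shown here.

```sql
-- Read back the stored row; ST_AsText renders the geometry as WKT.
SELECT id, ST_AsText(g), vec
FROM t
WHERE id = 1;

-- Expected to be rejected: vec is declared as VECTOR(3),
-- but this literal supplies only two dimensions.
INSERT INTO t (id, g, vec) VALUES (
    2,
    ST_GeomFromText('POINT(2 2)'),
    '[0.1, 0.2]'
);
```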