This topic describes how to store unstructured, semi-structured, and structured data in a unified way in OceanBase Database. Unified storage builds on the foundational capabilities of OceanBase Database and provides the basis for hybrid search.
How it works
OceanBase Database can store data of different modalities and supports hybrid search by converting various types of data (such as text, images, and videos) into vectors. Searches are performed by calculating the distances between these vectors. Hybrid search can be divided into two types: simple search, which is based on similarity search for a single vector, and complex search, which involves combining vector and scalar searches.
Because vector search is inherently approximate, practical applications usually need to combine multiple techniques to improve accuracy. The more precise the search results, the more value they deliver to your business.
Configure vector index memory
OceanBase vector search allows you to configure vector index memory by setting ob_vector_memory_limit_percentage:
- Before V4.3.5 BP3, you need to manually set ob_vector_memory_limit_percentage to enable vector search. We recommend that you set it to 30 for optimal search performance. If you use the default value, no memory is allocated for vector indexes, and an error will be returned when you create an index. Here is an example:
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
- Starting from V4.3.5 BP3, the vector search feature is enabled by default. The default value 0 indicates adaptive mode, where the system automatically adjusts the memory usage ratio for vector index data within the tenant, so manual adjustment is not required:
  - When the tenant's actual memory is 8 GB or less, the value is set to 40.
  - When the tenant's actual memory is more than 8 GB, the value is set to 50.
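Before you create a vector index, it can help to confirm the current setting. The following is a minimal sketch, assuming a MySQL-mode tenant and a user with permission to view parameters. The output columns may differ across versions.

```sql
-- Show the current value of the vector index memory parameter.
-- Per the description above, 0 means no memory is allocated before V4.3.5 BP3,
-- and adaptive mode starting from V4.3.5 BP3.
SHOW PARAMETERS LIKE 'ob_vector_memory_limit_percentage';
```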
Create a vector column
The following example shows a table that stores vector data, spatial data, and relational data. The data type of the vector column is VECTOR, and the dimension must be specified when the column is created. The maximum supported dimension is 16,000. The data type of the spatial column is GEOMETRY:
CREATE TABLE t (
-- Store relational data (structured data).
id INT PRIMARY KEY,
-- Store spatial data (semi-structured data).
g GEOMETRY,
-- Store vector data (unstructured data).
vec VECTOR(3)
);
Use the INSERT statement to insert vector data
After you create a table that contains a column of the VECTOR data type, you can use the INSERT statement to insert vectors into the table. The inserted vector must have the same dimension as the one specified when the column was created. Otherwise, an error will be returned. This constraint ensures data consistency and query efficiency. A vector is written as an array of floating-point numbers, and every dimension must hold a valid floating-point value. Here is a simple example:
INSERT INTO t (id, g, vec) VALUES (
-- Insert structured data.
1,
-- Insert semi-structured data.
ST_GeomFromText('POINT(1 1)'),
-- Insert unstructured data.
'[0.1, 0.2, 0.3]'
);
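Once the row is stored, the two search types described in "How it works" can be sketched as queries against table t. This is an illustrative sketch, not a tuned query. It assumes that the l2_distance function is available in your version, and the query vector and the id filter are arbitrary values chosen for the example. No vector index is created here, so the distances would be computed by scanning the table.

```sql
-- Simple search: rank rows by L2 distance to a single query vector.
SELECT id, l2_distance(vec, '[0.1, 0.2, 0.25]') AS dist
FROM t
ORDER BY dist
LIMIT 3;

-- Complex (hybrid) search: combine a scalar filter with vector ordering.
SELECT id
FROM t
WHERE id < 100
ORDER BY l2_distance(vec, '[0.1, 0.2, 0.25]')
LIMIT 3;

-- A vector whose dimension differs from VECTOR(3) is expected to be rejected,
-- as described above. It is left commented out so the script runs cleanly.
-- INSERT INTO t (id, g, vec) VALUES (2, ST_GeomFromText('POINT(2 2)'), '[0.1, 0.2]');
```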