This topic describes the compatibility of OceanBase's vector search feature with the data models and SDK interfaces of Milvus, as well as the mappings between relevant concepts.
Concept mappings
To help users familiar with Milvus quickly get started with OceanBase's vector storage capabilities, we analyze the similarities and differences between the two systems and provide a mapping of related concepts.
Data models
| Data model layer | Milvus | OceanBase | Description |
|---|---|---|---|
| First layer | Shards | Partition | Milvus specifies partition rules by setting some columns as partition_key in the schema definition. OceanBase supports range/range columns, list/list columns, hash, key, and subpartitioning strategies. |
| Second layer | Partitions | ≈Tablet | Milvus enhances read performance by chunking the same shard (shards are usually partitioned by primary key) based on other columns. OceanBase implements this by sorting keys within a partition. |
| Third layer | Segments | MemTable+SSTable | Both have a minor compaction mechanism. |
SDKs
This section introduces the conceptual differences between OceanBase's vector storage SDK, pyobvector, and Milvus's SDK, pymilvus.
pyobvector supports two usage modes:
pymilvus-compatible mode: This mode is compatible with common interfaces of Milvus clients. Users familiar with Milvus can use this mode without learning new concepts.
SQLAlchemy extension mode: This mode works as a vector extension of Python SQLAlchemy and retains the operation mode of a relational database. Users coming from Milvus need to map Milvus concepts to relational ones in this mode.
For more information about pyobvector's APIs, see pyobvector Python SDK.
The following table describes the concept mappings between pyobvector's SQLAlchemy extension mode and pymilvus:
| pymilvus | pyobvector | Description |
|---|---|---|
| Database | Database | Database |
| Collection | Table | Table |
| Field | Column | Column |
| Primary Key | Primary Key | Primary key |
| Vector Field | Vector Column | Vector column |
| Index | Index | Index |
| Partition | Partition | Partition |
| DataType | DataType | Data type |
| Metric Type | Distance Function | Distance function |
| Search | Query | Query |
| Insert | Insert | Insert |
| Delete | Delete | Delete |
| Update | Update | Update |
| Batch | Batch | Batch operations |
| Transaction | Transaction | Transaction |
| NONE | Not supported | NULL value |
| BOOL | Boolean | Corresponds to the MySQL TINYINT type |
| INT8 | Boolean | Corresponds to the MySQL TINYINT type |
| INT16 | SmallInteger | Corresponds to the MySQL SMALLINT type |
| INT32 | Integer | Corresponds to the MySQL INT type |
| INT64 | BigInteger | Corresponds to the MySQL BIGINT type |
| FLOAT | Float | Corresponds to the MySQL FLOAT type |
| DOUBLE | Double | Corresponds to the MySQL DOUBLE type |
| STRING | LONGTEXT | Corresponds to the MySQL LONGTEXT type |
| VARCHAR | STRING | Corresponds to the MySQL VARCHAR type |
| JSON | JSON | For more information, see pyobvector Python SDK. |
| FLOAT_VECTOR | VECTOR | Vector type |
| BINARY_VECTOR | Not supported | |
| FLOAT16_VECTOR | Not supported | |
| BFLOAT16_VECTOR | Not supported | |
| SPARSE_FLOAT_VECTOR | Not supported | |
| dynamic_field | Not supported | The hidden $meta metadata column in Milvus. As an alternative, you can create a JSON-type column in OceanBase Database. |
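The data type rows of the table above can be captured as a lookup. The following is a minimal sketch that only mirrors the table as written; the dictionary and the helper `to_pyobvector_type` are hypothetical names for illustration and are not part of pyobvector or pymilvus.

```python
from typing import Dict, Optional

# Copied from the table above: pymilvus DataType name -> pyobvector type name.
# None marks a type that the table lists as "Not supported".
MILVUS_TO_PYOBVECTOR: Dict[str, Optional[str]] = {
    "NONE": None,
    "BOOL": "Boolean",
    "INT8": "Boolean",
    "INT16": "SmallInteger",
    "INT32": "Integer",
    "INT64": "BigInteger",
    "FLOAT": "Float",
    "DOUBLE": "Double",
    "STRING": "LONGTEXT",
    "VARCHAR": "STRING",
    "JSON": "JSON",
    "FLOAT_VECTOR": "VECTOR",
    "BINARY_VECTOR": None,
    "FLOAT16_VECTOR": None,
    "BFLOAT16_VECTOR": None,
    "SPARSE_FLOAT_VECTOR": None,
}


def to_pyobvector_type(milvus_type: str) -> str:
    """Return the pyobvector type name listed for a pymilvus type name."""
    key = milvus_type.upper()
    if key not in MILVUS_TO_PYOBVECTOR:
        raise KeyError(f"unknown Milvus type: {milvus_type}")
    target = MILVUS_TO_PYOBVECTOR[key]
    if target is None:
        raise NotImplementedError(f"{key} is listed as not supported")
    return target
```

A check like this can be run over an existing Milvus schema before migration to find fields that need a different representation.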
Compatibility with Milvus
Milvus SDK
All operations listed in the following tables are supported. Three of them, load_collection(), release_collection(), and close(), are supported through SQLAlchemy.
Operations related to collections
| Operation | Description |
|---|---|
| create_collection() | Creates a vector table based on the given schema. |
| get_collection_stats() | Queries the statistics of a table, such as the number of rows in the table. |
| describe_collection() | Queries the metadata of a vector table. |
| has_collection() | Checks whether a table exists. |
| list_collections() | Lists the existing tables. |
| drop_collection() | Drops a table. |
Operations related to column and schema definition
| Operation | Description |
|---|---|
| create_schema() | Creates a schema with column definitions. |
| add_field() | Adds column definitions. The call sequence is as follows: create_schema > add_field > ... > add_field. You can also manually build a FieldSchema list and then use the CollectionSchema constructor function to create a schema. |
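The call sequence create_schema > add_field > ... > add_field can be sketched as a small helper. This is an illustration only: `build_schema` is a hypothetical name, and `client` is assumed to be any object that exposes `create_schema()` as listed in the table, such as a pymilvus-compatible client.

```python
from typing import Any, Dict, Iterable


def build_schema(client: Any, fields: Iterable[Dict[str, Any]], **schema_kwargs: Any) -> Any:
    """Create a schema, then add one column definition per entry in `fields`."""
    # Step 1: create an empty schema (keyword arguments are passed through unchanged).
    schema = client.create_schema(**schema_kwargs)
    # Step 2: add the column definitions in order.
    for field in fields:
        schema.add_field(**field)
    return schema
```

Each entry in `fields` is a dictionary of add_field() keyword arguments, for example `{"field_name": "id", "is_primary": True}` plus a `datatype` value taken from the SDK.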
Operations related to vector indexes
| Operation | Description |
|---|---|
| list_indexes() | Lists all indexes. |
| create_index() | Creates one or more vector indexes. Specifically, use prepare_index_params to initialize an index parameter list, call add_index multiple times to set multiple index parameters, and call create_index to create the indexes. |
| drop_index() | Drops a vector index. |
| describe_index() | Queries the metadata (schema) of an index. |
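The index creation flow described for create_index() (prepare_index_params, then add_index once per index, then create_index) can be sketched as follows. `create_vector_indexes` is a hypothetical helper, `client` is assumed to expose the methods named in the table, and the keys inside each spec (for example `field_name`, `index_name`, `metric_type`) are assumptions that depend on the SDK version in use.

```python
from typing import Any, Dict, Iterable


def create_vector_indexes(client: Any, collection_name: str,
                          index_specs: Iterable[Dict[str, Any]]) -> Any:
    """Create several vector indexes with a single create_index() call."""
    # Initialize an empty index parameter list.
    index_params = client.prepare_index_params()
    # Register one set of index parameters per spec.
    for spec in index_specs:
        index_params.add_index(**spec)
    # Create all registered indexes on the collection.
    client.create_index(collection_name=collection_name, index_params=index_params)
    return index_params
```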
Operations related to insertion and search
| Operation | Description |
|---|---|
| search() | Performs an approximate nearest neighbor (ANN) search. |
| query() | Performs a point query with filters, namely, SELECT ... WHERE ids IN (..., ...) AND <filters>. |
| get() | Performs a point query without filters, namely, SELECT ... WHERE ids IN (..., ...). |
| delete() | Deletes a group of vectors, namely, DELETE FROM ... WHERE ids IN (..., ...). |
| insert() | Inserts a group of vectors. |
| upsert() | Inserts data into a table and updates existing data when a primary key conflict exists. |
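The SQL shapes given for get(), query(), and delete() in the table above can be sketched as string builders. This is not the SDK's implementation: the function names, the `pk` parameter (the primary key column, written as ids in the table), and the `%s` placeholder style are assumptions made for illustration.

```python
from typing import Any, List, Optional, Sequence, Tuple


def _in_clause(pk: str, ids: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Build 'pk IN (%s, ...)' with one placeholder per id."""
    if not ids:
        raise ValueError("ids must not be empty")
    placeholders = ", ".join(["%s"] * len(ids))
    return f"{pk} IN ({placeholders})", list(ids)


def get_sql(table: str, pk: str, ids: Sequence[Any],
            output_fields: Optional[Sequence[str]] = None) -> Tuple[str, List[Any]]:
    """Point query without filters: SELECT ... WHERE ids IN (...)."""
    cols = ", ".join(output_fields) if output_fields else "*"
    where, params = _in_clause(pk, ids)
    return f"SELECT {cols} FROM {table} WHERE {where}", params


def query_sql(table: str, pk: str, ids: Sequence[Any], filter_sql: str,
              output_fields: Optional[Sequence[str]] = None) -> Tuple[str, List[Any]]:
    """Point query with filters: SELECT ... WHERE ids IN (...) AND <filters>."""
    sql, params = get_sql(table, pk, ids, output_fields)
    # filter_sql is interpolated as-is, so it must come from a trusted source.
    return f"{sql} AND ({filter_sql})", params


def delete_sql(table: str, pk: str, ids: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Delete by primary key: DELETE FROM ... WHERE ids IN (...)."""
    where, params = _in_clause(pk, ids)
    return f"DELETE FROM {table} WHERE {where}", params
```

Each builder returns the statement and its parameter list separately so that the id values are bound by the database driver instead of being formatted into the SQL text.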
Operations related to collection metadata synchronization
| Operation | Description |
|---|---|
| load_collection() | Loads the schema (structure) of a table from the database to the memory of the Python application, enabling the application to operate the database table in an object-oriented manner. This is a standard feature of an object-relational mapping (ORM) framework. |
| release_collection() | Unloads the schema (structure) of a table from the memory of the Python application and releases the related resources. This is a standard feature of an ORM framework for memory management. |
| close() | Closes the database connection and releases the related resources. This is a standard feature of an ORM framework. |
pymilvus
Data model
The data model of Milvus comprises three levels: shards, partitions, and segments. Compatibility of OceanBase Database with the data model of Milvus is described as follows:
A shard in Milvus corresponds to a partition in OceanBase Database.
A partition in Milvus has no directly corresponding term in OceanBase Database.
Milvus allows you to partition a shard into blocks by other columns to improve the read performance. Generally, a shard is partitioned by the primary key. In OceanBase Database, data in partitions is sorted by the primary key to improve the read performance.
Compatibility with Milvus Lite APIs
Operations related to collections
create_collection() in Milvus:

    create_collection(
        collection_name: str,
        dimension: int,
        primary_field_name: str = "id",
        id_type: str = DataType,
        vector_field_name: str = "vector",
        metric_type: str = "COSINE",
        auto_id: bool = False,
        timeout: Optional[float] = None,
        schema: Optional[CollectionSchema] = None,  # Used for custom setup
        index_params: Optional[IndexParams] = None,  # Used for custom setup
        **kwargs,
    ) -> None

The compatibility of OceanBase Database with this operation is described as follows:

collection_name: supported and corresponds to table_name in OceanBase Database.
dimension: supported and corresponds to vector(dim) in OceanBase Database.
primary_field_name: supported and specifies the name of the primary key column.
id_type: supported and specifies the type of the primary key column.
vector_field_name: supported and specifies the name of the vector column.
auto_id: supported and corresponds to the auto-increment column in OceanBase Database.
timeout: supported. OceanBase Database supports the timeout feature through a hint.
schema: supported.
index_params: supported.
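To make the parameter mapping concrete, the sketch below turns the quick-setup parameters into an illustrative CREATE TABLE statement. It is an assumption about what an equivalent table could look like, not the DDL that pyobvector actually emits; `quick_setup_ddl` is a hypothetical helper, and `id_type` is given here as a SQL type name rather than a Milvus DataType.

```python
def quick_setup_ddl(
    collection_name: str,
    dimension: int,
    primary_field_name: str = "id",
    id_type: str = "BIGINT",          # assumed SQL type for the primary key column
    vector_field_name: str = "vector",
    auto_id: bool = False,
) -> str:
    """Build an illustrative CREATE TABLE statement from quick-setup parameters."""
    # auto_id corresponds to an auto-increment primary key column.
    pk = f"`{primary_field_name}` {id_type}"
    if auto_id:
        pk += " AUTO_INCREMENT"
    return (
        f"CREATE TABLE `{collection_name}` (\n"   # collection_name -> table name
        f"  {pk},\n"
        f"  `{vector_field_name}` VECTOR({dimension}),\n"  # dimension -> VECTOR(dim)
        f"  PRIMARY KEY (`{primary_field_name}`)\n"
        ")"
    )
```

For example, `quick_setup_ddl("items", 3, vector_field_name="embedding", auto_id=True)` yields a table with an auto-increment `id` column and a three-dimensional `embedding` column.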
get_collection_stats() in Milvus:

    get_collection_stats(
        collection_name: str,
        timeout: Optional[float] = None
    ) -> Dict

The compatibility of OceanBase Database with this operation is described as follows:

OceanBase Database is compatible with this operation.
OceanBase Database is compatible with the return value { 'row_count': ... }.
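The return shape `{ 'row_count': ... }` can be reproduced with a plain row count. The sketch below is an assumption about how such a value could be computed, not the SDK's implementation; it uses the standard-library sqlite3 module purely as a stand-in for a database connection, and `collection_stats_sketch` is a hypothetical name.

```python
import sqlite3
from typing import Any, Dict


def collection_stats_sketch(conn: Any, collection_name: str) -> Dict[str, int]:
    """Return {'row_count': n} for a table, using any DB-API style connection."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not collection_name.isidentifier():
        raise ValueError(f"invalid table name: {collection_name!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {collection_name}")
    (row_count,) = cur.fetchone()
    return {"row_count": row_count}


# Demo with an in-memory SQLite table standing in for a vector table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (2,), (3,)])
stats = collection_stats_sketch(conn, "items")
```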
has_collection() in Milvus:

    has_collection(
        collection_name: str,
        timeout: Optional[float] = None
    ) -> Bool

OceanBase Database is compatible with the has_collection() operation in Milvus.

drop_collection() in Milvus:

    drop_collection(collection_name: str) -> None

OceanBase Database is compatible with the drop_collection() operation in Milvus.

rename_collection() in Milvus:

    rename_collection(
        old_name: str,
        new_name: str,
        timeout: Optional[float] = None
    ) -> None

OceanBase Database is compatible with the rename_collection() operation in Milvus.
Operations related to schemas
create_schema() in Milvus:

    create_schema(
        auto_id: bool,
        enable_dynamic_field: bool,
        primary_field: str,
        partition_key_field: str,
    ) -> CollectionSchema

The compatibility of OceanBase Database with this operation is described as follows:

auto_id: supported and specifies whether to set the primary key column as an auto-increment column.
primary_field and partition_key_field: supported.
add_field() in Milvus:

    add_field(
        field_name: str,
        datatype: DataType,
        is_primary: bool,
        max_length: int,
        element_type: str,
        max_capacity: int,
        dim: int,
        is_partition_key: bool,
    )

OceanBase Database is compatible with the add_field() operation in Milvus.
Operations related to insertion and search
search() in Milvus:

    search(
        collection_name: str,
        data: Union[List[list], list],
        filter: str = "",
        limit: int = 10,
        output_fields: Optional[List[str]] = None,
        search_params: Optional[dict] = None,
        timeout: Optional[float] = None,
        partition_names: Optional[List[str]] = None,
        **kwargs,
    ) -> List[dict]

The compatibility of OceanBase Database with this operation is described as follows:

filter: a string expression. For more information, see https://milvus.io/docs/boolean.md#Usage. It is similar to the WHERE expression in SQL.
search_params:
    metric_type: supported.
    radius and range filter: related to reverse nearest neighbor (RNN) queries and not supported now.
    group_by_field: groups approximate nearest neighbor (ANN) search results. It is not supported now.
    max_empty_result_buckets: used for inverted file (IVF) indexes. It is not supported now.
    ignore_growing: skips incremental data and directly reads the baseline index. It is not supported now.
partition_names: supported and reads data by partition.
kwargs:
    offset: the number of records to be skipped in the search results. It is not supported now.
    round_decimal: rounds off the result based on the specified number of decimal places. It is not supported now.
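Because several search() options are listed as not supported, code migrated from Milvus may want to detect them before issuing a search. The following is a minimal sketch: `unsupported_search_options` is a hypothetical helper, the key `range_filter` is the pymilvus spelling of the range filter option, and checking a nested `params` dictionary in addition to the top level is an assumption about how callers structure `search_params`.

```python
from typing import Any, Dict, List, Optional

# Options listed above as not supported now.
UNSUPPORTED_SEARCH_PARAMS = {
    "radius",
    "range_filter",
    "group_by_field",
    "max_empty_result_buckets",
    "ignore_growing",
}
UNSUPPORTED_SEARCH_KWARGS = {"offset", "round_decimal"}


def unsupported_search_options(search_params: Optional[Dict[str, Any]] = None,
                               **kwargs: Any) -> List[str]:
    """Return the names of search options that are listed as unsupported."""
    params = dict(search_params or {})
    # Some callers nest tuning options under search_params["params"].
    nested = params.get("params") or {}
    keys = set(params) | set(nested)
    found = sorted(UNSUPPORTED_SEARCH_PARAMS & keys)
    found += sorted(UNSUPPORTED_SEARCH_KWARGS & set(kwargs))
    return found
```

An empty result means that none of the options in the list above are present; a caller can raise an error or log a warning otherwise.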
get() in Milvus:

    get(
        collection_name: str,
        ids: Union[list, str, int],
        output_fields: Optional[List[str]] = None,
        timeout: Optional[float] = None,
        partition_names: Optional[List[str]] = None,
        **kwargs,
    ) -> List[dict]

OceanBase Database is compatible with the get() operation in Milvus.

delete() in Milvus:

    delete(
        collection_name: str,
        ids: Optional[Union[list, str, int]] = None,
        timeout: Optional[float] = None,
        filter: Optional[str] = "",
        partition_name: Optional[str] = "",
        **kwargs,
    ) -> dict

OceanBase Database is compatible with the delete() operation in Milvus.

insert() in Milvus:

    insert(
        collection_name: str,
        data: Union[Dict, List[Dict]],
        timeout: Optional[float] = None,
        partition_name: Optional[str] = "",
    ) -> List[Union[str, int]]

OceanBase Database is compatible with the insert() operation in Milvus.

upsert() in Milvus:

    upsert(
        collection_name: str,
        data: Union[Dict, List[Dict]],
        timeout: Optional[float] = None,
        partition_name: Optional[str] = "",
    ) -> List[Union[str, int]]

OceanBase Database is compatible with the upsert() operation in Milvus.
Operations related to indexes
create_index() in Milvus:

    create_index(
        collection_name: str,
        index_params: IndexParams,
        timeout: Optional[float] = None,
        **kwargs,
    )

OceanBase Database is compatible with the create_index() operation in Milvus.

drop_index() in Milvus:

    drop_index(
        collection_name: str,
        index_name: str,
        timeout: Optional[float] = None,
        **kwargs,
    )

OceanBase Database is compatible with the drop_index() operation in Milvus.
Compatibility with MySQL
In terms of initiating requests, all operations are implemented through general SQL query statements. No compatibility issue exists.
In terms of processing responses (result sets), only processing of vector data elements of new data types needs to be considered. At present, OceanBase Database supports the parsing of string and byte elements. If the transmission mode of vector data elements changes in the future, you can update the SDK for compatibility.
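Parsing vector elements that arrive as strings or bytes can be sketched as follows. The wire formats here are assumptions made for illustration: the string form is taken to be a bracketed list such as "[1.0, 2.5, 3.0]", and the byte form is taken to be packed little-endian 32-bit floats. `parse_vector` is a hypothetical helper, not an SDK function.

```python
import json
import struct
from typing import List, Union


def parse_vector(value: Union[str, bytes, bytearray]) -> List[float]:
    """Decode a vector element received as a string or as raw bytes."""
    if isinstance(value, str):
        # Assumed text form: a JSON-style array of numbers.
        return [float(x) for x in json.loads(value)]
    if isinstance(value, (bytes, bytearray)):
        # Assumed binary form: consecutive little-endian float32 values.
        if len(value) % 4 != 0:
            raise ValueError("byte length must be a multiple of 4")
        count = len(value) // 4
        return list(struct.unpack(f"<{count}f", value))
    raise TypeError(f"unsupported vector element type: {type(value).__name__}")
```

Keeping the decoding in one function means that a change in the transmission mode only requires updating this one place in client code.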