Integrate OceanBase vector search with Jina AI

Last Updated: 2025-11-19 12:43:31

OceanBase offers features such as vector storage, vector indexing, and embedding-based vector search. You can store vectorized data in OceanBase Database and query it with fast, efficient search.

Jina AI is an AI platform that focuses on multimodal search and vector search. It provides the core components and tools required to build enterprise-level, search-enhanced generative AI applications, helping enterprises and developers build Retrieval-Augmented Generation (RAG) applications based on multimodal search.

Prerequisites

You have deployed OceanBase Database V4.4.0 or later, and created a MySQL-compatible tenant. After creating the tenant, continue with the steps below.
Your environment includes an active MySQL-compatible tenant, a MySQL database, and a user account with read and write privileges.
Python 3.11 or above is installed.

Required dependencies are installed:

python3 -m pip install pyobvector requests sqlalchemy

Make sure you have set the ob_vector_memory_limit_percentage parameter in your tenant to enable vector search. For versions prior to 4.4.1, the recommended value is 30. Starting from V4.4.1, the default value of 0 is recommended. For more precise configuration, refer to ob_vector_memory_limit_percentage for guidance on calculating this value.
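
The version rule above can be captured in a small helper. This is a minimal sketch: the function names recommended_vector_memory_limit and build_set_statement are hypothetical, and the helper encodes only the two recommendations stated above (30 before V4.4.1, 0 from V4.4.1 onward). It prints an ALTER SYSTEM statement that you would then run in your tenant with a sufficiently privileged account.

```python
def recommended_vector_memory_limit(version: str) -> int:
    """Return the recommended ob_vector_memory_limit_percentage for a version such as '4.4.0'."""
    # Accept an optional leading "V" and compare only major.minor.patch.
    parts = tuple(int(p) for p in version.lstrip("Vv").split(".")[:3])
    # Pad short versions such as "4.4" to three components.
    parts = parts + (0,) * (3 - len(parts))
    return 0 if parts >= (4, 4, 1) else 30


def build_set_statement(version: str) -> str:
    """Build the statement that applies the recommended value."""
    value = recommended_vector_memory_limit(version)
    return f"ALTER SYSTEM SET ob_vector_memory_limit_percentage = {value};"


print(build_set_statement("4.4.0"))
```

For workloads with large indexes, the recommended value is only a starting point; the parameter reference describes how to calculate a more precise one.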

Step 1: Get your database connection information

Reach out to your OceanBase administrator or deployment team to obtain the database connection string, for example:

obclient -h$host -P$port -u$user_name -p$password -D$database_name

Parameters:

$host: The IP address for connecting to OceanBase Database. If you are using OceanBase Database Proxy (ODP), use the ODP address. For direct connections, use the OBServer node IP.
$port: The port number for connecting to OceanBase Database. The default for ODP is 2883 (can be customized during ODP deployment). For direct connections, the default is 2881 (customizable during OceanBase deployment).
$user_name: The user account for connecting to the tenant. For ODP, common formats are username@tenant_name#cluster_name or cluster_name:tenant_name:username; for direct connections, use username@tenant_name.
$password: The password for the account.
$database_name: The name of the database you want to access.

Notice

The user connecting to the tenant must have CREATE, INSERT, DROP, and SELECT privileges on the database. For more details on user privileges, see privilege types in MySQL-compatible mode.

For more details about connection strings, see Connect to an OceanBase tenant using OBClient.
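
The account formats and default ports described above can be assembled programmatically. The following is a minimal sketch; build_user_name and build_uri are hypothetical helpers, and app_user, test_tenant, and 127.0.0.1 are placeholder values rather than real credentials.

```python
from typing import Optional


def build_user_name(username: str, tenant: str, cluster: Optional[str] = None) -> str:
    """Format the account string for a direct connection or an ODP connection."""
    if cluster:
        # ODP format: username@tenant_name#cluster_name
        return f"{username}@{tenant}#{cluster}"
    # Direct connection format: username@tenant_name
    return f"{username}@{tenant}"


def build_uri(host: str, via_odp: bool, port: Optional[int] = None) -> str:
    """Join host and port, falling back to the default port (2883 for ODP, 2881 for direct)."""
    if port is None:
        port = 2883 if via_odp else 2881
    return f"{host}:{port}"


print(build_user_name("app_user", "test_tenant"))
print(build_uri("127.0.0.1", via_odp=False))
```

If your deployment uses customized ports, pass the port explicitly instead of relying on the defaults.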

Step 2: Build your AI assistant

Set the Jina AI API key environment variable

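Obtain a Jina AI API key, then export it as an environment variable together with the OceanBase connection information. OCEANBASE_DATABASE_URL is passed to ObVecClient as its uri argument, which is typically written as host:port (for example, 127.0.0.1:2881).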

export OCEANBASE_DATABASE_URL=YOUR_OCEANBASE_DATABASE_URL
export OCEANBASE_DATABASE_USER=YOUR_OCEANBASE_DATABASE_USER
export OCEANBASE_DATABASE_DB_NAME=YOUR_OCEANBASE_DATABASE_DB_NAME
export OCEANBASE_DATABASE_PASSWORD=YOUR_OCEANBASE_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
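
Before running the sample code, it can help to confirm that all five variables are visible to Python. This is a minimal sketch; missing_variables is a hypothetical helper that only checks whether each variable is defined, not whether its value is valid.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = [
    "OCEANBASE_DATABASE_URL",
    "OCEANBASE_DATABASE_USER",
    "OCEANBASE_DATABASE_DB_NAME",
    "OCEANBASE_DATABASE_PASSWORD",
    "JINAAI_API_KEY",
]


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are not defined in env."""
    # An empty value (for example, an empty password) still counts as defined.
    return [name for name in REQUIRED_VARS if name not in env]


missing = missing_variables(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```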

Sample code snippets

Obtain Jina AI embeddings

Jina AI provides various embedding models. You can select the model that meets your requirements.

Model	Parameter Size	Embedding Dimension	Description
jina-embeddings-v3	570M	Flexible (default: 1024)	Multilingual text embeddings; supports 94 languages in total
jina-embeddings-v2-small-en	33M	512	English monolingual embeddings
jina-embeddings-v2-base-en	137M	768	English monolingual embeddings
jina-embeddings-v2-base-zh	161M	768	Chinese-English bilingual embeddings
jina-embeddings-v2-base-de	161M	768	German-English bilingual embeddings
jina-embeddings-v2-base-code	161M	768	English and programming languages

Here, the generate_embeddings helper function is defined to call the Jina AI embedding API, using the jina-embeddings-v3 model as an example:

import os
import requests
from sqlalchemy import Column, Integer, String
from pyobvector import ObVecClient, VECTOR, IndexParam, cosine_distance

JINAAI_API_KEY = os.getenv('JINAAI_API_KEY')

# Step 1. Text Data Vectorization
def generate_embeddings(text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {JINAAI_API_KEY}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v3'
    }
    
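    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA, timeout=30)
    # Raise on HTTP errors (for example, an invalid API key) instead of failing later on a missing 'data' key.
    response.raise_for_status()
    response_json = response.json()
    return response_json['data'][0]['embedding']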
    

TEXTS = [
    'Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.',
    'OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.',
    'OceanBase is a native distributed relational database that supports HTAP hybrid transaction analysis and processing. It features enterprise-level characteristics such as high availability, transparent scalability, and multi-tenancy, and is compatible with MySQL/Oracle protocols.'
]
data = []
for text in TEXTS:
    # Generate the embedding for the text via Jina AI API.
    embedding = generate_embeddings(text)
    data.append({
        'content': text,
        'content_vec': embedding
    })

print(f"Successfully processed {len(data)} texts")
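
The loop above sends one request per text. Because the request body's input field is a list, several texts can be embedded in a single call. The following is a minimal sketch under stated assumptions: generate_embeddings_batch is a hypothetical helper, the post parameter exists only so the function can be exercised with a stub, and the code assumes the API returns one data entry per input in the same order as the inputs.

```python
import os
from typing import Callable, List, Optional

JINAAI_API_URL = "https://api.jina.ai/v1/embeddings"


def generate_embeddings_batch(texts: List[str], post: Optional[Callable] = None) -> List[List[float]]:
    """Embed several texts with a single request to the Jina AI embedding API."""
    if post is None:
        import requests  # imported lazily so a stub can replace it in tests
        post = requests.post
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('JINAAI_API_KEY')}",
    }
    payload = {"input": texts, "model": "jina-embeddings-v3"}
    response = post(JINAAI_API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors early
    items = response.json()["data"]
    # Assumption: one entry per input, returned in input order.
    if len(items) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(items)}")
    return [item["embedding"] for item in items]
```

With this helper, the data list could be built as [{'content': t, 'content_vec': v} for t, v in zip(TEXTS, generate_embeddings_batch(TEXTS))].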

Define the vector table schema and store the vectors in OceanBase

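Create a table named jinaai_oceanbase_demo_documents with an auto-increment id column, a content column for the text, and a content_vec column for the embeddings, plus an HNSW vector index on content_vec. The dimension of the VECTOR column (1024) must match the embedding dimension of the model you use. Then insert the vector data into OceanBase: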

# Step 2. Connect to OceanBase
OCEANBASE_DATABASE_URL = os.getenv('OCEANBASE_DATABASE_URL')
OCEANBASE_DATABASE_USER = os.getenv('OCEANBASE_DATABASE_USER')
OCEANBASE_DATABASE_DB_NAME = os.getenv('OCEANBASE_DATABASE_DB_NAME')
OCEANBASE_DATABASE_PASSWORD = os.getenv('OCEANBASE_DATABASE_PASSWORD')

client = ObVecClient(
    uri=OCEANBASE_DATABASE_URL,
    user=OCEANBASE_DATABASE_USER,
    password=OCEANBASE_DATABASE_PASSWORD,
    db_name=OCEANBASE_DATABASE_DB_NAME,
)

# Step 3. Create the vector table.
table_name = "jinaai_oceanbase_demo_documents"
client.drop_table_if_exist(table_name)

cols = [
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("content", String(500), nullable=False),
    Column("content_vec", VECTOR(1024))
]

# Create vector index
vector_index_params = IndexParam(
    index_name="idx_content_vec",
    field_name="content_vec",  
    index_type="HNSW",
    distance_metric="cosine"
)

client.create_table_with_index_params(
    table_name=table_name,
    columns=cols, 
    vidxs=[vector_index_params]
)

print('- Inserting Data to OceanBase...')
client.insert(table_name, data=data)
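
An insert fails if a row does not fit the table definition, so it can be useful to check the rows first. This is a minimal sketch; validate_rows is a hypothetical helper, and the two limits are copied from the String(500) and VECTOR(1024) columns defined above.

```python
from typing import Dict, List

EXPECTED_DIM = 1024      # matches VECTOR(1024) in the table definition
MAX_CONTENT_LEN = 500    # matches String(500) in the table definition


def validate_rows(rows: List[Dict]) -> List[str]:
    """Return a list of problems; an empty list means every row fits the schema."""
    problems = []
    for i, row in enumerate(rows):
        content_len = len(row["content"])
        vec_len = len(row["content_vec"])
        if content_len > MAX_CONTENT_LEN:
            problems.append(f"row {i}: content has {content_len} characters, limit is {MAX_CONTENT_LEN}")
        if vec_len != EXPECTED_DIM:
            problems.append(f"row {i}: embedding has {vec_len} dimensions, expected {EXPECTED_DIM}")
    return problems


sample = [{"content": "hello", "content_vec": [0.0] * 1024}]
print(validate_rows(sample))
```

Calling validate_rows(data) before client.insert gives a readable message instead of a database error when, for example, a different embedding model is swapped in.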

Perform semantic search

Generate an embedding for the query text via the Jina AI embedding API. Then, search for the most relevant document based on the cosine distance between the query embedding and the embeddings in the vector table:

# Step 4. Query the most relevant document based on the query.
query = 'What is OceanBase?'
# Generate the embedding for the query via Jina AI API.
query_embedding = generate_embeddings(query)

res = client.ann_search(
    table_name,
    vec_data=query_embedding,
    vec_column_name="content_vec",
    distance_func=cosine_distance,  # Use the cosine distance function
    with_dist=True,
    topk=1,
    output_column_names=["id", "content"],
)

print('- The Most Relevant Document and Its Distance to the Query:')
for row in res.fetchall():
    print(f'  - ID: {row[0]}\n'
          f'    content: {row[1]}\n'
          f'    distance: {row[2]}')

Expected result

  - ID: 2
    content: OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
    distance: 0.14733879001870276
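
To interpret the distance value, it helps to compute one by hand. The sketch below assumes the search reports the standard cosine distance, that is, 1 minus the cosine similarity, so smaller values mean closer matches. The function cosine_distance_py is a hypothetical pure-Python helper for illustration and is unrelated to the cosine_distance function imported from pyobvector.

```python
import math
from typing import Sequence


def cosine_distance_py(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, computed as 1 minus the cosine similarity of a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance_py([1.0, 0.0], [1.0, 0.0]))  # vectors pointing the same way
print(cosine_distance_py([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Under this definition, a distance near 0 indicates that the query and the document point in nearly the same direction in embedding space.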