OpenAI is an artificial intelligence company that has developed advanced large language models. These models excel in natural language understanding and generation, enabling functionalities such as text generation, question answering, and conversational interactions, all accessible via APIs.
OceanBase Database provides vector storage, vector indexing, and embedding vector search capabilities. You can leverage OpenAI's APIs to store vectorized data in OceanBase Database and use OceanBase Database's vector search capabilities to query related data.
Prerequisites
You have deployed OceanBase Database V4.4.0 or later and created a MySQL-compatible tenant. For more information, see Create a tenant.
You have created a database and an account in the MySQL-compatible tenant, and the account has read and write privileges on the database.
You have installed Python 3.9 or later and the corresponding pip.
You have installed Poetry, PyObVector, and the OpenAI SDK:
python3 -m pip install poetry
python3 -m pip install pyobvector
python3 -m pip install openai
You have obtained an OpenAI API key.
You have set the ob_vector_memory_limit_percentage parameter in the tenant to enable vector search. We recommend that you set the value to 30 for OceanBase Database versions earlier than V4.4.1, and keep the default value 0 for V4.4.1 and later. For more information about this parameter, see ob_vector_memory_limit_percentage.
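For a deployment earlier than V4.4.1, the parameter can be set in the tenant with a statement like the following (a sketch only; the parameter name comes from the prerequisite above, and the value should be adjusted to your environment):

```sql
-- Reserve 30% of tenant memory for vector indexes (pre-V4.4.1 deployments).
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
```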
Step 1: Obtain the connection string of OceanBase Database
Contact the deployment personnel or administrator of OceanBase Database to obtain the database connection string. For example:
obclient -h$host -P$port -u$user_name -p$password -D$database_name
Parameter description:
$host: the IP address for connecting to OceanBase Database. For connection through OceanBase Database Proxy (ODP), use the IP address of an ODP. For direct connection, use the IP address of an OBServer node.
$port: the port for connecting to OceanBase Database. For connection through ODP, the default value is 2883, which can be customized when ODP is deployed. For direct connection, the default value is 2881, which can be customized when OceanBase Database is deployed.
$database_name: the name of the database to be accessed.
Notice
The user for connecting to a tenant must have the CREATE, INSERT, DROP, and SELECT privileges on the database. For more information about user privileges, see Privilege types in MySQL-compatible mode.
$user_name: the tenant account. For connection through ODP, the tenant account can be in the username@tenant name#cluster name or cluster name:tenant name:username format. For direct connection, the tenant account is in the username@tenant name format.
$password: the account password.
For more information about connection strings, see Connect to an OceanBase tenant by using OBClient.
Here is an example:
obclient -hxxx.xxx.xxx.xxx -P2881 -utest_user001@mysql001 -p****** -Dtest
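The scripts later in this topic connect through PyObVector rather than OBClient, and their comments note that @ in the username or password must be written as %40. That is standard percent-encoding; a minimal sketch, using a hypothetical tenant account:

```python
from urllib.parse import quote

# Hypothetical tenant account in the username@tenant_name format.
user = "test_user001@mysql001"
encoded = quote(user, safe="")
print(encoded)  # test_user001%40mysql01 has @ replaced by %40
```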
Step 2: Register an account on the LLM platform
Obtain an OpenAI API key:
Log in to the OpenAI console.
In the upper-right corner, click API Keys.
Click Create API Key.
Fill in the required information and click Create API Key.
Configure the OpenAI API key as an environment variable:
For a Unix-based system (such as Ubuntu or macOS), run the following command in the terminal:
export OPENAI_API_KEY='your-api-key'
For a Windows system, run the following command in the command prompt:
set OPENAI_API_KEY=your-api-key
Make sure to replace your-api-key with your actual OpenAI API key.
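The OpenAI SDK reads OPENAI_API_KEY from the environment, so it is worth confirming the variable is visible to Python before running the scripts below. A small illustrative sketch (the check_api_key helper is hypothetical, not part of any SDK):

```python
import os

def check_api_key(env=None):
    """Return True if OPENAI_API_KEY is set and non-empty in the given mapping."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY"))

# Stand-in mappings so the check can be shown without a real key:
print(check_api_key({"OPENAI_API_KEY": "sk-test"}))  # True
print(check_api_key({}))                             # False
```

Calling check_api_key() with no argument checks the real environment.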
Step 3: Store vector data in OceanBase Database
Store vector data in OceanBase Database
Prepare test data.
Download the CSV file that contains pre-computed vectorized data. The file contains 1,000 food reviews, with the vectorized values in the last column, so you do not need to compute the vectors again. You can also use the following code to recalculate the embedding column (that is, the vector column) and generate a new CSV file.
from openai import OpenAI
import pandas as pd

input_datapath = "./fine_food_reviews.csv"
client = OpenAI()

# You can change the text-embedding-ada-002 model to another one as needed.
def embedding_text(text, model="text-embedding-ada-002"):
    # For more information about how to generate embedding vectors, see: https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663
    res = client.embeddings.create(input=text, model=model)
    return res.data[0].embedding

df = pd.read_csv(input_datapath, index_col=0)
# It takes a few minutes to generate vectors. The OpenAI Embedding API is called for each row.
df["embedding"] = df.combined.apply(embedding_text)
output_datapath = './fine_food_reviews_self_embeddings.csv'
df.to_csv(output_datapath)
Run the following script to insert the test data into OceanBase Database. The script must be in the same directory as the test data.
import os
import sys
import csv
import json
from pyobvector import *
from sqlalchemy import Column, Integer, String
# Connect to OceanBase Database by using pyobvector. Replace @ in the username and password with %40.
client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")
# The prepared test dataset has been vectorized. By default, the dataset is stored in the same directory as the Python script. If you have vectorized the dataset yourself, replace the file name with the name of the corresponding file.
file_name = "fine_food_reviews.csv"
file_path = os.path.join("./", file_name)
# Define columns. The vectorized column is the last field.
cols = [
    Column('id', Integer, primary_key=True, autoincrement=False),
    Column('product_id', String(256), nullable=True),
    Column('user_id', String(256), nullable=True),
    Column('score', Integer, nullable=True),
    Column('summary', String(2048), nullable=True),
    Column('text', String(8192), nullable=True),
    Column('combined', String(8192), nullable=True),
    Column('n_tokens', Integer, nullable=True),
    Column('embedding', VECTOR(1536))
]
# Specify the table name.
table_name = 'fine_food_reviews'
# Create the table if it does not exist.
if not client.check_table_exists(table_name):
    client.create_table(table_name, columns=cols)
# Create an index for the vector column.
client.create_index(
    table_name=table_name,
    is_vec_index=True,
    index_name='vidx',
    column_names=['embedding'],
    vidx_params='distance=l2, type=hnsw, lib=vsag',
)
# Open and read the CSV file.
with open(file_name, mode='r', newline='', encoding='utf-8') as csvfile:
    csvreader = csv.reader(csvfile)
    # Read the header row.
    headers = next(csvreader)
    print("Headers:", headers)
    batch = []  # Store data. Insert the data into the database in batches of 10 rows.
    for i, row in enumerate(csvreader):
        # The CSV file has nine fields: id, product_id, user_id, score, summary, text, combined, n_tokens, and embedding.
        if not row:
            continue
        food_review_line = {'id': row[0], 'product_id': row[1], 'user_id': row[2], 'score': row[3],
                            'summary': row[4], 'text': row[5], 'combined': row[6], 'n_tokens': row[7],
                            'embedding': json.loads(row[8])}
        batch.append(food_review_line)
        # Insert data in batches of 10 rows.
        if (i + 1) % 10 == 0:
            client.insert(table_name, batch)
            batch = []  # Clear the batch.
    # Insert the remaining rows (if any).
    if batch:
        client.insert(table_name, batch)
# Check the data in the table to make sure that all data has been inserted.
count_sql = f"select count(*) from {table_name};"
cursor = client.perform_raw_text_sql(count_sql)
result = cursor.fetchone()
print(f"Total number of imported rows: {result[0]}")
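The batch-insert loop in the script above follows a simple accumulate-and-flush pattern. Isolated from the database, it can be sketched as follows, with insert_fn standing in for client.insert:

```python
def insert_in_batches(rows, insert_fn, batch_size=10):
    """Accumulate rows and flush them to insert_fn in fixed-size batches."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            insert_fn(batch)
            batch = []
    if batch:  # flush the remainder
        insert_fn(batch)

# Record each flushed batch instead of writing to a database.
calls = []
insert_in_batches(list(range(23)), calls.append, batch_size=10)
print([len(b) for b in calls])  # [10, 10, 3]
```

Flushing in batches keeps memory bounded and reduces round trips compared with inserting one row at a time.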
Query data in OceanBase Database
Save the following Python script as openAIQuery.py.
import os
import sys
import csv
import json
from pyobvector import *
from sqlalchemy import func
from openai import OpenAI

# Get the command-line arguments.
if len(sys.argv) != 2:
    print("Please enter a query statement.")
    sys.exit()
queryStatement = sys.argv[1]

# Connect to OceanBase Database by using pyobvector. Replace @ in the username or password with %40.
client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")
openAIclient = OpenAI()

# Define a function to generate text vectors.
def generate_embeddings(text, model="text-embedding-ada-002"):
    # For more information about how to create embedding vectors, see: https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663
    res = openAIclient.embeddings.create(input=text, model=model)
    return res.data[0].embedding

def query_ob(query, tableName, vector_name="embedding", top_k=1):
    embedding = generate_embeddings(query)
    # Perform approximate nearest neighbor search.
    res = client.ann_search(
        table_name=tableName,
        vec_data=embedding,
        vec_column_name=vector_name,
        distance_func=func.l2_distance,
        topk=top_k,
        output_column_names=['combined']
    )
    for row in res:
        print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))

# Specify the table name.
table_name = 'fine_food_reviews'
query_ob(queryStatement, table_name, 'embedding', 1)
Enter a question and get the relevant answer.
python3 openAIQuery.py 'pet food'
The expected result is as follows:
Crack for dogs. These thing are like crack for dogs. I am not sure of the make-up but the doggies sure love them.
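The search ranks rows by L2 (Euclidean) distance, the metric chosen in the index definition above (distance=l2). As a quick illustration of what is computed between the query vector and each candidate vector (a toy two-dimensional example, not the 1,536-dimensional embeddings):

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(l2_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
```

Smaller distances mean more similar texts, so the row with the smallest distance to the query embedding is returned first.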