      Integrate OceanBase Cloud vector retrieval with OpenAI

      Last updated: 2026-04-07 08:08:34
      What is on this page
      Prerequisites
      Step 1: Obtain the database connection information
      Step 2: Register an LLM account
      Step 3: Store and query vectorized data
      Store vectorized data in OceanBase Cloud
      Query the vector data


      OceanBase Database V4.3.3 and later supports vector data storage, vector indexes, and embedding-based vector retrieval. You can store vectorized data in OceanBase Database and then retrieve it by similarity search.

      OpenAI, an artificial intelligence company, has developed several large language models (LLMs) that excel at natural language understanding and generation. These models can generate text, answer questions, and hold conversations. You can access them through the OpenAI API. In this topic, an OpenAI embedding model (text-embedding-ada-002) converts text into vectors.

      This topic describes how to use the OpenAI API to generate vector embeddings, store them in OceanBase Cloud, and query them by using the vector retrieval feature of OceanBase Cloud.
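Conceptually, vector retrieval ranks stored vectors by their distance to a query vector and returns the closest ones. The following plain-Python sketch performs an exact (brute-force) top-k search with L2 (Euclidean) distance. This is the distance that the scripts later in this topic use: distance=l2 in the index definition and func.l2_distance in the query.

The sketch is for illustration only and does not call OceanBase. The function names and toy vectors are hypothetical. An HNSW index, such as the one this topic creates, approximates this ranking without comparing the query against every row. As a result, on large tables its results can occasionally differ from the exact answer.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, rows, k=1):
    """Return up to k (row_id, distance) pairs nearest to query, closest first.

    rows maps a row identifier to its stored vector. Every row is compared with
    the query, which is the full scan that a vector index is meant to avoid.
    """
    scored = [(row_id, l2_distance(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; text-embedding-ada-002 vectors have 1536 dimensions.
    rows = {"dog treats": [0.9, 0.1], "coffee": [0.1, 0.9], "cat food": [0.8, 0.3]}
    for row_id, dist in exact_top_k([1.0, 0.0], rows, k=2):
        print(f"{row_id}: {dist:.3f}")
```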

      Prerequisites

      • A transactional cluster instance in MySQL-compatible mode is available in your environment.

      • Before you use a cluster instance, you must create a tenant in it. For more information, see Create a tenant.

      • You have created a MySQL-compatible tenant, a database, and an account, and granted the account read and write permissions on the database. For more information, see Create an account and Create a database (MySQL compatible mode).

      • You have been granted the project admin or instance admin role to perform read and write operations on instances in the project. If you do not have the required permissions, contact the organization admin.

      • You have installed Python 3.9 or later and pip. If the Python version on your server is earlier than 3.9, you can use Miniconda to create a new environment with Python 3.9 or later. For more information, see Installing Miniconda.

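      • You have installed Poetry, pyobvector, and the OpenAI Python library. The installation commands are as follows:

        python3 -m pip install poetry
        python3 -m pip install pyobvector
        python3 -m pip install openai

        If you plan to regenerate the embeddings yourself (see Step 3), also install pandas:

        python3 -m pip install pandas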
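To confirm these prerequisites before you continue, you can run a quick check like the following. This is a minimal sketch under stated assumptions:

- The module list is based on the imports used by the scripts in this topic.
- sqlalchemy is assumed to be installed as a dependency of pyobvector.
- Poetry is not checked, because it is a command-line tool rather than an importable module.

The function names are hypothetical.

```python
import importlib.util
import sys

# Import names used by the scripts in this topic. sqlalchemy is assumed to be
# installed as a dependency of pyobvector; add "pandas" if you plan to
# regenerate the embeddings.
REQUIRED_MODULES = ["pyobvector", "openai", "sqlalchemy"]


def python_version_ok(min_version=(3, 9)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.9 or later is required, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```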
        

      Step 1: Obtain the database connection information

      1. Log in to the OceanBase Cloud console.

      2. On the instance list page, expand the target instance to display its details.

      3. Select Connect > Get Connection String under the target tenant.

      4. In the pop-up window, select Public Network as the connection method.

      5. Follow the prompts in the pop-up window to obtain the public endpoint and the connection string.
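The scripts in Step 3 pass the endpoint to ObVecClient as uri="host:port", together with user, password, and db_name. They also note that an at sign (@) in the username or password must be replaced with %40. The following sketch builds those arguments from environment variables instead of hard-coding credentials. The helper names and the OB_HOST, OB_PORT, OB_USER, OB_PASSWORD, and OB_DB variable names are hypothetical.

The sketch uses urllib.parse.quote, which encodes @ as %40 and also percent-encodes other reserved characters. This assumes that ObVecClient embeds the credentials in a connection URL that is percent-decoded when parsed (as SQLAlchemy URLs are). Verify this against your pyobvector version.

```python
import os
from urllib.parse import quote


def ob_client_kwargs(host, port, user, password, db_name="test"):
    """Build keyword arguments for pyobvector's ObVecClient.

    quote(..., safe="") percent-encodes "@" as "%40" (and other reserved
    characters such as "/" and ":"), so user names like "app@mytenant" do not
    break the connection URL.
    """
    return {
        "uri": f"{host}:{port}",
        "user": quote(user, safe=""),
        "password": quote(password, safe=""),
        "db_name": db_name,
    }


def ob_client_kwargs_from_env():
    """Read the connection details from hypothetical OB_* environment variables."""
    return ob_client_kwargs(
        host=os.environ["OB_HOST"],
        port=os.environ["OB_PORT"],
        user=os.environ["OB_USER"],
        password=os.environ["OB_PASSWORD"],
        db_name=os.environ.get("OB_DB", "test"),
    )
```

For example, after you export these variables, ObVecClient(**ob_client_kwargs_from_env()) could replace the hard-coded ObVecClient(...) call in the scripts.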

      Step 2: Register an LLM account

      Notice

      To obtain an OpenAI API key, you need to visit a third-party platform. This operation is subject to the billing rules of that platform and may incur fees. Before you proceed, confirm and accept its billing standards on its official website or in its documentation. If you do not agree to them, do not proceed.

      Obtain an OpenAI API key.

      1. Log in to the OpenAI platform.

      2. Click API Keys in the upper-right corner.

      3. Click Create API Key.

      4. Specify the required information and click Create API Key.

      Set the API key in the OPENAI_API_KEY environment variable.

      • For a Unix-based system such as Ubuntu or macOS, you can run the following command in a terminal:
      export OPENAI_API_KEY='your-api-key'
      
      • For a Windows system, you can run the following command in Command Prompt:
      set OPENAI_API_KEY=your-api-key
      

      Replace your-api-key with your actual OpenAI API key. These commands set the variable only for the current terminal session.
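When no api_key argument is given, the OpenAI() client used in the Step 3 scripts reads the key from the OPENAI_API_KEY environment variable. A minimal pre-flight check such as the following sketch catches a missing key, or an unreplaced your-api-key placeholder, before any API call is made. The function name is hypothetical.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key, or raise RuntimeError if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your-api-key":
        raise RuntimeError(
            "OPENAI_API_KEY is not set to a real key; set it as described in Step 2."
        )
    return key
```

You could call require_openai_api_key() near the top of each script, before OpenAI() is constructed.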

      Step 3: Store and query vectorized data

      Store vectorized data in OceanBase Cloud

      1. Prepare test data.

      Download the CSV file of precomputed vector data. The file is an open dataset of 1,000 food reviews. Its last column already stores the embedding vectors, so you do not need to compute any vectors yourself. The scripts in this step assume that the file is saved as fine_food_reviews.csv in the working directory. Alternatively, you can run the following script to recompute the values in the embedding (vector) column and generate a new CSV file:

      from openai import OpenAI
      import pandas as pd
      input_datapath = "./fine_food_reviews.csv"
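      client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable.
      # This example uses the text-embedding-ada-002 model, which returns 1536-dimensional vectors.
      # You can use another model, but its output dimension must match the VECTOR(1536) column defined in the insert script.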
      def embedding_text(text, model="text-embedding-ada-002"):
          # For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
          res = client.embeddings.create(input=text, model=model)
          return res.data[0].embedding
      df = pd.read_csv(input_datapath, index_col=0)
      # It takes a few minutes to generate the CSV file by calling the OpenAI Embedding API row by row.
      df["embedding"] = df.combined.apply(embedding_text)
      output_datapath = './fine_food_reviews_self_embeddings.csv'
      df.to_csv(output_datapath)
      
      2. Run the following script to insert the test data into OceanBase Cloud. Place the script in the same directory as the test data.
      import os
      import sys
      import csv
      import json
      from pyobvector import *
      from sqlalchemy import Column, Integer, String
      # Connect to OceanBase Cloud by using pyobvector. Use the endpoint and credentials obtained in Step 1.
      # If the username or password contains an at sign (@), replace it with %40.
      client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")
      # The downloaded dataset is already vectorized and is expected in the same directory as this script.
      # If you regenerated the embeddings, set file_name to the new file, for example fine_food_reviews_self_embeddings.csv.
      file_name = "fine_food_reviews.csv"
      file_path = os.path.join("./", file_name)
      # Define columns. The last column is a vector column.
      cols = [
          Column('id', Integer, primary_key=True, autoincrement=False),
          Column('product_id', String(256), nullable=True),
          Column('user_id', String(256), nullable=True),
          Column('score', Integer, nullable=True),
          Column('summary', String(2048), nullable=True),
          Column('text', String(8192), nullable=True),
          Column('combined', String(8192), nullable=True),
          Column('n_tokens', Integer, nullable=True),
          Column('embedding', VECTOR(1536))
      ]
      # Define the table name.
      table_name = 'fine_food_reviews'
      # If the table does not exist, create it.
      if not client.check_table_exists(table_name):
          client.create_table(table_name,columns=cols)
          # Create an index on the vector column.
          client.create_index(
              table_name=table_name,
              is_vec_index=True,
              index_name='vidx',
              column_names=['embedding'],
              vidx_params='distance=l2, type=hnsw, lib=vsag',
          )
      # Open and read the CSV file.
      with open(file_path, mode='r', newline='', encoding='utf-8') as csvfile:
          csvreader = csv.reader(csvfile)
          # Read the header line.
          headers = next(csvreader)
          print("Headers:", headers)
          batch = []  # Buffer rows and insert them into the database in batches of 10.
          for row in csvreader:
              # Each row has nine columns: `id`, `product_id`, `user_id`, `score`, `summary`, `text`, `combined`, `n_tokens`, and `embedding`.
              if not row:
                  continue  # Skip blank lines instead of stopping at the first one.
              food_review_line = {'id': row[0], 'product_id': row[1], 'user_id': row[2], 'score': row[3],
                                  'summary': row[4], 'text': row[5], 'combined': row[6], 'n_tokens': row[7],
                                  # The embedding is stored as a JSON array string, such as "[0.01, -0.02, ...]".
                                  'embedding': json.loads(row[8])}
              batch.append(food_review_line)
              # Insert 10 rows each time.
              if len(batch) == 10:
                  client.insert(table_name, batch)
                  batch = []  # Clear the buffer.
          # Insert the remaining rows, if any.
          if batch:
              client.insert(table_name, batch)
      # Check the data in the table and make sure that all data has been inserted.
      count_sql = f"select count(*) from {table_name};"
      cursor = client.perform_raw_text_sql(count_sql)
      result = cursor.fetchone()
      print(f"Total number of inserted rows:{result[0]}")
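The regeneration script in step 1 calls the Embeddings API once per row, which is why it takes a few minutes. The OpenAI Embeddings API also accepts a list of inputs in a single request, so embedding the texts in chunks reduces the number of requests. The following is a minimal sketch of that approach. The helper names are hypothetical, and batch_size=100 is an arbitrary choice; check the API's per-request input and token limits before you raise it. The batching logic takes any embed_batch callable, so you can test it without calling the API.

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed texts in chunks of batch_size and return the vectors in input order.

    embed_batch is any callable that takes a list of strings and returns one
    vector per string, in the same order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        result = embed_batch(chunk)
        if len(result) != len(chunk):
            raise ValueError("embed_batch returned a different number of vectors than inputs")
        vectors.extend(result)
    return vectors


def make_openai_embed_batch(model="text-embedding-ada-002"):
    """Return an embed_batch callable that sends one Embeddings API request per chunk."""
    # Imported here so that embed_in_batches can be used (and tested) without the openai package.
    from openai import OpenAI

    client = OpenAI()  # Reads OPENAI_API_KEY from the environment.

    def embed_batch(chunk):
        res = client.embeddings.create(input=chunk, model=model)
        # Each returned item records the position of its input; sort on it to keep the order explicit.
        return [item.embedding for item in sorted(res.data, key=lambda item: item.index)]

    return embed_batch
```

For example, df["embedding"] = embed_in_batches(df.combined.tolist(), make_openai_embed_batch()) could replace the row-by-row apply call in the regeneration script.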
      

      Query the vector data

      1. Save the following Python script as openAIQuery.py.
      import os
      import sys
      import csv
      import json
      from pyobvector import *
      from sqlalchemy import func
      from openai import OpenAI
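      # Read the query text from the command-line argument.
      if len(sys.argv) != 2:
          # sys.exit() with a message prints it to stderr and exits with status 1.
          sys.exit("Usage: python3 openAIQuery.py '<query>'")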
      queryStatement = sys.argv[1]
      # Connect to OceanBase Cloud by using pyobvector. Use the endpoint and credentials obtained in Step 1.
      # If the username or password contains an at sign (@), replace it with %40.
      client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")
      openAIclient = OpenAI()
      # Define the function for generating text vectors.
      def generate_embeddings(text, model="text-embedding-ada-002"):
          # For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
          res = openAIclient.embeddings.create(input=text, model=model)
          return res.data[0].embedding
      
      def query_ob(query, tableName, vector_name="embedding", top_k=1):
          embedding = generate_embeddings(query)
          # Perform an approximate nearest neighbor search (ANNS).
          res = client.ann_search(
              table_name=tableName,
              vec_data=embedding,
              vec_column_name=vector_name,
              distance_func=func.l2_distance,
              topk=top_k,
              output_column_names=['combined']
          )
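          # The combined column has the form "Title: <summary>; Content: <text>". Print it as "<summary>: <text>".
          for row in res:
              print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))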
      # Specify the table name.
      table_name = 'fine_food_reviews'
      query_ob(queryStatement,table_name,'embedding',1)
      
      2. Run the script with a question or search phrase as its only argument. For example:
      python3 openAIQuery.py 'pet food'
      

      The output is similar to the following:

      Crack for dogs.: These thing are like crack for dogs. I am not sure of the make-up but the doggies sure love them.
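The output above is the combined column of the nearest row, with the "Title: " and "; Content: " markers rewritten by query_ob. If you add your own reviews to the table, keep combined in the same "Title: ...; Content: ..." format so that the query script prints them the same way. Also embed the combined string, as the regeneration script does, so that the stored vector reflects the text you search against.

The following helpers are hypothetical. The format is inferred from the replace calls in query_ob and the sample output above.

```python
def build_combined(summary, text):
    """Format a review the way the dataset's combined column appears to be formatted."""
    return f"Title: {summary.strip()}; Content: {text.strip()}"


def format_for_display(combined):
    """Apply the same cleanup that query_ob performs before printing a result."""
    return combined.replace("Title: ", "").replace("; Content: ", ": ")
```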
      
