OceanBase Database · SQL · V4.4.2
    OpenAI

    Last Updated: 2026-04-09 08:31:39
    OpenAI is an artificial intelligence company that has developed advanced large language models. These models excel in natural language understanding and generation, enabling functionalities such as text generation, question answering, and conversational interactions, all accessible via APIs.

    OceanBase Database provides vector storage, vector indexing, and embedding vector search capabilities. You can leverage OpenAI's APIs to store vectorized data in OceanBase Database and use OceanBase Database's vector search capabilities to query related data.

    Prerequisites

    • You have deployed OceanBase Database V4.4.0 or later and created a MySQL-compatible tenant. For more information, see Create a tenant.

    • You have a MySQL-compatible tenant, database, and account in your environment, and the database account has read and write privileges.

    • You have installed Python 3.9 or later and the corresponding pip.

    • You have installed Poetry, PyObVector, and the OpenAI SDK.

      python3 -m pip install poetry
      python3 -m pip install pyobvector
      python3 -m pip install openai
      
    • You have obtained an OpenAI API key.

    • You have set the ob_vector_memory_limit_percentage parameter in the tenant to enable vector search. We recommend that you set the value to 30 for OceanBase Database versions earlier than V4.4.1, and keep the default value 0 for V4.4.1 and later. For more information about this parameter, see ob_vector_memory_limit_percentage.
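The last prerequisite is a tenant-level parameter change. A minimal sketch of what this could look like in an obclient session connected to the tenant, for a version earlier than V4.4.1 (a config fragment only; verify against the ob_vector_memory_limit_percentage reference before applying):

```sql
-- Reserve 30% of tenant memory for vector indexes (pre-V4.4.1 recommendation above).
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;

-- Confirm the setting took effect.
SHOW PARAMETERS LIKE 'ob_vector_memory_limit_percentage';
```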

    Step 1: Obtain the connection string of OceanBase Database

    Contact the deployment personnel or administrator of OceanBase Database to obtain the database connection string. For example:

    obclient -h$host -P$port -u$user_name -p$password -D$database_name
    

    Parameter description:

    • $host: the IP address for connecting to OceanBase Database. For connection through OceanBase Database Proxy (ODP), use the IP address of an ODP. For direct connection, use the IP address of an OBServer node.

    • $port: the port for connecting to OceanBase Database. For connection through ODP, the default value is 2883, which can be customized when ODP is deployed. For direct connection, the default value is 2881, which can be customized when OceanBase Database is deployed.

    • $database_name: the name of the database to be accessed.

      Notice

      The user for connecting to a tenant must have the CREATE, INSERT, DROP, and SELECT privileges on the database. For more information about user privileges, see Privilege types in MySQL-compatible mode.

    • $user_name: the tenant account. For connection through ODP, the tenant account can be in the username@tenant name#cluster name or cluster name:tenant name:username format. For direct connection, the tenant account is in the username@tenant name format.

    • $password: the account password.

    For more information about connection strings, see Connect to an OceanBase tenant by using OBClient.

    Here is an example:

    obclient -hxxx.xxx.xxx.xxx -P2881 -utest_user001@mysql001 -p****** -Dtest
    

    Step 2: Register an account on the LLM platform

    Obtain an OpenAI API key:

    1. Log in to the OpenAI console.

    2. In the upper-right corner, click API Keys.

    3. Click Create API Key.

    4. Fill in the required information and click Create API Key.

    Configure the OpenAI API key as an environment variable:

    • For a Unix-based system (such as Ubuntu or macOS), run the following command in the terminal:

      export OPENAI_API_KEY='your-api-key'
      
    • For a Windows system, run the following command in the command prompt:

      set OPENAI_API_KEY=your-api-key
      

    Make sure to replace your-api-key with your actual OpenAI API key.
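The scripts in the next step construct OpenAI() without passing a key, so the SDK reads OPENAI_API_KEY from the environment. A quick self-contained sanity check (the placeholder value is hypothetical):

```python
import os

# Hypothetical placeholder; in practice the key is exported as shown above.
os.environ["OPENAI_API_KEY"] = "your-api-key"

# The OpenAI() constructor fails if no key is available, so checking the
# variable up front gives a clearer error message.
key = os.environ.get("OPENAI_API_KEY")
print("set" if key else "missing")
```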

    Step 3: Store vector data to OceanBase Database

    Store vector data in OceanBase Database

    1. Prepare test data.

      Download the CSV file that contains pre-computed vectorized data. The CSV file contains 1,000 reviews of food and the last column contains the vectorized values. Therefore, you do not need to compute the vectors again. You can also use the following code to recalculate the embedding column (i.e., the vector column) and generate a new CSV file.

      from openai import OpenAI
      import pandas as pd
      input_datapath = "./fine_food_reviews.csv"
      client = OpenAI()
      # You can change the text-embedding-ada-002 model to another one as needed.
      def embedding_text(text, model="text-embedding-ada-002"):
          # For more information about how to generate embedding vectors, see: https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663
          res = client.embeddings.create(input=text, model=model)
          return res.data[0].embedding
      df = pd.read_csv(input_datapath, index_col=0)
      # It takes a few minutes to generate vectors. The OpenAI Embedding API is called for each row.
      df["embedding"] = df.combined.apply(embedding_text)
      output_datapath = './fine_food_reviews_self_embeddings.csv'
      df.to_csv(output_datapath)
      
    2. Run the following script to insert the test data into OceanBase Database. The directory where the script is located must be the same as the directory where the test data is located.

        import os
        import sys
        import csv
        import json
        from pyobvector import *
        from sqlalchemy import Column, Integer, String
        # Connect to OceanBase Database by using pyobvector. Replace @ in the username and password with %40.
        client = ObVecClient(uri="host:port", user="username",password="****",db_name="test")
        # The prepared test dataset has been vectorized. By default, the dataset is stored in the same directory as the Python script. If you have vectorized the dataset yourself, replace the file name with the name of the corresponding file.
        file_name = "fine_food_reviews.csv"
        file_path = os.path.join("./", file_name)
        # Define columns. The vectorized column is the last field.
        cols = [
            Column('id', Integer, primary_key=True, autoincrement=False),
            Column('product_id', String(256), nullable=True),
            Column('user_id', String(256), nullable=True),
            Column('score', Integer, nullable=True),
            Column('summary', String(2048), nullable=True),
            Column('text', String(8192), nullable=True),
            Column('combined', String(8192), nullable=True),
            Column('n_tokens', Integer, nullable=True),
            Column('embedding', VECTOR(1536))
        ]
        # Specify the table name.
        table_name = 'fine_food_reviews'
        # Create the table if it does not exist.
        if not client.check_table_exists(table_name):
            client.create_table(table_name,columns=cols)
            # Create an index for the vector column.
            client.create_index(
                table_name=table_name,
                is_vec_index=True,
                index_name='vidx',
                column_names=['embedding'],
                vidx_params='distance=l2, type=hnsw, lib=vsag',
            )
        # Open and read the CSV file.
        with open(file_name, mode='r', newline='', encoding='utf-8') as csvfile:
            csvreader = csv.reader(csvfile)
            # Read the header row.
            headers = next(csvreader)
            print("Headers:", headers)
            batch = [] # Store data. Insert the data into the database in batches of 10 rows.
            for i, row in enumerate(csvreader):
                # The CSV file has nine fields: id, product_id, user_id, score, summary, text, combined, n_tokens, and embedding.
                if not row:
                    break
                food_review_line= {'id':row[0],'product_id':row[1],'user_id':row[2],'score':row[3],'summary':row[4],'text':row[5],\
                'combined':row[6],'n_tokens':row[7],'embedding':json.loads(row[8])}
                batch.append(food_review_line)
                # Insert data in batches of 10 rows.
                if (i + 1) % 10 == 0:
                    client.insert(table_name,batch)
                    batch = []  # Clear the cache.
            # Insert the remaining rows (if any).
            if batch:
                client.insert(table_name,batch)
        # Check the data in the table to make sure that all data has been inserted.
        count_sql = f"select count(*) from {table_name};"
        cursor = client.perform_raw_text_sql(count_sql)
        result = cursor.fetchone()
        print(f"Total number of imported rows:{result[0]}")
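
The connection comments in both scripts note that an @ in the username or password must be written as %40. Rather than escaping by hand, Python's standard library can do the percent-encoding; the account below is the hypothetical one from the earlier obclient example:

```python
from urllib.parse import quote_plus

# Hypothetical tenant account in the username@tenant format.
user = "test_user001@mysql001"

# quote_plus percent-encodes reserved characters, turning '@' into '%40'.
escaped = quote_plus(user)
print(escaped)  # test_user001%40mysql001
```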
    

    Query OceanBase data

    1. Save the following Python script as openAIQuery.py.

          import os
          import sys
          import csv
          import json
          from pyobvector import *
          from sqlalchemy import func
          from openai import OpenAI
          # Get the command-line arguments.
          if len(sys.argv) != 2:
              print("Please enter a query statement.")
              sys.exit()
          queryStatement = sys.argv[1]
          # Connect to OceanBase Database by using pyobvector. Replace @ in the username or password with %40.
          client = ObVecClient(uri="host:port", user="username",password="****",db_name="test")
          openAIclient = OpenAI()
          # Define a function to generate text vectors.
          def generate_embeddings(text, model="text-embedding-ada-002"):
              # For more information about how to create embedding vectors, see: https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663
              res = openAIclient.embeddings.create(input=text, model=model)
              return res.data[0].embedding
      
          def query_ob(query, tableName, vector_name="embedding", top_k=1):
              embedding = generate_embeddings(query)
              # Perform approximate nearest neighbor search.
              res = client.ann_search(
                  table_name=tableName,
                  vec_data=embedding,
                  vec_column_name=vector_name,
                  distance_func=func.l2_distance,
                  topk=top_k,
                  output_column_names=['combined']
              )
              for row in res:
                  print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))
          # Specify the table name.
          table_name = 'fine_food_reviews'
          query_ob(queryStatement,table_name,'embedding',1)
      
    2. Enter a question and get the relevant answer.

      python3 openAIQuery.py 'pet food'
      

      The expected result is as follows:

      Crack for dogs. These thing are like crack for dogs. I am not sure of the make-up but the doggies sure love them.
      
