
As AI applications scale, most teams hit the same wall: the Frankenstein stack.
To build a modern AI application, you are often forced to stitch together MySQL for metadata, Pinecone or Milvus for vectors, and Elasticsearch for keywords. Then comes the “Glue Code Tax”: hundreds of lines of brittle Python just to sync data, keep systems consistent, and merge query results. The result is fragile, complex, and expensive to operate.
Enter OceanBase seekdb, an AI-native search database that unifies relational, vector, full-text, JSON, and GIS data in a single ACID-compliant, MySQL-compatible engine.
In this post, we will skip the architecture talk and go straight to code. You'll go from a blank editor to a fully operational, ACID-compliant hybrid search engine running entirely in user space.
Before diving into the code, it helps to understand what you gain by replacing the stitched stack with a unified engine.
seekdb supports two deployment modes:

- Embedded mode: the engine runs in-process with your Python application, with no separate server to install or manage.
- Server mode: the engine runs as a standalone service (or a distributed OceanBase cluster) that clients connect to over the network.
In this guide, we use the Embedded mode to get you started. It is the fastest path to trying seekdb: no Docker, no sidecars, no network config—just Python.
Before installation, ensure that your environment meets the following requirements:
Run the following command to install seekdb in embedded mode:
```shell
pip install -U pyseekdb
```

Note:
If your pip version is outdated, upgrade pip first as prompted before installing.
This script demonstrates the “zero infrastructure” path. It initializes the database, ingests text with automatic embedding generation, and runs a semantic search.
Create a file named hello_seekdb.py and execute it:
```python
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# Start in embedded mode (local SeekDB)
client = pyseekdb.Client()

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)
print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With an embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]
ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by the embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)
print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# The embedding function will automatically convert the query text to a query vector
query_text = "artificial intelligence and machine learning"
results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3              # Return top 3 most similar documents
)
print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```

Sample output:

```
>>> Creating collection: my_simple_collection
 - Dimension: 384
 - Embedding function: DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')
>>> Ingesting data (Auto-Embedding)...
 - Added 5 documents in 5.31s
>>> Querying for: 'artificial intelligence and machine learning'
 - Found 3 matches.

Result 1:
 ID: id1
 Distance: 0.3008
 Document: Machine learning is a subset of artificial intelligence
 Metadata: {'index': 0, 'category': 'AI'}

Result 2:
 ID: id4
 Distance: 0.5983
 Document: Neural networks are inspired by the human brain
 Metadata: {'index': 3, 'category': 'AI'}
```

Wrap-up:
You now have a working semantic search engine with zero infrastructure setup and just a few lines of logic. pyseekdb handled the embeddings internally, so no external API key or separate model service was needed.
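The Distance values in the output are dissimilarity scores: lower means more similar. As a rough mental model (this is an illustration of the general idea, not seekdb's actual metric implementation), cosine distance between two vectors looks like this:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical direction, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

So a score like 0.3008 for "Machine learning is a subset of artificial intelligence" means the query and document embeddings point in nearly the same direction.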
Pure vector search is excellent at understanding intent but weak on exact keywords, numbers, and proper nouns. Full-text search has the opposite tradeoff. Hybrid search combines both signals and ranks the merged result set.
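Reciprocal Rank Fusion (RRF), the ranking strategy used later in this guide, scores each document by summing 1/(k + rank) across every ranked list it appears in, so documents ranked highly by both the vector and keyword sides rise to the top. This is a conceptual sketch, not seekdb's internal code; the document IDs and k=60 (the commonly used constant) are illustrative:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["id1", "id4", "id5"]   # hypothetical semantic ranking
keyword_hits = ["id1", "id5"]         # hypothetical lexical ranking
print(rrf_fuse([vector_hits, keyword_hits]))  # → ['id1', 'id5', 'id4']
```

Note that id1 wins because both signals rank it first, while id5 beats id4 by appearing in both lists despite lower individual ranks.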
seekdb exposes hybrid search in embedded mode through the hybrid_search() method. This uses the same embedded engine as Part 1—no extra services required.
Create hybrid_seekdb.py and execute it:
```python
import pyseekdb
from pprint import pprint

print(">>> Initializing SeekDB Client...")
# Initialize Embedded Engine
client = pyseekdb.Client()
collection_name = "quickstart_demo"

# Idempotency: clean up any previous run
try:
    client.delete_collection(collection_name)
except Exception:
    pass

print(f">>> Creating collection '{collection_name}'...")
collection = client.create_collection(name=collection_name)

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text",
]

print(">>> Indexing documents...")
# The engine performs vectorization + inverted index building here
collection.add(
    ids=["id1", "id2", "id3", "id4", "id5"],
    documents=documents,
    metadatas=[
        {"tag": "ai"},
        {"tag": "code"},
        {"tag": "db"},
        {"tag": "ai"},
        {"tag": "ai"},
    ],
)
print(f" - Indexed {len(documents)} documents.")

# ---------------------------------------------------------
# Hybrid Search: Vector('AI') + Keyword('learning') + RRF
# ---------------------------------------------------------
print("\n>>> Executing Hybrid Search...")
print(" Criteria: Near 'artificial intelligence' (Vector) AND contains 'learning' (Text)")
try:
    hybrid_results = collection.hybrid_search(
        query={
            # Full-text condition (lexical match)
            "where_document": {"$contains": "learning"},
            "n_results": 5,
        },
        knn={
            # Semantic condition (vector match)
            "query_texts": ["artificial intelligence"],
            "n_results": 5,
        },
        # Reciprocal Rank Fusion (RRF)
        rank={"rrf": {}},
        n_results=3,
        include=["documents", "metadatas"],
    )
    print("\n--- RRF Re-ranked Results ---")
    pprint(hybrid_results)
except AttributeError:
    print("\n[NOTE] Feature unavailable. Your current pyseekdb version or embedded binary")
    print("       may not support 'hybrid_search'. This is a standard feature in OceanBase Server.")
except Exception as e:
    print(f"\n[ERROR] Hybrid search failed: {e}")

# Cleanup
client.delete_collection(collection_name)
print("\n>>> Cleanup complete.")
```

Sample output:
```
...
 - Indexed 5 documents.

>>> Executing Hybrid Search...
 Criteria: Near 'artificial intelligence' (Vector) AND contains 'learning' (Text)

--- RRF Re-ranked Results ---
{'distances': [[Decimal('0.0328'), Decimal('0.0161'), Decimal('0.0159')]],
 'documents': [['Machine learning is a subset of artificial intelligence',
                'Natural language processing helps computers understand text',
                'Neural networks are inspired by the human brain']],
 'ids': [['id1', 'id5', 'id4']],
 'metadatas': [[{'tag': 'ai'}, {'tag': 'ai'}, {'tag': 'ai'}]]}
```

What this script does:

- Creates a collection and indexes five documents, building both vector embeddings and a full-text inverted index in one pass.
- Runs hybrid_search() with a semantic condition (near "artificial intelligence") and a lexical condition (contains "learning").
- Merges the two candidate lists with Reciprocal Rank Fusion (RRF) and returns the top 3 fused results.
Wrap-up:
You now have a single embedded engine that handles:

- Semantic (vector) search with automatic embedding generation
- Full-text keyword matching
- Reciprocal Rank Fusion to merge and rank both result sets
You’ve just experienced the core capabilities of OceanBase seekdb using its lightweight embedded mode. It allowed you to build a hybrid search engine in seconds without setting up a server.
Note that the code you just wrote is compatible with Server mode. When your application demands high concurrency, massive storage, or multi-tenant isolation, you can switch to a distributed server deployment without rewriting your query logic.
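One way to keep that switch a one-line change is to derive the client's connection settings from the environment. This is a sketch under my own assumptions: the SEEKDB_HOST and SEEKDB_PORT variable names are a hypothetical convention, not a pyseekdb feature.

```python
import os

def seekdb_client_kwargs():
    """Hypothetical convention: env vars select server mode, otherwise embedded."""
    host = os.environ.get("SEEKDB_HOST")
    if host is None:
        return {}  # no kwargs -> pyseekdb.Client() runs in embedded mode
    # SEEKDB_HOST set -> connect to a server/cluster instead
    return {"host": host, "port": int(os.environ.get("SEEKDB_PORT", "2881"))}

# client = pyseekdb.Client(**seekdb_client_kwargs())
```

The rest of your code (create_collection, add, query, hybrid_search) stays identical in both modes.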
Prototype: `client = pyseekdb.Client()`

Production: `client = pyseekdb.Client(host="...", port=2881)`

Ready to build?

