VectorDBBench is a tool that provides benchmark results for mainstream vector databases and cloud services. This topic describes how to use VectorDBBench to benchmark OceanBase vector search performance. VectorDBBench is designed for ease of use so you can reproduce results or test new systems with minimal effort.
Prerequisites
Deploy an OceanBase cluster running V4.4.0 or later.
Create a MySQL-compatible tenant. See Create a tenant.
Install Python 3.11 or later. The example below uses Conda:
# Download and install Conda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
# Reopen your terminal and initialize Conda
source ~/miniconda3/bin/activate
conda init --all
# Create and initialize the Python environment required by VectorDBBench
conda create -n vdb python=3.11
conda activate vdb
Connect to the business tenant and tune memory and query parameters for HNSW vector index search:
-- Set ob_vector_memory_limit_percentage to 30%.
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
-- Set ob_query_timeout to 24 hours.
SET GLOBAL ob_query_timeout = 86400000000;
-- Set max_allowed_packet to 1 GB.
SET GLOBAL max_allowed_packet = 1073741824;
-- Set concurrency for index creation
ALTER SYSTEM SET ddl_thread_score = 8; -- Concurrency for DDL operations
SET GLOBAL parallel_servers_target = 624; -- Number of parallel queries the server can handle
The value ob_vector_memory_limit_percentage = 30 is only an example; adjust it for your tenant memory and workload. For the calculation logic, see ob_vector_memory_limit_percentage.
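The literal values in these settings are easier to audit when derived from their units. The following is a minimal sketch, not part of VectorDBBench or OceanBase. It assumes that ob_query_timeout is expressed in microseconds (consistent with the 24-hour comment), that the memory percentage applies to total tenant memory, and that the tenant has the 64 GB listed under Recommended configuration. Consult the linked parameter reference for the authoritative calculation.

```python
# Derive the literal values used in the tenant settings from their units.
MICROSECONDS_PER_SECOND = 1_000_000

# ob_query_timeout: 24 hours, assumed to be expressed in microseconds.
query_timeout_us = 24 * 60 * 60 * MICROSECONDS_PER_SECOND

# max_allowed_packet: 1 GB expressed in bytes (2**30).
max_allowed_packet_bytes = 1024 ** 3

# Assumption: the percentage is taken from total tenant memory, and the
# tenant has 64 GB as in the recommended configuration.
tenant_memory_gb = 64
vector_memory_limit_pct = 30
vector_memory_gb = tenant_memory_gb * vector_memory_limit_pct / 100

print(f"ob_query_timeout    = {query_timeout_us}")
print(f"max_allowed_packet  = {max_allowed_packet_bytes}")
print(f"vector index memory = {vector_memory_gb} GB (approximate ceiling)")
```

Changing tenant_memory_gb or vector_memory_limit_pct shows how much memory a different setting would leave for the vector index under the same assumption.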
Recommended configuration
Recommended tenant resource specs:
| Configuration item | Value |
|---|---|
| Memory | 64 GB |
| CPU | 16 cores |
Testing methods
Clone the VectorDBBench code
Note
Deploy VectorDBBench and the OceanBase cluster on different machines to avoid CPU contention and to make test results more reliable.
Clone the VectorDBBench repository to your machine:
git clone https://github.com/zilliztech/VectorDBBench.git
Install dependencies
Go to the VectorDBBench directory and install dependencies:
cd VectorDBBench
pip install .
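Before starting a long benchmark, it can help to confirm that the active environment meets the prerequisites. The sketch below is a hypothetical helper, not part of VectorDBBench: check_environment is a name introduced here for illustration. It only checks that the interpreter is Python 3.11 or later and that a vectordbbench executable is on PATH.

```python
import shutil
import sys


def check_environment(min_version=(3, 11), cli_name="vectordbbench"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # The prerequisites call for Python 3.11 or later.
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required, found {found}")
    # After installation, the CLI entry point should be discoverable on PATH.
    if shutil.which(cli_name) is None:
        problems.append(f"{cli_name} not found on PATH; is the right environment active?")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

If the second warning appears, the most likely cause is that the Conda environment created earlier (vdb in the example) is not activated in the current shell.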
Run the test
Run VectorDBBench. The following examples show HNSW and IVF index tests.
HNSW index example
# Replace $host, $port, and $user with your OceanBase connection details.
vectordbbench oceanbasehnsw --host $host --port $port --user $user --database test --m 16 --ef-construction 200 --ef-search 40 --k 10 --case-type Performance768D1M --index-type HNSW
To see all options:
vectordbbench oceanbasehnsw --help
Common options:
- --num-concurrency: Concurrency level. VectorDBBench runs vector queries at this concurrency and reports the highest QPS (queries per second) as the result.
- --skip-drop-old/--skip-load: Skip dropping old data and loading data. With these options, the command only runs vector queries.
- --k: Number of top-k nearest neighbors to return.
- --ef-search: HNSW query parameter; size of the candidate set during search.
- --index-type: Index type. Supported values: HNSW, HNSW_SQ, HNSW_BQ.
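Because --ef-search is a query-time parameter, one way to compare several values is to load the data once and rerun only the query phase with --skip-drop-old and --skip-load. The sketch below only builds and prints the command lines; it does not run them. The helper name build_hnsw_command, the connection values, and the ef-search values 80 and 160 are illustrative placeholders, and the sketch assumes the existing index can be reused when only --ef-search changes.

```python
import shlex


def build_hnsw_command(host, port, user, ef_search, query_only=False):
    """Build a vectordbbench command line using the flags shown in the example above."""
    cmd = [
        "vectordbbench", "oceanbasehnsw",
        "--host", host, "--port", str(port), "--user", user,
        "--database", "test",
        "--m", "16", "--ef-construction", "200",
        "--ef-search", str(ef_search),
        "--k", "10",
        "--case-type", "Performance768D1M",
        "--index-type", "HNSW",
    ]
    if query_only:
        # Reuse the data and index loaded by an earlier run.
        cmd += ["--skip-drop-old", "--skip-load"]
    return cmd


# Placeholder connection details; replace them with your own.
HOST, PORT, USER = "127.0.0.1", 2881, "root@test"

# The first run loads data and builds the index; later runs only query.
commands = [
    build_hnsw_command(HOST, PORT, USER, ef, query_only=(i > 0))
    for i, ef in enumerate([40, 80, 160])
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list form can be passed to subprocess.run(cmd, check=True) to execute the runs in sequence.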
IVF index example
vectordbbench oceanbaseivf --host $host --port $port --user $user --database test --nlist 1000 --sample_per_nlist 256 --ivf_nprobes 100 --case-type Performance768D1M --index-type IVF_FLAT
Common options:
- --sample_per_nlist: Number of samples per cluster center. Default: 256.
- --ivf_nprobes: Number of nearest cluster centers to search per query. Default: 8. Higher values improve recall but increase search time.
- --index-type: Index type. Supported value: IVF_FLAT.
To see all options:
vectordbbench oceanbaseivf --help
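To get a feel for what the IVF parameters in the example command imply, the rough arithmetic below may help. It is a back-of-the-envelope sketch under stated assumptions, not a description of OceanBase internals: it assumes the case name Performance768D1M means 1,000,000 vectors, that clusters are evenly sized, and that the number of training samples is nlist multiplied by sample_per_nlist.

```python
# Values taken from the IVF example command.
nlist = 1000
sample_per_nlist = 256
ivf_nprobes = 100

# Assumption: Performance768D1M denotes a dataset of 1,000,000 vectors.
num_vectors = 1_000_000

# Assumption: clustering is trained on nlist * sample_per_nlist sampled vectors.
training_samples = nlist * sample_per_nlist

# Assumption: vectors are spread evenly across clusters.
avg_vectors_per_cluster = num_vectors / nlist

# Share of clusters, and approximate number of vectors, visited per query.
fraction_probed = ivf_nprobes / nlist
vectors_scanned_per_query = ivf_nprobes * avg_vectors_per_cluster

print(f"training samples:          {training_samples}")
print(f"avg vectors per cluster:   {avg_vectors_per_cluster:.0f}")
print(f"fraction of clusters hit:  {fraction_probed:.0%}")
print(f"vectors scanned per query: {vectors_scanned_per_query:.0f}")
```

Under these assumptions, the example probes about a tenth of the clusters per query, while the default --ivf_nprobes of 8 would probe under one percent. This is the trade-off between recall and search time that the option description mentions.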
FAQs
Is it normal for the first run to be slow?
Yes. The first run downloads the dataset from AWS S3, which can take a while.
Can I modify the test code?
Yes. If you change the code, run pip install . again before running the test.