VectorDBBench is a benchmarking tool for evaluating the performance of mainstream vector databases and cloud services. This topic explains how to use VectorDBBench to assess the vector search performance of OceanBase Database. VectorDBBench is designed for ease of use, so you can readily reproduce test results or benchmark new systems.
Notice
Currently, this test supports only HNSW index-based searches.
Prerequisites
- Deploy an OceanBase cluster of V4.3.5 or later.
- Create a MySQL tenant. For more information, see Create a tenant.
- Install Python 3.11 or later. The example below demonstrates how to install Python 3.11 using Conda:
# Download and install Conda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
# Reopen your terminal and initialize Conda
source ~/miniconda3/bin/activate
conda init --all
# Create and initialize the Python environment required by VectorDBBench
conda create -n vdb python=3.11
conda activate vdb
- Connect to the business tenant and optimize memory and query parameters for HNSW vector index searches:
-- Set ob_vector_memory_limit_percentage to 30%.
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
-- Set ob_query_timeout to 24 hours.
SET GLOBAL ob_query_timeout = 86400000000;
-- Set max_allowed_packet to 1 GB.
SET GLOBAL max_allowed_packet=1073741824;
Here, ob_vector_memory_limit_percentage = 30 is an example value. Adjust it based on the tenant's memory size and workload. For more information, see ob_vector_memory_limit_percentage.
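The large literals in the statements above are easier to audit when derived from human-readable units. The following minimal Python sketch reproduces them, assuming that ob_query_timeout is expressed in microseconds and max_allowed_packet in bytes, which is consistent with the "24 hours" and "1 GB" comments above. The helper names are hypothetical and are not part of OceanBase Database or VectorDBBench.

```python
# Hypothetical helpers for illustration only.
MICROSECONDS_PER_SECOND = 1_000_000


def hours_to_microseconds(hours: int) -> int:
    """Convert hours to microseconds (assumed unit of ob_query_timeout)."""
    return hours * 3600 * MICROSECONDS_PER_SECOND


def gib_to_bytes(gib: int) -> int:
    """Convert binary gigabytes to bytes (assumed unit of max_allowed_packet)."""
    return gib * 1024 ** 3


query_timeout = hours_to_microseconds(24)
max_packet = gib_to_bytes(1)

# Print the statements so they can be compared with the ones in this topic.
print(f"SET GLOBAL ob_query_timeout = {query_timeout};")
print(f"SET GLOBAL max_allowed_packet = {max_packet};")
```

If you choose a different timeout or packet size, change the arguments and compare the printed statements with what you run against the tenant.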
Recommended configuration
The recommended resource specifications for a tenant are as follows:
| Parameter | Value |
|---|---|
| Memory | 64 GB |
| CPU | 16 cores |
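As a rough sanity check on sizing, the sketch below relates the recommended tenant memory to the example value of ob_vector_memory_limit_percentage and to the raw size of the test vectors. It rests on several assumptions: that the percentage applies to the tenant's memory, that the Performance768D1M case contains 1,000,000 vectors of 768 dimensions (inferred from the case name), and that each dimension is stored as a 32-bit float. It ignores HNSW graph overhead, so actual index memory will be higher than the raw figure.

```python
# Rough sizing sketch; every input below is an assumption for illustration.
TENANT_MEMORY_GB = 64        # recommended tenant memory from the table above
VECTOR_MEMORY_PERCENT = 30   # example value of ob_vector_memory_limit_percentage
DIMENSIONS = 768             # inferred from the case name Performance768D1M
NUM_VECTORS = 1_000_000      # inferred from the case name Performance768D1M
BYTES_PER_FLOAT32 = 4        # assumes 32-bit float storage per dimension


def vector_memory_budget_gb(tenant_gb: float, percent: int) -> float:
    """Memory available to vector indexes under the assumed percentage limit."""
    return tenant_gb * percent / 100


def raw_vector_size_gb(num_vectors: int, dims: int, bytes_per_value: int) -> float:
    """Size of the raw vector data only, without any index structures."""
    return num_vectors * dims * bytes_per_value / 1024 ** 3


budget_gb = vector_memory_budget_gb(TENANT_MEMORY_GB, VECTOR_MEMORY_PERCENT)
raw_gb = raw_vector_size_gb(NUM_VECTORS, DIMENSIONS, BYTES_PER_FLOAT32)

print(f"Vector memory budget: {budget_gb:.1f} GB")
print(f"Raw vector data:      {raw_gb:.2f} GB (excluding HNSW graph overhead)")
```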
Testing methods
Clone the VectorDBBench code
Notice
We recommend that you deploy VectorDBBench and the OceanBase cluster on separate servers to avoid CPU resource contention and improve the reliability of test results.
Clone the VectorDBBench code to your local server.
git clone -b v0.1.0 https://github.com/wyfanxiao/VectorDBBench.git
Install dependencies
Go to the VectorDBBench directory and install the dependencies.
cd VectorDBBench
pip install .
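Because the prerequisites call for Python 3.11 or later, it can help to confirm that the active interpreter (for example, the one in the vdb Conda environment) meets that requirement before installing. The following is a minimal sketch; the function name meets_minimum is hypothetical.

```python
import sys

MINIMUM_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the given (or running) Python version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= minimum


status = "OK" if meets_minimum() else "too old for VectorDBBench"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```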
Run the test
Run the VectorDBBench tool.
# Replace $host, $port, and $user with the actual connection information for OceanBase Database.
vectordbbench oceanbasehnsw --host $host --port $port --user $user --database test --m 16 --ef-construction 200 --ef-search 40 --k 10 --case-type Performance768D1M
For more information about the parameters of the vectordbbench oceanbasehnsw command, run the following command:
vectordbbench oceanbasehnsw --help
Commonly used parameters include:
- --num-concurrency: Specifies the level of concurrency. VectorDBBench executes vector queries with the defined concurrency level and reports the highest QPS as the final result.
- --skip-drop-old/--skip-load: Skips the deletion of old data and the loading of new data. Adding these options ensures the command only performs vector query operations.
- --k: Specifies the number of top-k nearest neighbors to return in a vector query.
- --ef-search: Defines the size of the candidate set during an HNSW query.
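The parameters above can be combined to rerun only the query phase with different --ef-search values after the data has been loaded once. The following Python sketch builds such command lines without executing them. It uses only the options shown in this topic. The host, port, and user values are placeholders, the function name build_search_command is hypothetical, and the sketch assumes that an earlier full run has already loaded the data with the same --m and --ef-construction values.

```python
import shlex


def build_search_command(host, port, user, ef_search, k=10,
                         database="test", case_type="Performance768D1M"):
    """Build a query-only vectordbbench command line (it is not executed here)."""
    return [
        "vectordbbench", "oceanbasehnsw",
        "--host", host,
        "--port", str(port),
        "--user", user,
        "--database", database,
        "--m", "16",
        "--ef-construction", "200",
        "--ef-search", str(ef_search),
        "--k", str(k),
        "--case-type", case_type,
        # Reuse the existing data and index instead of dropping and reloading.
        "--skip-drop-old",
        "--skip-load",
    ]


# Placeholder connection values; replace them with your own.
commands = [
    build_search_command("127.0.0.1", 2881, "root@test", ef_search)
    for ef_search in (40, 80, 160)
]

for command in commands:
    print(shlex.join(command))
```

To run the sweep rather than print it, each list could be passed to subprocess.run(command, check=True). Larger --ef-search values generally trade query throughput for recall, so comparing the reported QPS across runs shows that trade-off for your deployment.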
FAQs
Why is the first test execution slow?
The first test may take longer because it downloads the required dataset from AWS S3 storage. This is expected behavior.
Can I modify the test code?
Yes, you can. If you customize and modify the test code, you need to run pip install . again before executing the test.
