This topic describes considerations and best practices for developing with OBKV-Table.
Notice
OBKV-Table supports only the Java client.
Usage recommendations
Batch operations
If rows in a batch are not logically related, avoid spanning partitions in a single batch when possible. The OBKV-Table client provides partition APIs. When using batch operations, you can first compute partition IDs with the partition APIs, then accumulate and submit batches per partition. This approach keeps each batch within a single machine transaction, minimizes performance overhead, and simplifies retry logic in your application.
Indexes
Follow these index usage recommendations:
- Avoid global indexes unless necessary. When you insert, update, or delete data in the primary table, maintaining global indexes incurs significant distributed transaction overhead.
- Although local indexes have lower maintenance overhead than global indexes, keep the number of local indexes under control. More indexes multiply index maintenance cost.
Partition planning
The following examples use KEY partitioning.
Partition key design
In OBKV-Table, the primary goal of partition key design is to ensure that requests in your business scenarios do not span partitions. Choose partition keys using these guidelines:
A partition key is usually the primary key or a prefix of the primary key.
- Primary key as the partition key: In an order scenario, if
orderidis the primary key and queries are always point lookups byorderid, you can useorderiddirectly as the partition key to quickly retrieve all or part of the data for a given order. - Primary key prefix as the partition key: In an order scenario, if the primary key is a composite of
useridandorderid, and queries may scan byuseridprefix to retrieve all or part of a user's orders, useuseridas the partition key.
- Primary key as the partition key: In an order scenario, if
Queries in your business scenarios must always include the partition key column(s).
Partition count considerations
Why partition count matters
- Changing the partition count requires schema changes (table DDL) and redistribution of data, which is costly.
- Do not use too few partitions: The database balancing algorithm keeps the difference in partition count between any two nodes within 1. For example, with 3 nodes and 7 partitions, the distribution is <3,2,2>, so the imbalance ratio is (3 - 2) / 3 = 33%. Assuming no data skew, partition count is roughly proportional to storage and access volume, and partition imbalance is reflected in uneven resource usage across nodes. Therefore, avoid too few partitions.
- Do not use too many partitions: Treat each partition as a mini database within the cluster. Each mini database manages its own storage and data access independently. With this structure, too many mini databases prevents optimal storage sharing, scan operators may span multiple partitions with some performance loss, and batch operations are harder to push down to storage for maximum optimization. Therefore, avoid too many partitions.
Impact of partition count on performance
In general, partition count has little impact on routine performance. However, tasks that operate at the partition granularity—such as compaction, migration, and backup—are affected. Too few partitions with too much data per partition leads to longer individual task execution times and higher space requirements.
How to choose a partition count
- For capacity planning, we recommend keeping single-replica data per partition under 100 GB when possible. When choosing a partition count, prefer prime numbers such as 23, 59, 97, 193, 389, or 997. For example, with three replicas and 24 TB of storage, single-replica data per partition is about 8 TB, so you need at least 80 partitions. Based on the recommended values above, choose 97 partitions.
- Changing the partition count requires DDL changes and incurs overhead. Plan partition count carefully upfront and avoid frequent changes. If current storage is 24 TB but you expect it to grow to 50 TB within a year, design initially based on 50 TB—for example, choose 193 partitions.
- Do not use too many partitions. We do not recommend exceeding 997.
How to detect data skew across partitions
KEY partitioning currently uses MurmurHash logic: data is hashed by the partition key and routed to the corresponding partition. Unless there is an extreme case—for example, 100 million rows where 10 million rows share the same partition key, which causes severe skew—skew is unlikely. If 100 million rows include partition keys with only thousands of rows each, skew is generally not a concern. With a large sample, hashing partition keys provides strong randomness, and data across partitions tends to be relatively even globally.
