Partitioned indexes |V3.2.4|OceanBase Database| docs|Distributed Database

Partitioned indexes

Last Updated：2023-10-24 09:23:04 Updated

In a database, you can create indexes to improve query efficiency. To improve the overall service capacity of a cluster with an increasing amount of data in tables, you can partition the tables and use a load balancer to distribute the partitions to different servers in the cluster. This topic describes local partitioned indexes, global partitioned indexes, and unique partitioned indexes.

Types of partitioned indexes

OceanBase Database provides the following types of partitioned indexes:

Local partitioned index: If you specify the LOCAL keyword in the statement to create an index for a table, the index is a local index. You do not need to specify partitioning rules for a local index, because it is automatically linked to the partitioning method for the table on which it is created. That is, a local index is equipartitioned with the underlying table.
Global partitioned index: If you specify the GLOBAL keyword in the statement to create an index for a table, the index is a global index. A global index is partitioned independently of the table on which it is created.
Unique index: If you specify the UNIQUE keyword in the statement to create an index for a table, the index is a unique index. A unique index ensures that no two rows of a table have duplicate values in the index key column or columns.
Prefixed index: If the partition key of an index is a left prefix of the index columns, the index is a prefixed partitioned index, or prefixed index for short. Do not confuse prefixed indexes with prefix indexes in MySQL for long string columns. For example, assume that you have created an index idx on the c1 and c2 columns. If the partition key of the index is defined on the c1 column, this index is a prefixed index. If the partition key of the index is defined on the c2 or other columns, this index is a non-prefixed index.
Non-prefixed index: A partitioned index that is not prefixed is a non-prefixed index.

Local partitioned indexes

Local partitioned indexes are divided into local prefixed indexes and local non-prefixed indexes.

Local prefixed indexes

If a local index is partitioned on a left prefix of the index columns and the subpartioning key is included in the index key, it is a local prefixed index. Local prefixed indexes can be unique or nonunique.

A query with a specified index key can use a local prefixed index to uniquely locate an index partition. Local prefixed indexes are suitable for scenarios with small result sets and partition pruning requirements.

For example, assume that the local index idx(c1,c2,c3) is created for Table A that is partitioned on the c1 column. The index is always equipartitioned with the underlying table, so the local index idx(c1,c2,c3) is also partitioned on the c1 column. In this case, this index is a local prefixed index. A query with a specified index key can uniquely locate an index partition based on the value of the partition key column c1. This significantly reduces the number of index partitions to be scanned.

Local non-prefixed indexes

If a local partitioned index is not a local prefixed index, it is a local non-prefixed index. Likely, the partition key of the underlying table of the index is not on a left prefix of the index columns, or the subpartioning key is not included in the index key.

If the partition key columns are not a subset of the index columns, the local non-prefixed index cannot be unique. A query with a specified index key cannot locate the target index partition by using a local non-prefixed index. In this case, all index partitions are scanned before the result is returned. Therefore, local non-prefixed indexes are suitable for scenarios where you want to implement parallel execution to access a large amount of data.

For example, assume that the index idx(c1,c2,c3) is created for Table A that is partitioned on the c4 column. This index is a local non-prefixed index based on the definition. A query with a specified index key cannot use the local non-prefixed index to locate the target index partition based on the partition key. In this case, all partitions are scanned before the result is returned. Parallel execution is suitable for such a scenario.

Global partitioned indexes

A global index is partitioned independently of the underlying table. You can split and merge partitions of a global partitioned index. Generally, if a global index, unless unique and non-prefixed, is equipartitioned with the underlying table, we recommend that you define it as a local index. A global partitioned index requires more resources for partition management and maintenance. In terms of query costs and partition pruning efficiency, a local partitioned index achieves the same results as a global partitioned index if they are partitioned in the same way.

Global prefixed indexes

A global partitioned index is prefixed if it is partitioned on a left prefix of the index columns.

Global prefixed indexes can be unique or nonunique.

Global prefixed indexes are useful when they are partitioned by range and are not useful when partitioned by hash. If the index is hash partitioned, you will most likely specify the index key in your query. If the partition key columns are a subset of the index key columns, the target index partition can be located based on the index key values that you specified, regardless of whether the index is prefixed. If the partition key columns are not a subset of the index key columns, in other words, if you do not specify all partition key values, the query that uses a hash-partitioned index must scan all index partitions. However, a query that uses a range-partitioned index always allows for index partition pruning.

Global non-prefixed indexes

OceanBase Database does not support global non-prefixed indexes, because they do not help improve query efficiency.

For example, assume that a global index idx(c1,c2) is created for Table A and is partitioned on the c2 column. Then, the index idx(c1,c2) is a global non-prefixed index. In this case, partition pruning can be performed only when all index key values are specified in a user query. Otherwise, all index partitions are scanned before the result is returned. Therefore, you have every reason to partition the index on the c1 column. This allows the query to eliminate unneeded partitions based on c1, the prefix of the index.

Unique partitioned indexes

Unique partitioned indexes can be global or local.

For a local index to be unique, the index partition key columns must be a subset of the index key columns. For example, assume that a unique index idx(c1,c2,c3) is created for Table A and is partitioned on the c1 and c2 columns. In this case, rows with the same values in the c1 and c2 columns are distributed in the same partition. This ensures that no identical rows exist in the same partition. If the index idx(c1,c2,c3) is partitioned on the c2 and c4 columns, the partition key columns are not a subset of the index key columns. You cannot create a local index to ensure that the values in the index key columns are unique among partitions. In this case, a local unique index cannot be created.

Index selection

We recommend that you create indexes that best suit your business requirements in aspects such as user query modes, index management, performance, and availability.

If you want a unique index and the index key columns include all partition key columns of the underlying table, create a local index. Otherwise, create a global index.
If the partition key columns are a subset of the index key columns, you can create a local index.
If the index is equipartitioned with the underlying table, we recommend that you create a local index.
If you are concerned about the management costs of the index partitions and the partitions of the underlying table are frequently dropped, we recommend that you avoid creating a global index. Frequent dropping of partitions significantly changes the global index. The changes are irreversible and may cause the index to become unavailable.
If the partition key values of the index are always specified in user queries, create an index on columns other than the partition key columns. This reduces the maintenance and storage costs. If the partition key values of the index are not specified in user queries, create a prefixed index that is suitable for scenarios that have a small amount of data and require partition pruning. Non-prefixed indexes are suitable for scenarios that have a large amount of data and support the parallel execution of queries.

Partitioning method selection

For a single index table that contains a huge amount of data, you can partition the index to facilitate parallel queries among partitions and balance the load.

If the index key is specified in most user queries, you can use the hash or range partitioning method. The two partitioning methods are the same in terms of the number of partitions to be scanned and the performance of parallel execution. However, if you have data hotspots in your table, hash partitioning is preferred because it can better prevent data hotspots.
For range queries where the index key is specified, range partitioning is more suitable than hash partitioning in terms of the number of partitions to be scanned. However, hash partitioning can provide better performance when the result set is huge because it allows parallel queries.

Selection of indexes for queries

If partition pruning is required, a prefixed index is preferred. You can specify the partition key, a left prefix of the index columns, in the filtering condition of a query to facilitate partition pruning and minimize the number of partitions to be scanned.
For a scenario that involves a large amount of data, you can use a non-prefixed index to improve the query performance, because it allows parallel execution of range queries on the partition key columns. A local non-prefixed index allows the parallel scan of all partitions, whereas a local prefixed index eliminates unneeded partitions. Queries that use local prefixed indexes can process a large amount of data by scanning a limited number of partitions, but they may take a longer time than queries that allow parallel execution.