This topic describes the recommended index design in OceanBase Database.
About indexes
OceanBase Database supports primary key indexes, unique indexes, and secondary indexes. You can define a single column for an index, or define multiple columns for an index, which is referred to as a composite index. In OceanBase Database, indexes can be divided into two types: local indexes and global indexes. The difference between a local index and a global index is that a local index shares the same partition with the partition data, while a global index occupies a separate partition.
Note
In OceanBase Database V4.0, by default, a local index is created in MySQL mode, and a global index is created in Oracle mode.
Here are examples of creating an index in MySQL mode:
Create a local index
obclient> CREATE TABLE t1(id NUMBER PRIMARY KEY,name1 VARCHAR(10),name2 VARCHAR(10)); obclient> CREATE INDEX idx_t1_name1 ON t1(name1);Note
If you do not specify
LOCALorGLOBAL, a local index is created in MySQL mode by default.Create a global index
obclient> CREATE INDEX idx_t1_name2 ON t1(name2) GLOBAL;
Index design
Requirements and recommendations on a single-value index
Follow the leftmost prefix principle when you create an index.
We recommend that you set the
NOT NULLattribute for all indexed fields in the table and define theDEFAULTvalue as needed.We recommend that you create a field (even a combined one) with unique characteristics in business as a primary key.
Note
Even if the unique index affects the insertion speed, this speed loss can be ignored, and the unique index will significantly increase the search speed. In addition, even if perfect checksum control is implemented at the application layer, as long as no unique index is created, dirty data will be generated according to Murphy's law.
The data types of the fields to be joined must be consistent. The joined fields must be indexed when you perform multi-table join queries.
Note
In multi-table join scenarios, table indexes can improve SQL performance.
Do not use the
LIKE '%...'orLIKE '%...%'clause in paging queries. If these features are necessary, use a search engine. We also recommend that you use well-filtered fields as input filters to avoid a large number of result set searches in the background.Note
The index file has the leftmost prefix matching attribute of B-Tree. If the leftmost value is not determined, this index cannot be used.
For example, for a table with the index
idx_t2_abc(a, B, c), becauseais not included in theWHEREconditionb =? and c =?, the index cannot be used. Run the following statements to create thet2table and theidx_t2_abc(a,b,c)index.obclient> CREATE TABLE t2(a NUMBER PRIMARY KEY, b INT, c VARCHAR(10)); obclient> CREATE INDEX idx_t2_abc ON t2(a,b,c);View the execution plan of the statement with the
WHEREconditionb =? and c = ?. According to the execution plan, the name column is t2, the statement does not use the index but uses a full table scan, and the cost is 409.obclient> EXPLAIN SELECT a,b,c FROM t2 WHERE b=8889 AND c='a(mbmtwm'; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t2 |1 |409 | =================================== Outputs & filters: ------------------------------------- 0 - output([t2.a], [t2.b], [t2.c]), filter([t2.b = 8889], [t2.c = 'a(mbmtwm']), access([t2.b], [t2.c], [t2.a]), partitions(p0) | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec)If you change the order of the fields in the preceding index to
idx_bca (b,c,a)oridx_cba (c,b,a), and the filter condition is stillb =? and c =?, the composite index will be used. Run the following statements to create thet2table and theidx_t2_bca(b,c,a)index.obclient> CREATE TABLE t2(a NUMBER PRIMARY KEY, b INT, c VARCHAR(10)); obclient> CREATE INDEX idx_t2_bca ON t2(b,c,a) ;View the execution plan of the statement with the
WHEREconditionb =? and c = ?. According to the execution plan, the name column ist2(idx_t2_bca), the statement uses the correct index, and the cost is significantly reduced to 46.obclient> EXPLAIN SELECT a,b,c FROM t2 WHERE b=8889 AND c='a(mbmtwm'; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ============================================= |ID|OPERATOR |NAME |EST. ROWS|COST| --------------------------------------------- |0 |TABLE SCAN|t2(idx_t2_bca)|1 |46 | ============================================= Outputs & filters: ------------------------------------- 0 - output([t2.a], [t2.b], [t2.c]), filter(nil), access([t2.b], [t2.c], [t2.a]), partitions(p0) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in setIf the
ORDER BYclause is used, pay attention to the index orderliness to avoidfile_sort. The last field of theORDER BYclause is part of the composite index and placed at the end of the index combination to avoidfile_sortfrom affecting query performance.For example, the
WHEREcondition iswhere a=? and b=?, order by c;and the index isidx_t3_abc(a,b,c). Run the following statements to create thet3table and theidx_t3_abc(a,b,c)index.obclient> CREATE TABLE t3(a NUMBER PRIMARY KEY, b INT, c INT); obclient> CREATE INDEX idx_t3_abc ON t3(a,b,c);Look at the execution plan of the
where a=? and b=?, order by c;statement. The operator is TABLE GET, and the cost is 46.obclient> EXPLAIN SELECT a,b,c FROM t3 WHERE a=117 AND b=67176 ORDER BY c; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ---------------------------------- |0 |TABLE GET|t3 |1 |46 | ================================== Outputs & filters: ------------------------------------- 0 - output([t3.a], [t3.b], [t3.c]), filter([t3.b = 67176]), access([t3.a], [t3.b], [t3.c]), partitions(p0) | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in setFor an index with a range query, index orderliness cannot be used. For example, for
WHERE a>10 ORDER BY b, thea_bindex cannot be sorted, because the following data may already be in ascending order in the index: {a = 11, b = 2} {a = 11, b = 3} {a = 12, b = 1}. If data is output directly in sequence, b = 1 is ranked after b = 2 and b = 3, which is incorrect.We recommend that you use the covering index in table queries to avoid table access by index primary key.
Requirements and recommendations on a composite index
If an index contains multiple columns, you need to appropriately specify the order of the columns. Generally, the first column in an index should be a column with high cardinality, where the cardinality of a data column is represented by the number of unique values in the column. In other words, the leftmost column must have the highest degree of distinction.
For example, for
where a=? and b=?, almost all values in columnaare unique. In this case, you can simply create a single-value index namedidx_a.Add the columns frequently used in the
ORDER BY,GROUP BY, andDISTINCTclauses to the end of the index as a covering index to avoid table access by index primary key.For the query condition
where a=? and b=?, use the composite indexidx_ ab (a,b)instead of creating indexesidx_a(a)andidx_b(b)respectively on fieldsaandb, because the indexes created respectively on fieldsaandbcannot be used at the same time.We recommend that you design an appropriate composite index to meet queries in different scenarios to reduce index redundancy:
Multiple statements can share the same index, and some of the statements need only the prefix of the covering index. Assume that the conditions for two statements are respectively
a = ? and b = ?andb = ?. The composite indexidx_ba(b,a)can be used for both statements, without the need to separately create Indexesidx_ab(a,b)andidx_b(b).Do not use global indexes unless necessary. In particular, avoid global indexes that have small numbers of distinct values (NDVs) and are not frequently used.
If the
WHEREcondition contains a partitioning key field, you can create a joint local index on the partitioning key field and other fields in the condition.A global partitioned index applies to query scenarios where only a small amount of data is returned. In scenarios where a large amount of data is returned, we recommend that you perform a test to compare the performance of a global index and that of a local index.
Select a field with a larger NDV as the partitioning key for the global partitioned index.
We recommend that you do not update columns corresponding to the partitioning key of the partitioned table and the partitioning key of the global partitioned index. If the update operation cannot be avoided, enable the row movement feature for the partitioned table.
obclient> alter table XXX enable row movement;Avoid duplicate indexing. Redundant indexes affect the efficiency of data addition, deletion, and modification. Duplicate indexing also causes additional storage costs.
By default, the primary key is used to create indexes and unique constraints.
When index
idx_abc(a,b,c)has already been created, do not create indexesidx_a(a)andidx_ab(a,b).
Recommendations on indexes on partitioned tables
Select an index in the following precedence: local index > global partitioned index > global index. Use a global index only when necessary.
Reduce unnecessary global indexes to the minimum.
The maintenance cost of a global index is high. You need to maintain the global index each time when you add, delete, or modify data. Therefore, global indexes are not suitable for large-scale use. Use primary keys for data queries. If data uniqueness can be guaranteed within the partition, you do not need to use a global index.
Note
Global indexes can degrade the performance of data manipulation language (DML) operations because they may cause distributed transactions. If the LOCAL keyword is not added to the index definition of a partitioned table, a global index is created by default. Therefore, the syntax for defining a local index is
CREATE INDEX <index_name> ON <table_name> (column, column) LOCAL.The primary key on a partitioned table must include the partitioning key of the table.
The intersection of local unique keys on a partitioned table must include the partitioning key of the table.
Precautions for the use of indexes
Ensure that the new index has taken effect before you publish the SQL statements.
To modify an index, create a new index, ensure that the new index has taken effect, and delete the index upon confirmation that the old index is no longer needed.
When too many indexes exist, delete unnecessary indexes.
Notice
However, this operation carries several risks. Ensure that the indexes to be deleted are not used by other SQL statements.
Do not use
ALTER TABLE,ADD INDEX, orDROP INDEXin the same DDL statement.Avoid the following misunderstandings when you create indexes:
An index must be created for a single query.
An index consumes storage space and seriously slows down data update and insertion.
A unique index can be solved only at the application layer by using queries before insertion.
Precautions for the use of global indexes:
A
DROPorTRUNCATEoperation during partition O&M can cause global indexes to fail.