This topic describes how to add a proper storage model for HBase in OceanBase Database.
Prerequisites
You have connected to OceanBase Database. You can log on to OceanBase Database by using OBClient or a MySQL client. For more information, see Connect to an OceanBase tenant by using OBClient and Connect to OceanBase Database by using a MySQL client.
Storage model
Table creation syntax
Each column family in the HBase table corresponds to a relational table of the following form in the HBase database. Here is a sample SQL statement for building an HBase data table:
CREATE TABLE htable1$family1 (
K varbinary(1024),
Q varbinary(256),
T bigint,
V varbinary(1048576) NOT NULL,
PRIMARY KEY(K, Q, T));
The parameters are described as follows:
htable1specifies the name of the HBase table that is built. You can specify the name as needed.family1specifies the column family. You can specify the name as needed.Note
The table name and column family name are joined by a dollar sign ($).
The K column stores the rowkeys of the HBase table.
The Q column stores the column qualifiers.
The T column stores the timestamp version, which is the period from 1970-01-01 UTC to the current time, in milliseconds.
The V column stores values of the varbinary type, with a maximum length of 1 MB. If the length is insufficient, the longblob type can be used.
The K, Q, and T columns form a unique composite primary key.
In an HBase table created by using the preceding SQL statement, the data of several columns of a row is stored as multiple adjacent rows in the relational table. Each row stores one cell of the HBase table: <row, column family, column qualifier, timestamp, value.
Notice
HBaseAPI depends on the underlying implementation of TableAPI, and TableAPI does not support cross-partition transactions. Therefore, HBaseAPI does not support global indexes.
Template table
The table format is fixed. You can create a template table and then use the SQL statement CREATE TABLE LIKE to create other tables. Sample statement:
CREATE TABLE htable2$family2 LIKE htable1$family1;
Partitioning method
The preceding table creation procedure does not involve the partitioning method. The created table is a non-partitioned table and applies only to scenarios with a very small amount of data. Generally, HBase business tables store a large amount of data. Therefore, you must create Partitioned tables. Currently, the HBase model of OceanBase Database does not support automatic partitioning like Apache HBase. Therefore, you must select a partitioning method when you create a table. You can determine which partitioning method to use based on the following sections.
KEY partitioning
If your business system does not scan the scope, or if you only need the Get API, you can use KEY partitioning. The number of partitions must be odd, preferably a prime number. Recommended values: 97, 193, and 389. KEY partitioning features even data distribution, without data skew or hotspots. Here is an example:
CREATE TABLE htable1$family1 (
K varbinary(1024),
Q varbinary(256),
T bigint,
V varbinary(1048576) NOT NULL,
PRIMARY KEY(K, Q, T))
PARTITION BY KEY(K) PARTITIONS 97;
Virtual columns and KEY partitioning
If you must use scan in your business system but the scan scope is a fixed-length prefix (prefix scan) of the rowkey columns, you can define a virtual column on a rowkey column of the table and apply KEY partitioning for this virtual column. This way, only one partition is scanned each time. Unless in the case of prefix data skew, this partitioning method can avoid data hotspots. Here is an example:
CREATE TABLE htable1$family1 (
K varbinary(1024),
Q varbinary(256),
T bigint,
V varbinary(1048576),
K_PREFIX varbinary(1024) GENERATED ALWAYS AS (substring(K, 1, 4)),
PRIMARY KEY(K, Q, T))
PARTITION BY KEY(K_PREFIX) PARTITIONS 97;
The parameters are described as follows:
The substring(K, 1, 4) function takes the first four bytes of K (rowkey of the HBase table) as a substring. Here, the number 4 must be replaced with the prefix length based on the characteristics of the business rowkey.
RANGE partitioning
If you need both prefix scan and the Get API, you must use RANGE partitioning. We recommend that you first estimate or sample the distribution of rowkeys and then determine the range definitions. If the estimate is unreasonable, RANGE partitioning can easily cause data skew and hotspots. Therefore, you must avoid RANGE partitioning. Here is an example:
CREATE TABLE htable1$family1 (
K varbinary(1024),
Q varbinary(256),
T bigint,
V varbinary(1048576),
PRIMARY KEY(K, Q, T)
)
PARTITION BY RANGE columns (K)
(
PARTITION p0 VALUES LESS THAN ('a'),
PARTITION p1 VALUES LESS THAN ('w'),
PARTITION p2 VALUES LESS THAN MAXVALUE);
The parameters are described as follows:
The value range of the rowkeys is divided into three segments by "a" and "w".
Support for multiple column families
The native HBase supports multiple column families in a single table. OceanBase Database does not support multiple column families. However, you can build the tables (such as htable1$family1 and htable1$family2) described in the previous section to implement multiple column families on the client. We recommend that you place these tables in the same table group of OceanBase Database to avoid cross-server distributed transactions for operations across column families. For more information, see Create a table group.