This topic describes how to improve the performance of OceanBase Database by optimizing the storage of large objects (LOBs).
Background information
What is a LOB?
A LOB is a type of ultra-large text or binary data stored in a database, such as text files, images, or audio files. LOBs are commonly used to store unstructured data and provide a comprehensive data storage solution for database applications.
LOB storage modes
OceanBase Database supports two LOB storage modes:
- INROW storage: Data is stored directly within the data rows.
- Advantages: Provides excellent read performance with no additional I/O overhead.
- Disadvantages: Consumes primary table space and may reduce the access efficiency of other columns.
- OUTROW storage: Data is stored in a separate object storage.
- Advantages: Conserves primary table space and is ideal for storing extremely large objects.
- Disadvantages: Requires additional I/O operations, resulting in lower read performance.
Performance challenges
In production environments, LOBs and JSON data stored in OUTROW mode may encounter the following challenges:
- Lower read performance: Accessing data stored in OUTROW mode requires additional I/O operations, which can degrade performance.
- Challenges with JSON data types:
- Parsing JSON structures incurs higher overhead.
- Performance is negatively impacted in scenarios with frequent read and write operations.
Notice
The high overhead issues related to handling LOB and JSON columns have been resolved in OceanBase Database V4.2.2, V4.3.1, and later versions. For more information, see the "Upgrade OceanBase Database to a proper version" section in this topic.
Performance diagnostics and optimization solutions
Identify performance issues with OUTROW storage
Symptoms of performance degradation:
- Noticeable increase in query latency.
- High proportion of time spent on I/O waits.
Diagnostic methods:
- Execute the
EXPLAINstatement to analyze execution plans. - Monitor I/O-related performance metrics.
- Execute the
Optimization solutions
Implement intelligent storage strategies
The primary cause of performance degradation when handling LOB columns is that the data is stored in OUTROW mode, leading to additional data access overhead. You can configure the INROW storage threshold to balance storage space and access performance.
To avoid OUTROW storage, you can increase the INROW storage threshold by setting the ob_default_lob_inrow_threshold system variable to a larger value. Then, create a new table to verify whether the change takes effect.
Here is an example:
-- Set the INROW storage threshold to 8,192 bytes.
SET ob_default_lob_inrow_threshold = 8192;
Notice
Modifying the ob_default_lob_inrow_threshold variable does not affect the INROW storage threshold of existing tables. It only applies to newly created tables where the INROW storage threshold is not explicitly specified.
Use more appropriate data types
In practice, we have observed that using LONGTEXT and JSON data types can lead to poor storage efficiency and slow access speeds in some scenarios, particularly when complex queries are not required. If JSON computations are unnecessary, we recommend storing JSON data in VARCHAR or TEXT columns instead. Choosing smaller data types for large datasets can reduce memory usage and improve access performance.
Upgrade OceanBase Database to an appropriate version
OceanBase Database of earlier versions may lack support for new features, potentially causing performance bottlenecks. To achieve better performance in high-concurrency read/write scenarios, we recommend upgrading OceanBase Database to V4.2.2, V4.3.1, or later. These versions include optimized OUTROW storage capabilities, which enhance the performance of read/write operations on LOBs and JSON data while reducing latency.
Example
Test environment
The test environment in this example is configured as follows:
- CPU architecture: x86_64
- CPU cores: 4
- OceanBase Database version: V4.2.1 BP7
Symptom
This example involves a primary key query on a table with JSON columns, which results in slow performance. The SQL query and table creation statements are as follows:
SELECT
*
FROM
`t2_act`
WHERE
`id` = 1
AND `del_at` IS NULL
LIMIT
1;
CREATE TABLE `t2_act` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL,
`page_setting` JSON DEFAULT NULL,
`rule_setting` JSON DEFAULT NULL,
`extend` JSON DEFAULT NULL,
`del_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
To analyze the data length of the JSON columns, execute the following query:
SELECT /*+parallel(12)*/ max(length(page_setting)), min(length(rule_setting)), avg(length(extend)) FROM t2_act;
+--------------------------+--------------------------+--------------------+
| max(length(page_setting)) | min(length(rule_setting)) | avg(length(extend)) |
| 17736 | 928 | 113084 |
+--------------------------+--------------------------+--------------------+
1 row in set(0.196 sec)
The poor performance is caused by the query involving JSON columns, where the length of the JSON data exceeds the LOB storage threshold for INROW mode. This results in additional I/O operations to scan the JSON columns, significantly increasing the response time (RT).
Optimization solutions
Upgrade to OceanBase Database V4.2.2 or later to optimize overheads.
Increase the INROW storage threshold to avoid OUTROW storage. After modifying the threshold, recreate the table to ensure data is stored using the new storage mechanism.
Use VARCHAR or TEXT columns for JSON data when calculations on JSON data are not required. For example, in this scenario, JSON data is only stored and not processed, making VARCHAR or TEXT a more efficient choice.