Excessive disk I/O pressure on a node that hosts an observer process can have many causes. Besides increased traffic, it is usually the result of overlapping factors such as data migration, replication, and major compactions (merges). The usual remedy is to downgrade some of the I/O-intensive tasks. This article analyzes the common situations in detail.
Emergency handling procedure
When a node experiences high disk I/O, you can typically resolve the issue by downgrading some high I/O tasks. The following methods are available.
Pause an ongoing major compaction.
If a node experiencing high disk I/O is undergoing a major compaction, you can pause the compaction to reduce the I/O pressure. The syntax is as follows:
ALTER SYSTEM SUSPEND MERGE [ZONE [=] 'zone'];
After the I/O pressure is reduced, you can resume the major compaction if needed. The syntax is as follows:
ALTER SYSTEM RESUME MERGE [ZONE [=] 'zone'];
Pause an ongoing backup task.
You can view whether a node is performing a backup task in the OCP console. If a backup task is identified, you can pause it to reduce the I/O pressure.
Pause an ongoing data transmission or import/export task.
You can use the TOPSQL or session management feature in the OCP console to identify a SQL query that is writing data in batches. If you cannot quickly determine which system the batch processing task belongs to, you can also use the OMS console to check whether the node is performing a data transmission task. If yes, you can pause the task if needed to reduce the I/O pressure.
For more information, see Manage migration tasks.
Alternatively, you can use the ODC console to check whether the node is performing a data import or export task. After connecting to the target database, click the Task tab in the navigation pane to open the Task Center pane. On this pane, click the Import tab to view the task list, and perform an Abort operation on a task as needed. Additionally, you can manually stop scheduled tasks in third-party big data platforms or the DataX component as needed.
Reduce the number of threads in minor compaction.
If the parallelism of a minor compaction is high, the disk I/O also increases. The compaction_high_thread_score parameter specifies the maximum number of threads for a minor compaction. You can reduce the parameter value to lower the disk I/O. The default value is 0, which indicates adaptive. For a 64-core server, the value is generally 10. You can reduce the value as needed. The modification to this parameter takes effect immediately without the need to restart the database. The syntax is as follows:
ALTER SYSTEM SET compaction_high_thread_score = 5;
After modifying the parameter, you can execute the SHOW PARAMETERS statement to check whether the modification is successful.
SHOW PARAMETERS LIKE 'compaction_high_thread_score';
Reduce the degree of concurrency for data migration and replication.
If an OceanBase cluster is experiencing high disk I/O while a data migration or replica balancing task is in progress, you can reduce the degree of concurrency for these tasks to limit the I/O.
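Before lowering these values, it may help to record the current settings so that they can be restored once the I/O pressure subsides. The following is a minimal sketch that reuses the SHOW PARAMETERS statement used elsewhere in this article. The parameter names are the ones this section changes; whether each parameter exists depends on the cluster version, which is an assumption here.

```sql
-- Record the current values before changing them, so that they can be
-- restored after the cluster recovers.
SHOW PARAMETERS LIKE 'migrate_concurrency';
SHOW PARAMETERS LIKE 'server_data_copy_in_concurrency';
SHOW PARAMETERS LIKE 'server_data_copy_out_concurrency';
```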
ALTER SYSTEM SET migrate_concurrency = 5; -- default value is 10
ALTER SYSTEM SET server_data_copy_in_concurrency = 2; -- default value is 2; if the current value is higher, set it to 2
ALTER SYSTEM SET server_data_copy_out_concurrency = 2; -- default value is 2; if the current value is higher, set it to 2
Reduce the network bandwidth for background tasks.
You can execute the following command to reduce the network bandwidth for background tasks:
ALTER SYSTEM SET sys_bkgd_net_percentage = 30; -- default value is 60
Throttle or create indexes for high-load SQL statements.
When disk I/O is high, you can identify the SQL statement responsible and add the max_concurrent hint to its execution plan (for example, through an outline) to limit the concurrency of the statement, thereby throttling the I/O.
If the high I/O is caused by full table scans due to missing indexes, you can create indexes on the related tables as needed.
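The two options above can be sketched as follows, assuming MySQL mode and the outline syntax for binding a hint to a statement. The table orders, the column customer_id, the outline name otl_limit_scan, and the index name idx_orders_customer_id are hypothetical and used only for illustration; the concurrency limit of 1 is an arbitrary example value.

```sql
-- Hypothetical table and object names, for illustration only.
-- Throttle the statement to at most 1 concurrent execution by binding
-- the max_concurrent hint through an outline.
-- Literal values in the outline text may affect which statements match;
-- check the outline documentation for your version.
CREATE OUTLINE otl_limit_scan ON
  SELECT /*+ max_concurrent(1) */ * FROM orders WHERE customer_id = 1001;

-- If the statement performs a full table scan because of a missing index,
-- create an index on the filter column instead.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Remove the throttle after the I/O pressure returns to normal.
DROP OUTLINE otl_limit_scan;
```

Creating an index is itself an I/O-intensive operation, so on a node that is already under pressure, throttling first and building the index after the cluster recovers is usually the safer order.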
Cancel an ongoing index creation task.
If a large index creation task is in progress in the cluster, you can cancel it and recreate the index after the cluster recovers.