This topic describes how to diagnose transactions in OceanBase Database.
Overview
Transactions in OceanBase Database can be divided into three types based on the number of participants and their locations: regular transactions, distributed transactions, and XA transactions. Regular transactions and distributed transactions are collectively referred to as non-XA transactions. Transactions can also be divided into other transactions, long-running transactions, and suspended transactions based on their execution time and statuses. Long-running transactions and suspended transactions affect the business system because they occupy resources for a long period and cause the waiting sessions to be blocked. In routine use, you must take these types of transactions seriously. The following section describes these types of transactions.
Scenarios
Long-running transactions
Transactions that remain uncommitted for a long period are referred to as long-running transactions. Regular transactions, distributed transactions, and XA transactions can all be in the long-running state. When a regular transaction or distributed transaction is in the long-running state, you can kill sessions to roll back the transaction. When an XA transaction is in the long-running state, you can terminate the transaction.
If an SQL statement in a transaction takes a long period to execute or the transaction is not committed or rolled back after the SQL statement is executed, the transaction is considered a long-running transaction. If this long-running transaction causes resources to be blocked, you can roll back the session of the long-running transaction or terminate the XA transaction.
Manage sessions by using a command-line tool
The following sample code is executed in the SYS tenant by default.
Check for any long-running transactions in the OceanBase database.
SELECT * FROM OCEANBASE.__all_virtual_trans_stat WHERE part_trans_action <= 2 AND ctx_create_time < date_sub(now(), INTERVAL 600 SECOND) AND is_exiting != 1Check whether the transaction is an XA transaction. The
svr_ipin the result returned by the preceding SQL statement is the transaction ID. If the returned result of the SQL statement is empty, the long-running transaction is not an XA transaction. Otherwise, the long-running transaction is an XA transaction.SELECT * FROM OCEANBASE.__all_virtual_global_transaction WHERE trans_id LIKE '% transaction ID%';Execute the following SQL statements to process the long-running transaction.
Execute the following SQL statement to roll back the transaction to a regular long-running transaction. The OBServer must be directly connected without using the OBProxy before you can terminate the session.
SELECT svr_ip FROM OCEANBASE.__all_virtual_processlist WHERE id=session ID; kill session ID;Execute the following SQL statement to terminate the XA transaction:
Log on to the SYS tenant to find the XID of the XA transaction.
SELECT hex(gtrid),hex(bqual),format_id FROM OCEANBASE.__all_virtual_global_transaction WHERE tenant_id = tenant ID AND format_id <> -2 AND state = 3 AND gmt_modified < date_sub(now(), INTERVAL 1800 SECOND);Log on to the user tenant in Oracle mode to find the XID of the XA transaction.
SELECT rawtohex(gtrid),rawtohex(bqual),format_id FROM sys.all_virtual_tenant_global_transaction_agent WHERE format_id <> -2 AND state = 3 AND ROUND((sysdate - cast(GMT_MODIFIED as date)) * 86400) > 1800;Terminate the XA transaction.
set serveroutput on; declare l_xid DBMS_XA_XID; l_ret PLS_INTEGER; BEGIN l_xid.formatid := format_id; l_xid.gtrid := hextoraw('hex(gtrid)'); l_xid.bqual := hextoraw('hex(bqual)'); l_ret := DBMS_XA.XA_ROLLBACK(xid = > l_xid); dbms_output.put_line(l_ret); END; /
Suspended transactions
Transactions that last for a long period are referred to as suspended transactions. This type of transactions are commonly found in the following scenarios:
The partition involving the transactions is leaderless, causing the state machine of the transactions to pause.
Though the transactions on the leaders are completed, they cannot end due to limits of resources such as memory of the standby server or network bandwidth, slow synchronization of logs, or slow replay.
The processing methods of the suspended transactions in the preceding scenarios are different.
Leaderless partition
Based on the architecture of OceanBase Database, each partition includes multiple replicas, one leader with multiple followers. When the OBServer that hosts the leader failed, the leader failed. In this case, other normal replicas reselect a leader based on the PAXOS protocol. In the reselection procedure, running transactions may become suspended.
In this scenario, perform the following steps to resolve the issue:
Execute the following SQL statement to query the information of all tables involving the current transaction. Where, each
tidinparticipantsrepresents a table.SELECT svr_ip, trans_id, participants FROM OCEANBASE.__all_virtual_trans_stat WHERE part_trans_action> 2 AND ctx_create_time < date_sub(now(), INTERVAL 600 SECOND) AND is_exiting != 1;Execute the following SQL statement to check whether the preceding tables have leaders. If the values of the
rolecolumn are followers, no leader exists.SELECT svr_ip, svr_port,table_id, partition_idx, role, status, replica_type FROM OCEANBASE.__all_virtual_clog_stat WHERE table_id=table ID;In this case, the current transaction is treated as a long-running transaction.
Slow application running
The causes of slow application running vary. Resources on an OBServer become faulty or many tasks run on the OS to consume I/O and network resources, which slows down the response of the OBServer. The number of leaders is much more than that of followers. As a result, only limited resources are available to the followers and the followers are always lagged behind their leaders. The hardware of an OBServer such as the NIC becomes faulty, which slows down the log synchronzation from the leader to followers over the Paxos protocol, and the followers are always waiting for synchronizing logs of the leader. The OBServer is undergoing a major compaction or backup.
In this scenario, perform the following steps to resolve the issue:
Execute the following SQL statement to query the information of all tables involving the current transaction. Where, each
tidinpartitionrepresents a table.SELECT svr_ip, svr_port,trans_id, 'partition' FROM OCEANBASE.__all_virtual_trans_stat WHERE part_trans_action > 2 AND ctx_create_time < date_sub(now(), INTERVAL 600 SECOND) AND is_exiting != 1;Confirm the position of the
leadercorresponding to thepkeybased on thetable_idrecorded in thepartitionqueried in step 1.SELECT svr_ip,svr_port FROM OCEANBASE.__all_virtual_clog_stat WHERE role='LEADER' AND table_id = table ID;If the
svr_ipin step 1 is different from that in step 2, the long transaction exists in a follower.Query the partition corresponding to the transaction and the number of records in the cluster.
SELECT count(1) FROM OCEANBASE.__all_virtual_trans_stat WHERE trans_id LIKE '%{Select the hash field in trans_id in step 1}%'AND`partition` LIKE '%xxx%';If only one record is returned, the long transaction exists on a standby OBServer. Contact technical support of OceanBase Database.