This topic describes how to deploy OceanBase Database, create a business tenant, and run the Sysbench benchmark on OceanBase Database to achieve higher performance. Some cases about performance issues are provided for your reference.
Quick troubleshooting procedure for Sysbench issues
TPS of Sysbench is lower than expected
Perform the following steps to locate the cause:
Analyze Sysbench parameters.
We recommend that you start with parameters in the Sysbench running command.
If the
rand-typeparameter is not specified or its value is set tospecial, you can set or change its value touniformto reduce data lock contention.If the number of tables or table size is too small, you can increase the size of the dataset. For example, you can set the value of the
tablesparameter to100and the value of thetable_sizeparameter to1000000.In the
READ_WRITEorWRITE_ONLYscenario, execute the following statement to check whether deadlocks exist:SELECT * FROM oceanbase.cdb_ob_deadlock_event_history ORDER BY create_time DESC LIMIT 10;
Check whether the connection meets the testing requirements.
- If the primary zone is a single zone and you are connected to an OBServer node, check whether this node is a leader.
- If the primary zone is set to
RANDOM, connect to an OceanBase Database Proxy (ODP) to run the benchmark.
Query the tenant information.
SELECT * FROM oceanbase.dba_ob_tenants;Query the tenant leader.
SELECT svr_ip, sql_port, count(1) FROM oceanbase.cdb_ob_ls_locations WHERE tenant_id=1 AND role='leader' GROUP BY svr_ip;
Check the CPU overhead.
If the Sysbench parameters and client connection method are correct, you can check the CPU utilization.
Check the CPU utilization of threads.
top -H -p pidof observerIf each business thread can almost fully use a CPU core, check whether the number of threads approximates the number of CPU cores.
- If the business threads, such as T1002_L0_G2, are far fewer than CPU cores and the total CPU overhead is far below the upper limit of the server, the issue is probably caused by too few CPU cores. In this case, you can check the tenant creation command and check whether the value of the
cpu_countparameter is too small. - If the total CPU overhead approaches the upper limit of the server, the server has reached its performance limit. In this case, you must run Sysbench on a server with higher specifications to achieve higher performance.
- If the business threads, such as T1002_L0_G2, are far fewer than CPU cores and the total CPU overhead is far below the upper limit of the server, the issue is probably caused by too few CPU cores. In this case, you can check the tenant creation command and check whether the value of the
Check the tenant specifications.
show create tenant xxx; SELECT * FROM oceanbase.dba_ob_resource_pools; SELECT * FROM oceanbase.dba_ob_units;If no single business thread fully uses a CPU core, for example, if the CPU utilization is 20% to 50%, a bottleneck occurs somewhere else.
- If a network thread, such as RpcIO, has exhausted a CPU core, the number of network threads may be too small and network communication becomes a bottleneck. In this case, you must increase the value of the
net_thread_countparameter. - If none of the threads shows high CPU utilization, a hardware bottleneck may have occurred and further analysis is required.
- If a network thread, such as RpcIO, has exhausted a CPU core, the number of network threads may be too small and network communication becomes a bottleneck. In this case, you must increase the value of the
Analyze the hardware causes.
If you confirm that the issue is not caused by CPU-related factors, perform the following steps to check the disk and network:
- Check the maximum bandwidth of the NIC and verify whether the network bandwidth has become a bottleneck during a test. You can deploy Sysbench on an OBServer node for testing.
- Check the network latency by using the ping command. A high latency may lead to slow SQL execution.
- Check the maximum bandwidth for 16 KB writes in the log disk by using a flexible I/O (FIO) tester. Verify whether the disk I/O is a bottleneck during testing. You can replace the SSD or non-volatile memory (NVM) for testing.
Run the following command to query the network I/O:
sar -n DEV 1Run the following command to query the maximum bandwidth of the NIC:
## (eth0 is the NIC name.) ethtool eth0Analyze other possible causes.
If the issue is not caused by the preceding factors, you can check some special cases. For example, you can check whether the test period conflicts with backend tasks of the OBServer node.
A customer reported that two sets of data were below expectations during a business test. No obvious issue related to configurations was found. It turned out that the customer performed the test at 2:00 in the morning, at which OceanBase Database automatically performed daily compactions. Therefore, the database performance was significantly affected.
TPS of Sysbench fluctuates and drops to 0 sometimes
TPS occasionally decreases by a roughly constant amount
Possible causes: OceanBase Database periodically performs minor and major compactions, which consumes CPU resources and affects business threads. A tenant that has more CPU cores suffers less impact. For a tenant with 4 CPU cores and 8 GB of memory, the TPS may decrease by 80% during minor compactions. For a tenant with 32 CPU cores and 64 GB of memory, the TPS may decrease by 10% during minor compactions.
Solutions:
Increase the value of the
max_cpuparameter of the resource unit for the tenant.Increase the CPU concurrency for the tenant.
alter system set cpu_quota_concurrency=4 tenant = xxx;Reduce the threads intended for minor compactions by setting the
merge_thread_countparameter or lower the thread priority by setting thecompaction_high_thread_score,compaction_mid_thread_score, andcompaction_low_thread_scoreparameters. This can reduce the unit time overhead of minor compactions but increase the time required for minor compactions. The overall overhead of minor compactions remains unchanged.
TPC decreases to 0 and occasionally increases to more than 0
- Possible causes: A deadlock occurs and the value of the
mysql-ignore-errorsparameter is set to1062,1213,6002for Sysbench. - Solutions:
Check whether a deadlock exists.
SELECT * FROM oceanbase.cdb_ob_deadlock_event_history ORDER BY create_time DESC LIMIT 10;If yes, perform the following steps:
- Set the value of the
rand-typeparameter touniformfor Sysbench. - Increase the size of the dataset. For example, you can set the value of the
tablesparameter to100and the value of thetable_sizeparameter to1000000. The default value of thetable_sizeparameter is10000.
- Set the value of the
TPS decreases to 0 and remains 0 afterward
- Possible causes: A deadlock occurs and is not detected. In addition, the timeout period is set to a large value. As a result, after the TPS decreases to 0, no error is returned, the thread does not exit, and the TPS does not return to normal.
- Solutions:
Check whether a deadlock exists.
SELECT * FROM oceanbase.cdb_ob_deadlock_event_history ORDER BY create_time DESC LIMIT 10;If yes, perform the following steps:
- Set the value of the
rand-typeparameter touniformfor Sysbench. - Increase the size of the dataset. For example, you can set the value of the
tablesparameter to100and the value of thetable_sizeparameter to1000000. The default value of thetable_sizeparameter is10000.
- Set the value of the
TPS decreases to 0 and returns to normal after a few seconds
Possible causes: The clog disk is full. As a result, the TPS decreases to 0 and returns to normal after the space on the clog disk is recycled.
Solutions: Check the disk usage of the tenant.
SELECT tenant_id, svr_ip, svr_port, log_disk_in_use, log_disk_size FROM oceanbase.gv$ob_units;Allocate more disk space to the tenant if the disk space is insufficient.
If the issue persists, contact OceanBase Technical Support.
Sysbench errors
TPS decreases to 0 and occasionally increases to more than 0, and the 6002 or 1213 error is returned sometimes
- Possible causes: A deadlock occurs, the size of the dataset is too small, or the default value
specialof therand-typeparameter is used, which causes centralized keys. The thread does not exit because most SQL requests are in a deadlock. In this case, the thread will wait for the SQL requests to complete execution. - Solutions:
Check whether a deadlock exists.
SELECT * FROM oceanbase.cdb_ob_deadlock_event_history ORDER BY create_time DESC LIMIT 10;If yes, perform the following steps:
- Set the value of the
rand-typeparameter touniformfor Sysbench. - Increase the size of the dataset. For example, you can set the value of the
tablesparameter to100and the value of thetable_sizeparameter to1000000. The default value of thetable_sizeparameter is10000.
- Set the value of the
Error 1062 is returned
- Possible causes: The 1062 error indicates primary key or unique key conflicts. Primary key conflicts are possible because Sysbench generates data randomly.
- Solutions: Set the value of the
mysql-ignore-errorsparameter to1062for Sysbench.
Error 1213 is returned
- Possible causes: A deadlock occurs and some transactions involved in the deadlock are terminated.
- Solutions: A deadlock may occur even if the dataset is very large and the value of the
rand-typeparameter is set touniform. Therefore, you can set the value of themysql-ignore-errorsparameter to1062,1213to ignore deadlocks.
Error 6002 is returned
- Possible causes: The 6002 error may be caused by a deadlock.
- Solutions: Increase the size of the dataset and set the
rand-typeparameter touniformto reduce lock contention. Set the value of themysql-ignore-errorsparameter to1062,1213,6002for Sysbench.
Error 4012 is returned
Possible causes: A deadlock occurs due to severe lock contention, or the OBServer node has poor performance but receives heavy workload from the client.
Solutions: Increase the timeout period, in μs.
set global ob_query_timeout = 50000000000;
Error 5930 is returned
Possible causes: The value of the
db-ps-modeparameter is set toauto, which means that SQL precompilation is enabled. However, OceanBase Database does not fully support precompilation.Solutions:
- Increase the value of the
open_cursorsparameter. For example, executealter system set open_cursors=65535;. - Set the value of the
db-ps-modeparameter todisableto disable SQL precompilation.
- Increase the value of the
Other errors
- Possible causes: OceanBase Database may have bugs.
- Solutions: Record the error logs and contact OceanBase Technical Support for troubleshooting.
Complete troubleshooting procedure
If you cannot locate the cause of an issue in the quick troubleshooting procedure, you can perform the complete troubleshooting procedure. If you have any questions, you can see other sections of this topic or the reference topics by clicking the corresponding links. If the issue persists, contact OceanBase Technical Support for troubleshooting.
Here is the general analysis procedure:
- Check the hardware resources. If CPU, memory, or disk resources are insufficient, the performance is affected, and the TPS may even decrease to 0.
- Check the tenant configurations. You cannot make full use of the hardware with insufficient resources.
- Check the Sysbench parameters. The TPS is affected by the number of threads. A small dataset or centralized data distribution causes lock contention and even deadlocks.
- Check whether OBServer nodes have bugs.
Note
Run the commands in the following sections as needed.
Check the hardware
Insufficient hardware resources, such as less than 4 CPU cores and 8 GB of memory, lead to poor performance. If the disk usage is close to 100%, the write speed is affected. A high network latency also affects the performance.
- CPU: Run the
lscpucommand to query the CPU information. - Memory: Run the
topandcat /proc/meminfocommands to query the information about the memory usage. - Disk: Run the
df -hcommand to query the disk space. - Network latency: Run the
ping ipcommand to check how long it takes for packets to travel to a specified IP address and back.
Check OceanBase Database deployment configurations
If the deployment parameters of OceanBase Database are set to small values when hardware resources on the server are sufficient, resources cannot be fully used and the performance deteriorates. Most parameters can be dynamically adjusted and only a few parameters must be specified during deployment.
Connect to the
systenant and query theGV$OB_SERVERSview.SELECT * FROM GV$OB_SERVERS;Query the number of network threads. The
net_thread_countparameter cannot be dynamically adjusted.show parameters where name = 'net_thread_count';Query deployed clusters.
obd cluster listQuery the configuration of a deployed cluster.
obd cluster edit-config [cluster name]
Query the startup parameters of the OBServer node on the server where it resides.
ps -ef | grep observer
Check tenant configurations
Check whether any resource bottleneck exists in the tenant when OBServer nodes have sufficient resources.
Query the basic information about the tenant.
SELECT * FROM oceanbase.dba_ob_tenants;Query the tenant creation command.
show create tenant [tenant name];Query the resource pool of the tenant. You can add the
tenant_idfilter condition in the command.SELECT * FROM oceanbase.dba_ob_resource_pools;Query the specifications of the corresponding resource unit.
SELECT * FROM oceanbase.dba_ob_units;
Check Sysbench parameters
You can check the parameters one by one in Sysbench commands based on the parameter descriptions.
Locate a deadlock
Query deadlock records.
SELECT * FROM oceanbase.cdb_ob_deadlock_event_history ORDER BY create_time DESC LIMIT 10;role=victimindicates that the transaction was terminated.Query transaction-related SQL statements.
SELECT usec_to_time(REQUEST_TIME), TRACE_ID, TX_ID, QUERY_SQL FROM gv$ob_sql_audit WHERE TX_ID=52635 ORDER BY REQUEST_TIME;If no related transaction is found, SQL audit is disabled. You can verify this by checking whether the value of the
enable_sql_auditparameter isTrue.show parameters where name = 'enable_sql_audit';If the value is
False, you cannot view SQL statements related to the transaction. You can filter the logs by transaction ID to find some information.Query logs related to the problematic SQL statement.
After you obtain SQL statements related to the transaction, filter logs by the trace ID of the conflicting SQL statement.
grep '[trace id]' observer.log*You can view all logs related to this SQL statement. Generally, the last log contains the error code returned to the client.
sending error packet(err=-4101You can also search for related logs by the trace ID of the last
commitorbegincommand.sending error packet(err=-6002If no log is found, the log level is incorrect or log throttling is enabled. In this case, you must change the log level to
WDIAGor increase the bandwidth limit for logs.alter system set syslog_level='WDIAG'; alter system set syslog_io_bandwidth_limit='2G';Notice
You cannot search for information about deadlocks that occur before you modify the parameters.
Analyze minor and major compactions
Query minor and major compaction tasks that take more than 5s to execute.
SELECT * FROM GV$OB_MERGE_INFO WHERE tenant_id=1002 AND (END_TIME-START_TIME)>5 LIMIT 10;Query the major compaction records of a table.
SELECT * FROM GV$OB_TABLET_COMPACTION_HISTORY WHERE TABLET_ID IN (SELECT TABLET_ID FROM oceanbase.CDB_OB_TABLE_LOCATIONS WHERE TABLE_NAME = 'sbtest1') ORDER BY START_TIME DESC;Initiate a minor compaction.
alter system minor freeze tenant=xxx;
Run the top -H command on the OBServer node to view the overhead of the MINI_MERGE and MINOR_EXE threads. The overhead information is displayed only for a short period.
Query slow SQL statements
Query SQL statements that take more than 5s to execute and print all the information.
SELECT * FROM gv$ob_sql_audit WHERE elapsed_time > 5000000 LIMIT 10;Specify the key information to print as needed.
SELECT SVR_IP, SVR_PORT, SQL_EXEC_ID, TRACE_ID, TENANT_ID, SQL_ID, QUERY_SQL, PLAN_ID, PARTITION_CNT, IS_INNER_SQL, ELAPSED_TIME, EXECUTE_TIME, APPLICATION_WAIT_TIME FROM gv$ob_sql_audit WHERE elapsed_time > 5000000 ORDER BY elapsed_time DESC LIMIT 10;If no data is found, SQL audit is disabled.
show parameters where name = 'enable_sql_audit'; alter system set enable_sql_audit = true;
Troubleshoot log issues
Query disk usage.
SELECT tenant_id, svr_ip, svr_port, LOG_DISK_IN_USE, LOG_DISK_SIZE FROM gv$ob_units;
Reproduce an issue
Clear earlier SQL audit records.
alter system set enable_sql_audit = false; alter system flush sql audit global;Enable SQL audit.
alter system set enable_sql_audit = true;Change the log level.
alter system set syslog_level='WDIAG';Decrease the SQL recording threshold to record more SQL statements.
alter system set trace_log_slow_query_watermark = '10ms';Increase the bandwidth available for syslogs to avoid losing key logs due to throttling.
alter system set syslog_io_bandwidth_limit='2G';Increase the transaction timeout period if you do not want transactions to time out.
set global ob_query_timeout = 50000000000;
Set the same parameters as in the scenario of the issue and run Sysbench.
Configure OceanBase Database deployment parameters
Deployment process
Deployment process of OceanBase Database Community Edition
For more information about how to deploy the OceanBase Database Community Edition by using OceanBase Deployer (obd), see Deploy OceanBase Database in a production environment by using the CLI.
Deployment process of OceanBase Database Enterprise Edition
- For more information about how to deploy the OceanBase Database Enterprise Edition, see Deployment process.
Deployment parameters
You must manually modify the .yaml file unless you are using a quick installation process. OceanBase Database provides many parameters. This topic describes only performance-related parameters. For more information, see the deployment documentation.
- To query the parameters, run the
show parameters where name = 'xxx';command. - To query key server parameters, run the
SELECT * FROM GV$OB_SERVERS;command.
Take note of the following parameters:
memory_limit
- Description: the maximum memory for the OBServer node. This parameter has a higher priority than the
memory_limit_percentageparameter. - Default value: 0.
- Recommended value: 80% to 90% of the total system memory.
- Dynamic modification statement:
alter system set memory_limit='32G';. - Query statement:
show parameters where name = 'memory_limit';.
- Description: the maximum memory for the OBServer node. This parameter has a higher priority than the
memory_limit_percentage
- Description: the percentage of memory for the OBServer node to the total system memory. This parameter is valid only when the value of the
memory_limitparameter is0. - Default value: 80.
- Recommended value: 80 to 90.
- Dynamic modification statement:
alter system set memory_limit_percentage=80;. - Query statement:
show parameters where name = 'memory_limit_percentage';.
- Description: the percentage of memory for the OBServer node to the total system memory. This parameter is valid only when the value of the
system_memory
- Description: the memory for the
systenant, which is part of the value specified by thememory_limitparameter. If you do not specify this parameter, a value is automatically set. - Default value: 30 GB for the Enterprise Edition and 0 MB for the Community Edition.
- Recommended value: 10% of the value specified by the
memory_limitparameter. - Dynamic modification statement:
alter system set system_memory='2G';. - Query statement:
show parameters where name = 'system_memory';.
- Description: the memory for the
cpu_count
- Description: the total number of CPU cores for the OBServer node.
- Default value:
0. The system automatically detects the number of CPU cores. - Recommended value: the total number of CPU cores of the system. You can run the
lscpucommand to query the total number of CPU cores. - Dynamic modification statement:
alter system set cpu_count=24;. - Query statement:
show parameters where name = 'cpu_count';.
net_thread_count
- Description: the number of network threads.
- Default value:
0. The number of network I/O threads is max(6, CPU_NUM/8). - Recommended value: 1/8 of the total number of CPU cores. If the execution time of transactions is short and the network overhead is high, you can set the value of this parameter to 20% of the total number of CPU cores.
- This parameter can only be specified when you deploy an OBServer node, and it cannot be dynamically modified.
- Query statement:
show parameters where name = 'net_thread_count';.
sql_net_thread_count
datafile_size
- Description: the total size of data files. This parameter has a higher priority than the
datafile_disk_percentageparameter. - Default value: 0.
- Recommended value: 60% of the disk space if data files are stored on the same disk as logs, and 90% of the disk space if data files are stored on an exclusive disk.
- Dynamic modification statement:
alter system set datafile_size='80G';. - Query statement:
show parameters where name = 'datafile_size';.
- Description: the total size of data files. This parameter has a higher priority than the
datafile_disk_percentage
- Description: the percentage of disk space occupied by data files to the total space of the disk where data files are stored.
- Default value: 0.
- Recommended value: 60% of the disk space if data files are stored on the same disk as logs, and 90% of the disk space if data files are stored on an exclusive disk.
- Dynamic modification statement:
alter system set datafile_disk_percentage=60;. - Query statement:
show parameters where name = 'datafile_disk_percentage';.
log_disk_size
- Description: the total size of redo log files. This parameter has a higher priority than the
log_disk_percentageparameter. - Default value: 0.
- Recommended value: 30 GB if redo log files are stored on the same disk as data files, and 90 GB if redo log files are stored on an exclusive disk.
- Dynamic modification statement:
alter system set log_disk_size='40G';. - Query statement:
show parameters where name = 'log_disk_size';.
Reference: log_disk_size.
- Description: the total size of redo log files. This parameter has a higher priority than the
log_disk_percentage
- Description: the percentage of the disk space occupied by redo log files to the total space of the disk where the redo log files are stored.
- Default value: 0.
- Recommended value: 30% if redo log files are stored on the same disk as data files, and 90% if redo log files are stored on an exclusive disk.
- Dynamic modification statement:
alter system set log_disk_percentage=30;. - Query statement:
show parameters where name = 'log_disk_percentage';.
Configuration files
This section shows the configuration file for the 32C128G specification, in which the disk space is sufficient. Assume that three OBServer nodes and one ODP are deployed. Other components such as OceanBase Agent (OBAgent) are not considered. Each OBServer node and ODP are deployed on different servers to make full use of the hardware resources to achieve higher performance.
oceanbase:
servers:
* name: server1
ip: xxx.xxx.1.1
* name: server2
ip: xxx.xxx.1.2
* name: server3
ip: xxx.xxx.1.3
server1:
mysql_port: 2881
rpc_port: 2882
home_path: /data/1/user_name/observer1
zone: zone1
server2:
mysql_port: 2881
rpc_port: 2882
home_path: /data/1/user_name/observer2
zone: zone2
server3:
mysql_port: 2881
rpc_port: 2882
home_path: /data/1/user_name/observer3
zone: zone3
include: obd/observer.include.yaml
global:
devname: eth0
memory_limit: '100G'
system_memory: '2G'
datafile_size: '80G'
cpu_count: '32'
net_thread_count: 6
obproxy:
servers:
* xxx.xxx.1.5
depends:
* oceanbase
include: obd/obproxy.include.yaml
global:
listen_port: 2891
prometheus_listen_port: 2892
home_path: /data/1/user_name/obproxy
Configure business tenant parameters
Create a tenant
In OceanBase Database, you can create only user tenants. The sys tenant is automatically created when you create a cluster. The creation of a user tenant involves a series of operations. First, create a unit config, then create a resource pool based on the unit config, and finally create a tenant and specify the resource pool for it. The creation is based on the following sequence: unit config > resource pool > tenant.
For more information about how to create a tenant, see Create a tenant.
Tenant creation command
Note
If you use obd for automatic deployment, only the `sys` tenant is created. You can create a dedicated tenant for performance testing to make full use of resources available on the OBServer node.
The following command creates a tenant on a server with 32 CPU cores and 128 GB of memory.
sh create_perf_tenant.sh ip port
mysql -c -h $1 -P $2 -uroot -e "drop resource unit unit_1;"
mysql -c -h $1 -P $2 -uroot -e "create resource unit unit_1 max_cpu 24, min_cpu 16, memory_size 64000000000, log_disk_size 40000000000, max_iops 10000000;"
mysql -c -h $1 -P $2 -uroot -e "create resource pool pool_2 unit = 'unit_1', unit_num = 1, zone_list = ('zone1','zone2','zone3');"
mysql -c -h $1 -P $2 -uroot -e "create tenant perf replica_num = 3,primary_zone='zone1', resource_pool_list=('pool_2') set ob_tcp_invited_nodes='%';"
mysql -c -h $1 -P $2 -uroot@perf -e "set global ob_query_timeout = 50000000000;"
mysql -c -h $1 -P $2 -uroot@perf -e "set global ob_trx_timeout = 50000000000;"
mysql -c -h $1 -P $2 -uroot@perf -e "set global ob_plan_cache_percentage = 20;"
mysql -c -h $1 -P $2 -uroot@perf -e "set global binlog_row_image='MINIMAL';"
mysql -c -h $1 -P $2 -uroot -e "alter system set enable_early_lock_release=true tenant = perf;"
mysql -c -h $1 -P $2 -uroot -e "alter system set cpu_quota_concurrency=2 tenant = perf;"
If you do not specify the tenant mode in the tenant creation command, a MySQL tenant is automatically created. To create an Oracle tenant, specify the tenant mode to Oracle.
Execute the following statement in the sys tenant. The COMPATIBILITY_MODE field specifies the tenant mode.
SELECT * FROM DBA_OB_TENANTS;
Key tenant parameters
The following parameters are key to the performance of a tenant:
min_cpu
- Description: the minimum number of CPU cores available for the tenant. The sum of values of the
min_cpuparameter of all tenants must be less than or equal to the value of thecpu_countparameter. - Recommended value:
cpu_count/2. - Reference: For more information about the
min_cpuparameter, see the parameter description in the Create a unit config section of the Create a tenant topic.
- Description: the minimum number of CPU cores available for the tenant. The sum of values of the
max_cpu
- Description: the maximum number of CPU cores available for the tenant. The sum of values of the
max_cpuparameter of all tenants must be less than or equal to the value of the product of thecpu_countandresource_hard_limitparameters divided by 100. - Recommended value range:
cpu_count/2 tocpu_count. You can also set this parameter to the same value as themin_cpuparameter. - Reference: For more information about the
max_cpuparameter, see the parameter description in the Create a unit config section of the Create a tenant topic.
- Description: the maximum number of CPU cores available for the tenant. The sum of values of the
cpu_quota_concurrency
- Description: the number of concurrent threads for each CPU core. The maximum number of CPU cores available for the tenant, multiplied by the number of concurrent threads for each CPU core, must be the same as the maximum number of business threads.
- Recommended value:
4. Generally, we recommend that you use the default value. You can change the value to2for CPU-intensive stress testing.
memory_size
- Description: the size of memory provided by the resource unit.
- Recommended value: 50% to 80% of the total memory. If you want to create multiple resource units, the memory is flexibly allocated.
log_disk_size
- Description: the size of the log disk.
- Recommended value: 40 GB to 80 GB. You can set a larger value, depending on the data amount.
primary_zone
Description: the priority of allocating leaders to zones. Zones on the left and right sides of a comma share the same priority. A zone on the left side of a semicolon has a higher priority than that on the right side. Example:
zone1,zone2;zone3.Recommended value:
- If the three servers have the same specifications and similar network latency, you can set the value to
RANDOMto improve the performance. In this case, leaders are evenly distributed. - If a server has higher performance or if you want to facilitate statistics collection and data observation, you can set the primary zone to the zone where the server resides, such as
zone1.
- If the three servers have the same specifications and similar network latency, you can set the value to
Configure global parameters in the sys tenant
You can connect to the sys tenant and modify cluster-level configurations. For example, you can change the log level or disable features that affect the performance, such as SQL audit.
ALTER system SET enable_sql_audit=false;
select sleep(5);
ALTER system SET enable_perf_event=false;
ALTER system SET syslog_level='ERROR';
alter system set enable_record_trace_log=false;
Configure Sysbench parameters
Sysbench test process
Install and deploy Sysbench
Reference: For more information, see the Install Sysbench section in the Run the Sysbench benchmark on OceanBase Database topic.
Prepare data
cleanup: sysbench oltp_read_write.lua --mysql-host=x.x.x.x --mysql-port=xxxx --mysql-db=test --mysql-user=test@xxx --mysql-password=xxx --table_size=1000000 --tables=30 --rand-type=uniform --threads=1000 --report-interval=1 --time=600 cleanup prepare: sysbench oltp_read_write.lua --mysql-host=x.x.x.x --mysql-port=xxxx --mysql-db=test --mysql-user=test@xxx --mysql-password=test --table_size=1000000 --tables=30 --rand-type=uniform --threads=1000 --report-interval=1 --time=600 prepareRun the test
sysbench oltp_read_write.lua --mysql-host=x.x.x.x --mysql-port=xxxx --mysql-db=test --mysql-user=test@xxx --mysql-password=xxx --table_size=1000000 --tables=30 --rand-type=uniform --threads=1000 --report-interval=1 --time=600 --db-ps-mode=disable run
Key performance parameters
Reference: Sysbench parameters.
table_size
- Description: the number of rows initiated for each table. The default value is
10000. - Recommended value:
100000or1000000.
- Description: the number of rows initiated for each table. The default value is
tables
- Description: the number of initialized tables.
- Recommended value: 30 to 100.
threads
- Description: the number of test threads.
- Recommended value: 500 to 2,000. More CPU cores on the test server support more threads.
rand-type
Description: the random type for test data generation. Valid values are
uniform,gaussian,special, andpareto. The default value isspecial.special: keys are centralized, and lock contention easily occurs.uniform: data is discretely and evenly distributed. All values within the range have an equal probability of occurrence and no lock contention occurs.gaussian: data is centralized near medians. This is also known as normal distribution.pareto: data is centralized near the smallest values. A larger value has a lower probability of occurrence.
Recommended value:
uniform.mysql-ignore-errors
- Description: the error codes ignored during the test process. When Sysbench receives an error, it immediately stops sending new requests. After Sysbench receives responses to all requests, it exits the test. Therefore, you can ignore error codes related to expected errors.
- Valid values:
1062,1213, and6002.1062indicates a primary key or unique key conflict.1213indicates a deadlock.6002indicates a transaction rollback.
db-ps-mode
- Description: specifies whether to enable SQL precompilation. Valid values are
autoanddisable. The default value isdisable. - Recommended value:
disable. - If you set the value to
auto, you must modify theopen_cursorsparameter. For example, you can run thealter system set open_cursors=65535;command to set the value of theopen_cursorsparameter to65535.
- Description: specifies whether to enable SQL precompilation. Valid values are
skip_trx
- Description: specifies whether to skip transactions in a read-only test. The default value is
OFF, which specifies not to skip transactions. - Recommended value:
ONfor read-only tests andOFFin other cases.
- Description: specifies whether to skip transactions in a read-only test. The default value is
auto-inc
- Description: specifies whether values in the ID column automatically increment. The default value is
ON. - Recommended value:
OFFin write scenarios.
- Description: specifies whether values in the ID column automatically increment. The default value is
warmup-time
Description: the warm-up period in seconds. Statistics collection begins after the warm-up period elapses. The default value is
0.Recommended value: 10 to 30. You can specify a large value for a large dataset.
Case study of Sysbench performance issues
TPS decreases to 0 and errors are occasionally returned
Symptom
The following command runs the Sysbench benchmark on OceanBase Database Community Edition V4.1.0. The TPS decreases to 0 and an error is returned.
sysbench --db-driver=mysql --threads=500 --time=300000 --mysql-host=xxxxxxx --mysql-port=2883 --mysql-user='tpcc@t1' --mysql-password='******' --report-interval=2 --tables=100 /usr/share/sysbench/oltp_read_write.lua --db-ps-mode=disable run
The figure below shows the server configurations.

Cause analysis
The value of the table_size parameter is too small. The default value of rand-type is special. As a result, data is centralized and primary key conflicts occur. This causes a deadlock and the TPS decreases to 0.
Increase timeout periods for the test tenant.
set global ob_query_timeout = 50000000000; set global ob_trx_timeout = 50000000000;To reproduce the issue quickly, set
tablesto1and leave the values of other parameters unchanged.sysbench oltp_read_write.lua --mysql-host=[ip] --mysql-port=2881 --mysql-user=root@user --mysql-db=test --table_size=10000 --tables=1 --threads=500 --time=3600 --report-interval=1 --db-ps-mode=disable runThe TPS decreases to 0. However, the TPS and QPS sometimes can be greater than 0.

Query the deadlock view.
SELECT * FROM oceanbase.CDB_OB_DEADLOCK_EVENT_HISTORY ORDER BY create_time DESC LIMIT 10;
You can see the latest deadlock records.
role=victimindicates that a transaction was terminated, androle=witnessindicates that a transaction witnessed the conflict.cycle_size=2indicates that two transactions conflict with each other.cycle_size>2indicates that multiple transactions conflict with one transaction. Select an event with a cycle size of 2 and filter a group of conflicting transactions by event ID.SELECT * FROM oceanbase.CDB_OB_DEADLOCK_EVENT_HISTORY WHERE cycle_size=2 ORDER BY create_time DESC LIMIT 10;
You can view transaction IDs in the
VISITORfield. You can find all SQL statements related to a transaction based on the transaction ID.SELECT TRACE_ID, TX_ID, QUERY_SQL FROM gv$ob_sql_audit WHERE TX_ID=5042912 ORDER BY REQUEST_TIME; SELECT TRACE_ID, TX_ID, QUERY_SQL FROM gv$ob_sql_audit WHERE TX_ID=5042591 ORDER BY REQUEST_TIME;
The figure shows that two transactions depend on each other in the following manner:
- Update of Transaction 1: id=4978
- Update of Transaction 2: id=5049
- Update of Transaction 1: id=4978
- Deletion of Transaction 2: id=5049
After a deadlock occurs, Transaction 64317 is rolled back and Transaction 64235 is committed. This is because Transaction 64317 was created later than Transaction 64235. You can determine whether one transaction was created earlier or later than another one by comparing the
request_timeparameter of their first SQL statements, which can be found in theGV$OB_SQL_AUDITview.SELECT REQUEST_TIME, SVR_IP, SVR_PORT, SQL_EXEC_ID, TRACE_ID, TENANT_ID, TX_ID, SQL_ID, QUERY_SQL, PLAN_ID, PARTITION_CNT, IS_INNER_SQL, ELAPSED_TIME, EXECUTE_TIME, APPLICATION_WAIT_TIME FROM gv$ob_sql_audit WHERE TX_ID=64317 ORDER BY REQUEST_TIME;Query logs by trace ID. You can view a rolled back transaction. The error code is 6002, which indicates a rollback error. The abort reason is 4101, which indicates a deadlock.
Find the SQL log that records a conflict of the terminated transaction. The error code in the last log is 4101. However, Sysbench does not return a deadlock error. Instead, it returns a 6002 rollback error.
[2023-07-11 15:32:40.263794] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:317) [28473][T1002_L0_G0][T1002][Y5B690B7C0505-00060030956354CB-0-0] [lt=14] sending error packet(err=-4101, extra_err_info=NULL, lbt()="0x1029cb40 0x832fadc 0x82e32f7 0x5b65d4f 0x5960718 0x595c58f 0x595a25a 0x8075f6c 0x595964c 0x80754ea 0x595665a 0x8075b24 0x1061b777 0x10613e9f 0x7f0215489e25 0x7f0214f48f1d")You can find the 6002 rollback error in the log corresponding to the last
beginstatement of the transaction. Because the commit statement is not explicitly executed in the current transaction, the begin statement in the next transaction may cause the current transaction to be implicitly committed. Consequently, begin is the last SQL statement of the current transaction. Sysbench does not handle the deadlock error. Instead, it returns the 6002 error.[2023-07-11 15:32:40.305922] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:317) [28472][T1002_L0_G0][T1002][Y5B690B7C0505-0006003095136574-0-0] [lt=37] sending error packet(err=-6002, extra_err_info=NULL, lbt()="0x1029cb40 0x832fadc 0x82e32f7 0x5a55e57 0x5960c40 0x595c58f 0x595a25a 0x8075f6c 0x595964c 0x80754ea 0x595665a 0x8075b24 0x1061b777 0x10613e9f 0x7f0215489e25 0x7f0214f48f1d")If you cannot find the log, log throttling may be enabled. We recommend that you increase the bandwidth limit for syslogs.
alter system set syslog_io_bandwidth_limit='2G';
Conclusion
Conclusion:
- If the default value
specialis specified for therand-typeparameter, a deadlock may occur due to too many key conflicts. In such a scenario, Sysbench exits. - If the default value
specialis specified for therand-typeparameter and the values of thetablesandtable_sizeparameters are small, deadlocks occur frequently. Many transactions are stuck and the test threads cannot promptly exit. As a result, the TPS decreases to 0 for a few minutes and may occasionally be greater than 0. - A deadlocked transaction is rolled back and an error is returned. The returned error code may be 6002 rather than 1213.
- If you specify a large value for the
ob_query_timeoutparameter and the deadlock detection feature does not take effect, transactions cannot be terminated and SQL statements are congested. In this case, the TPS stays at 0 for a long time and no error is returned.
Solution:
- Change the data distribution type. For example, you can set the value of the
rand-typeparameter touniform. - If you do not want to change the data distribution type, you can increase the size of the dataset. For example, you can set the value of the
tablesparameter to100and the value of thetable_sizeparameter to1000000. - If you do not want to change the data distribution type or the dataset size, you can try to ignore the specified errors. For example, you can set the value of the
mysql-ignore-errorsparameter to1062,1213,6002. However, this cannot avoid deadlocks.