Performance troubleshooting and tuning for data synchronization from OceanBase Community Edition to Kafka

2026-01-29 07:14:58  Updated

Performance troubleshooting steps

  1. Use the following methods to determine whether the Store component has performance bottlenecks.

    1. Query the number of records processed per second from the CDC log.

      grep NEXT_RECORD_RPS libobcdc.log
      

      If the CDC processing speed is lower than the rate at which the source generates business data, continue with the following checks to determine whether the issue is caused by the OMS Community Edition server.

    2. Check whether the CDC process has triggered the traffic control.

      grep "NEED_SLOW_DOWN=1 PAUSED=1" libobcdc.log
      

      NEED_SLOW_DOWN=1 indicates that traffic control is triggered because memory usage is high, which limits the log pulling efficiency. PAUSED=1 indicates that CDC is paused to avoid putting more pressure on the system when traffic control is triggered by issues such as I/O or server load.

      You can modify the memory_limit parameter to adjust the throttling threshold. View the current value in the /home/ds/store/store{port}/etc/libobcdc.conf file and increase the parameter value if necessary. Here is an example:

      liboblog.memory_limit=20G
      liboblog.part_trans_task_active_count_upper_bound=500000
      
    3. If the traffic control is not triggered, query the logs for CLOG pulling to check the RPC latency.

      grep do_stat libobcdc.log
      [2025-04-21 16:05:13.905681] INFO  [TLOG.FETCHER] do_stat (ob_log_ls_fetch_stream.cpp:309) [20155][][T0][Y0-0000000000000000-0-0] [lt=9] [STAT] [FETCH_STREAM] stream="xxx.xxx.xxx.1:2882"(0x7fa62d4131f0:HOT)({tenant_id:1028, ls_id:{id:1002}})(FETCHED_LOG:153.11GB) traffic=41.85MB/sec log_size=438879806 size/rpc=13.50MB log_cnt/rpc=946 rpc_cnt=31(3/sec) single_rpc=0(0/sec)(upper_limit=0(0/sec),max_log=0(0/sec),no_log=0(0/sec),max_result=0(0/sec)) rpc_time=312357 svr_time=(queue=41,process=224677) net_time=(l2s=1146,s2l=83859) cb_time=2632 handle_rpc_time=13739 flush_time=860 read_log_time=12870(log_entry=2600,trans=0) trans_count=0 trans_size=0.00B
      

      In this log, rpc_time=312357 and svr_time=(queue=41,process=224677) indicate that the RPC latency is about 312 ms, of which the server spent about 225 ms processing the RPC. The RPC latency is normally only several tens of milliseconds, so this value is excessively high. In this case, query the OBServer logs and adjust the relevant parameters.

      Search the OBServer log for the keyword fetch_log done. The matching line prints the statistics of log pulling. If the value of fetch_archive_time in that line is not 0, increase the value of log_disk_size to enlarge the storage space for CLOG.
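
The checks in steps 2 and 3 can be scripted. The following is a minimal sketch and not part of OMS Community Edition. The function names flow_control_summary and rpc_latency_ms are hypothetical. It assumes that the flags appear in libobcdc.log as the literal tokens NEED_SLOW_DOWN=1 and PAUSED=1, and that rpc_time and svr_time are reported in microseconds, as the sample line above suggests.

```shell
# Count how many log lines on stdin carry each traffic control flag.
# Assumption: the flags appear as the literal tokens used in the grep of step 2.
flow_control_summary() {
  awk '
    /NEED_SLOW_DOWN=1/ { slow++ }
    /PAUSED=1/         { paused++ }
    END { printf "need_slow_down=%d paused=%d\n", slow, paused }
  '
}

# Print RPC latency, queue time, and server processing time, rounded to
# milliseconds, for every do_stat line on stdin.
# Assumption: the raw values are microseconds.
rpc_latency_ms() {
  sed -n 's/.*rpc_time=\([0-9][0-9]*\) svr_time=(queue=\([0-9][0-9]*\),process=\([0-9][0-9]*\)).*/\1 \2 \3/p' |
    awk '{ printf "rpc_ms=%d queue_ms=%d process_ms=%d\n",
           int(($1 + 500) / 1000), int(($2 + 500) / 1000), int(($3 + 500) / 1000) }'
}
```

For example, run `flow_control_summary < libobcdc.log` to see how often each flag was set, or `grep do_stat libobcdc.log | rpc_latency_ms` to list the latencies. For the sample line above, the second command prints rpc_ms=312 queue_ms=0 process_ms=225.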


  2. After the Store component is ruled out as the cause, check the performance-related parameters of the Full-Import/Incr-Sync component.

Setting useSchemaCache to true in the Source is sufficient for most scenarios. If the required records per second (RPS) is still not met, you can also set buildRecordConcurrent to true.

Source

| Parameter | Description |
|---|---|
| useSchemaCache | Specifies whether to cache the schema. Valid values: true and false. Default value: false. If you set this parameter to true, the Store component caches the schema when reading data, which accelerates the message conversion of the Store. |
| buildRecordConcurrent | Specifies whether to asynchronously convert Store messages. Valid values: true and false. If you set this parameter to true, data is pulled from the Store and message conversion is performed in parallel. The number of parallel threads is the same as workerNum. |

Sink

The lingerMs and batchSize parameters configure the producer client properties of Kafka.

| OMS Community Edition parameter | Corresponding Kafka client parameter | Description |
|---|---|---|
| lingerMs | ProducerConfig.LINGER_MS_CONFIG | The time that the Kafka client waits before sending a batch of data. To increase the throughput, increase this value so that more data is sent in each batch. Default value: 10, in milliseconds. |
| batchSize | ProducerConfig.BATCH_SIZE_CONFIG | The maximum size of a batch sent by the Kafka client. Default value: 1048576, in bytes (1 MB). |
| workerNum | - | The number of concurrent worker threads of the Sink. Default value: 16. |
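
To see how lingerMs and batchSize interact, the following sketch estimates which of the two limits closes a batch first. It is a rough estimate under stated assumptions, not a description of the Kafka client internals: it assumes that a batch is sent when it reaches batchSize or when lingerMs expires, whichever comes first, that all records go to one partition at a steady rate, and that compression is ignored. The function name batch_trigger is hypothetical.

```shell
# Estimate which limit closes a producer batch first.
# Arguments: records per second, average record size in bytes,
#            lingerMs (default 10), batchSize in bytes (default 1048576).
batch_trigger() (
  rps=$1 avg_bytes=$2
  linger_ms=${3:-10} batch_size=${4:-1048576}
  # Bytes that accumulate during one linger window.
  window_bytes=$(( rps * avg_bytes * linger_ms / 1000 ))
  if [ "$window_bytes" -ge "$batch_size" ]; then
    echo "trigger=batchSize bytes=$batch_size"
  else
    echo "trigger=lingerMs bytes=$window_bytes"
  fi
)
```

For example, `batch_trigger 50000 200` prints trigger=lingerMs bytes=100000: at 50,000 records per second of 200 bytes each, only about 100 KB accumulates within the default 10 ms, so the batch is sent well below the 1 MB limit. Under these assumptions, raising lingerMs is what makes each batch larger.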

If enablePreprocessConfig is set to true in the coordinator, lingerMs and batchSize will be automatically configured based on the JVM memory. If you manually configure these two parameters, your configurations take precedence.

# View the automatically configured parameters
grep "auto set " connector.log
# View the configurations finally used by the system
cat conf/runningConf.json
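
Because runningConf.json can be long, it may help to print only the Sink parameters discussed above. The following is a minimal sketch. The function name show_sink_params is hypothetical, and the sketch assumes that the parameter names appear verbatim as JSON keys in the file. The structure of the file is not described here, so the pattern matches the keys at any nesting level.

```shell
# Print the lingerMs, batchSize, and workerNum entries found in a JSON file.
# Assumption: the parameter names appear verbatim as JSON keys.
show_sink_params() {
  grep -oE '"(lingerMs|batchSize|workerNum)"[[:space:]]*:[[:space:]]*[^,}[:space:]]+' "$1"
}
```

For example: `show_sink_params conf/runningConf.json`.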

Coordinator

Shuffle-related configurations of OMS Community Edition

| Parameter | Description |
|---|---|
| shuffleBucketSize | The number of buckets. OMS Community Edition usually reads and sends one batch of data from a bucket before it reads and sends the next batch from that bucket. The number of buckets determines the number of records that can be sent at the same time. Default value: 128. |
| shuffleFlushIntervalMs | The interval at which bucketed data is read periodically. A smaller interval reduces the synchronization latency. Unit: milliseconds. Default value: 100. |
| shuffleMinBatchSize | A bucket is read and sent once the number of records in it is greater than or equal to the value of this parameter. If the bucket holds fewer records, the system waits for shuffleFlushIntervalMs and then reads and sends the records in the bucket. Default value: 20. |
| shuffleMaxBatchSize | The maximum number of records read and sent at a time. Default value: 64. |
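
The way the three batch parameters work together can be sketched as a small decision function. This is one reading of the descriptions above, not the OMS Community Edition implementation, and the function name next_batch_size is hypothetical.

```shell
# Decide how many records to read from one bucket right now.
# Arguments: pending records, milliseconds waited since the last flush,
#            shuffleMinBatchSize (default 20), shuffleMaxBatchSize (default 64),
#            shuffleFlushIntervalMs (default 100).
# Prints 0 when the bucket should keep waiting.
next_batch_size() (
  pending=$1 waited_ms=$2
  min_batch=${3:-20} max_batch=${4:-64} flush_ms=${5:-100}
  if [ "$pending" -ge "$min_batch" ]; then
    # Enough records: send at most shuffleMaxBatchSize of them.
    if [ "$pending" -gt "$max_batch" ]; then echo "$max_batch"; else echo "$pending"; fi
  elif [ "$waited_ms" -ge "$flush_ms" ]; then
    # Below the minimum, but the flush interval has elapsed: send what is there.
    echo "$pending"
  else
    echo 0
  fi
)
```

With the default values, `next_batch_size 100 0` prints 64, `next_batch_size 5 0` prints 0, and `next_batch_size 5 100` prints 5. Under this reading, a low-traffic bucket adds up to shuffleFlushIntervalMs of latency, while a busy bucket is drained in chunks of at most shuffleMaxBatchSize records.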

Use Arthas for performance analysis

# Log in to the OMS Community Edition container
cp /root/arthas-bin.zip /home/ds
su - ds
unzip arthas-bin.zip

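# Attach Arthas to the incremental component process. Replace <pid> with the process ID of that component.
/opt/alibaba/java/bin/java -jar arthas-boot.jar <pid>
# Run the following commands in the Arthas console.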
profiler start
profiler getSamples
profiler status
# Enter the stop command after waiting for 1 minute. This will generate an HTML file containing flame graphs.
profiler stop --format html
# Exit Arthas.
exit
