obcdc parameters

2024-04-19 11:42:34  Updated

For parameters of the Boolean type, the value 0 indicates False and the value 1 indicates True.

Cluster parameters

Metadata parameters

Parameter Description Default value Real-time modification supported Remarks
init_log_level The initial log level at startup. ALL.*:INFO;SHARE.SCHEMA:INFO No The initial log level is used to generate more detailed logs. The log level is changed to the specified log_level after startup.
log_level The log level after startup. ALL.*:INFO;SHARE.SCHEMA:WARN Yes /
max_log_file_count The maximum number of log files. 0 No The value must be greater than or equal to 0. If the value is 0, the number of logs is not restricted. If you change the value to a smaller one, restart obcdc for the configuration to take effect.
enable_log_limit Specifies whether to enable log throttling. 1 Yes 1: specifies to enable log throttling.
0: specifies to disable log throttling.
rootserver_list The RootService server specified at startup. \| Yes The format is server_ip:server_rpc_port:server_sql_port.
cluster_url The cluster URL specified at startup. \| No If OceanBase Database has a cluster URL, obcdc uses this URL to obtain the RootServer information. High availability is supported for servers.
cluster_user The username in the sys tenant. Empty (required, to be specified by the user) No The user in the sys tenant. This user must have the read privilege on internal tables.
cluster_password The password of the preceding user. Empty (required, to be specified by the user) No /
tenant_endpoint The list of servers where the tenant resides. \| No The format is host1:sql_port1| host2:sql_port2.
tenant_user The name of a user in the tenant. \| No The format is user_name@tenant_name or user_name@tenant_name#cluster.
tenant_password The password of the preceding user in the tenant. \| No /
config_fpath The dump path of the configuration file. etc/liboblog.conf No All obcdc configuration information will be dumped into this file.
timezone The time zone of obcdc. +8:00 No This parameter specifies the time zone in which data is formatted and exported, which must be consistent with that of the data source. The default value is UTC+8.
timezone_info_path The directory where the timezone metadata file is located. etc/timezone_info.conf Yes When meta_data_refresh_mode is set to data_dict, the timezone_info.conf file in the RPM package must be placed in this directory.
skip_ob_version_compat_check Specifies whether to skip the OceanBase version compatibility check. No No obcdc V3.x supports data synchronization with only OceanBase Database V2.x and V3.x. In addition, the obcdc version must be later than the OceanBase Database version.
cluster_version_refresh_interval_sec The interval for querying the earliest OBServer node version of the OceanBase cluster. 600 No The interval for refreshing the OceanBase cluster version, in seconds.
log_router_background_refresh_interval_sec The frequency for refreshing the log stream metadata. 10 Yes The interval, in seconds, for refreshing the corresponding server information at the backend for the log stream.
Parameter Description Default value Real-time modification supported Remarks
working_mode The working mode. storage (commercial edition) No The transaction size must be controlled in business scenarios. If large transactions exist in business scenarios, use the persistence mode. We recommend that you deploy obcdc on the SSD in persistence mode.
store_service_path The path for storing data in persistence mode. ./storage No This parameter is valid only in persistence mode.
Relative path: The persistent data is stored in a relative path in the path of the process that calls obcdc.
Absolute path: The persistent data is stored in the specified absolute path, such as store_service_path=/data/1.

Synchronization granularity

obcdc supports data synchronization in multiple dimensions. A single obcdc instance can synchronize data of one cluster, one or more tenants, several databases, and several tables.

Synchronization granularity Single obcdc instance Multiple obcdc instances
Cluster Supported (a single obcdc instance cannot support the data of multiple clusters) Supported
Tenant Supported Supported
Database Supported Supported
Table Supported Supported (see the following multi-instance-related parameters for details)

If the traffic in the tenant is heavy, a single obcdc instance may fail to synchronize the data of the entire cluster in real time. In this case, you can use multiple obcdc instances to share the load.

  1. Each obcdc instance synchronizes data of a different tenant.

  2. If it is difficult for a single obcdc instance to synchronize the data of a single tenant, you can configure multiple obcdc instances and use one instance for each table in the tenant.

    Notice

    You must configure a blocklist and allowlist in the preceding two scenarios.

  3. If it is difficult for a single obcdc instance to synchronize a single table, you can configure multiple obcdc instances for the table and use one instance for each hash partition of the table.

    Notice

    In this case, you must configure multiple obcdc instances.

Parameter Description Default value Real-time modification supported Remarks
tb_white_list The allowlist of the tables to be synchronized. *.*.* No The format is tenant name.database name.table name, for example, mytenant1.db1.tb1|mytenant2.db2.*.
tb_black_list The blocklist of tables that will not be synchronized. \| No The format is the same as above.
tablegroup_white_list The allowlist of table groups to be synchronized. *.* No The format is tenant name.table group name, for example, mytenant1.* or_tenant*.tg*.
tablegroup_black_list The blocklist of table groups that will not be synchronized. \| No The format is the same as above.
enable_filter_sys_tenant Specifies whether to filter synchronization data of the sys tenant. 0 No Enable this option when the sys tenant is abnormal but the user tenants are normal.

Note

obcdc provides data filtering capabilities based on table and table group allowlists and blocklists.

  • Only the tables and table groups in the allowlist but not in the blocklist will be synchronized. By default, all tables and table groups in all tenants are in the allowlist, and the blocklist is empty. We recommend that you configure an allowlist for the production environment to synchronize data of specified tenants. Do not use the default values, which may pull redundant data.
  • The fnmatch function is used to match tables and table groups based on regular expressions. By default, case sensitivity is disabled for matching tables and table groups. You can set enable_oracle_mode_match_case_sensitive to 1 to enable case sensitivity.
Parameter Description Default value Real-time modification supported Remarks
instance_num The total number of obcdc instances. 1 No /
instance_index The index of the current obcdc instance to the total number of obcdc instances (which can be understood as the hash bucket ID). 0 No The value is [0,instance_num). Each obcdc instance processes only the data whose hashed value of modinstance_num is equal to instance_index.
enable_global_unique_index_belong_to_multi_instance Specifies whether to synchronize tables with globally unique indexes in multi-instance scenarios. No Yes /

Throttling

To prevent out-of-memory (OOM) issues caused by data heaping in the memory in the case of exceptions, obcdc provides throttling capabilities. Throttling is triggered based on the internal status of obcdc. Data pulling stops during throttling and resumes when obcdc becomes normal. The following table describes the throttling control parameters. You can modify them to adjust the throttling level as needed.

Parameter Description Default value Real-time modification supported Remarks
memory_limit The memory limit for triggering throttling. 20G Yes It is one of the throttling metrics (the same as below). However, it does not restrict obcdc from using memory higher than memory_limit. The minimum value is 2G.
system_memory_avail_percentage_lower_bound, The lower limit of available system memory, in percentage. 10 Yes The value range is [10, 80].
part_trans_task_active_count_upper_bound The maximum number of active partition transactions. 200000 Yes The number of active partition transaction structures in memory (allocated and not recycled).
storager_task_count_upper_bound The throttling threshold of the number of tasks for storager to persist. 1000 Yes The minimum value is 1.
part_trans_task_reusable_count_upper_bound The maximum number of partition transactions to reuse. 10240 Yes The sum of tasks to be generated to the user queue, tasks to be generated by users, tasks to be returned by users, and tasks pending for GC.
ready_to_seq_task_upper_bound The maximum number of tasks to be sequenced. 2000 Yes The maximum number of tasks to be sequenced, including tasks pending for parsing, tasks pending for formatting, and tasks pending for persistence.
pause_fetcher Specifies whether to forcibly stop pulling logs. No (0) Yes If this feature is enabled, obcdc forcibly stops pulling logs regardless of the logical computing result of the throttling conditions.

The reference throttling conditions are as follows:

  1. Condition 1: The number of active partitions exceeds the threshold specified by part_trans_task_active_count_upper_bound. || The memory held by obcdc exceeds the threshold specified by memory_limit. || The available memory of the system is lower than the threshold specified by system_memory_avail_percentage_lower_bound.
  2. Condition 2: The number of partition transaction structures to be reused exceeds the threshold specified by part_trans_task_reusable_count_upper_bound || The number of tasks to be sequenced exceeds the threshold specified by ready_to_seq_task_upper_bound.
  3. Condition 3 (since OceanBase Database V3.x): The number of tasks to be persisted exceeds the threshold specified by storager_task_count_upper_bound. && The memory held by obcdc exceeds the throttling triggering threshold (memory_limit) multiplied by the percentage of memory used by the persistence module in the throttling memory (storager_mem_percentage).

Throttling triggering condition: ((Condition 1 && Condition 2) || Condition 3) || (pause_fetcher != 0)

Parameter Description Default value Real-time modification supported Remarks
enable_dump_pending_trans_info Specifies whether to regularly collect and print statistics on the status of distributed transactions in obcdc. No Yes If the value is set to 1, the statistics are printed once every 10 seconds.
print_fetcher_slowest_ls_num The number of partitions with the lowest log-pulling speed. 10 Yes /
print_rpc_handle_info Specifies whether to print RPC-related information. No Yes /
print_stream_dispatch_info Specifies whether to print the log stream information. No Yes /
print_ls_heartbeat_info Specifies whether to print partition heartbeat information. No Yes /
print_ls_serve_info Specifies whether to print the partition service information, such as the partition status and the number of transactions of the partition. No Yes /
print_participant_not_serve_info Specifies whether to print the information about all transactions that are not in service (no data is generated) for the distributed transaction. No Yes /
print_ls_server_list_update_info Specifies whether to print the details of the servers when the list of partition servers is updated. No Yes /
enable_formatter_print_log Specifies whether to enable formatter to generate logs whose formatting data is omitted. No Yes /

Metadata management

obcdc V4.1 can obtain metadata such as information about schemas and log streams from data dictionaries, without the need to pull logs from OBServer nodes by using RPCs.

Parameter Description Default value Real-time modification supported Remarks
meta_data_refresh_mode The mode for obtaining metadata such as information about schemas and log streams. online No
  • online mode: obcdc queries OBServer nodes for schema information by using SchemaService. In this mode, obcdc must be able to access OBServer nodes.
  • data_dict mode: obcdc parses schema information based on the data dictionaries in logs.
Parameter Description Default value Real-time modification supported Remarks
skip_rename_tenant_ddl Specifies whether to ignore DDL statements that rename a tenant. No Yes The renamed tenant is not affected if it is not included in the obcdc synchronization allowlist. Otherwise, an error message is displayed and the process is terminated. This is because the rename operation overruns the allowlist restriction. In this case, we recommend that you modify the allowlist to support the tenants before and after the rename operation. When the value is set to true, the system will skip the DDL statements of the renamed tenant and continue the data synchronization.
enable_oracle_mode_match_case_sensitive The case sensitivity of tenants against the blocklist or allowlist in the Oracle mode of OceanBase Database. No No /
Parameter Description Default value Real-time modification supported Remarks
ls_count_upper_limit The number of partition progress values. 2000000 No A maximum of 200 million partitions is supported.
check_switch_server_interval_min The interval, in minutes, of periodic partition check for the necessity of switching log-pulling servers. 30 Yes /
Parameter Description Default value Real-time modification supported Remarks
access_systable_helper_thread_num The maximum number of connections in the connection pool for SQL execution. 64 No The value range is [48, 1024].
rs_sql_connect_timeout_sec The timeout period for the SQL connection of the RootService. 40 Yes /
rs_sql_query_timeout_sec The timeout period for the SQL request of the RootService. 30 Yes /
tenant_sql_connect_timeout_sec The timeout period for the SQL connection of a tenant. 40 Yes /
tenant_sql_query_timeout_sec The timeout period for the SQL request of a tenant. 30 Yes /
sql_server_change_interval_sec The time interval, in seconds, for changing the SQL execution server. 60 Yes /

Filter data based on cluster_id

obcdc allows you to filter data based on the cluster_id blocklist. Although OceanBase Database specifies a valid cluster_id range, you can specify the cluster_id values to filter out the following unwanted data in a synchronization task: For more information about how to configure cluster_id values, see ob_org_cluster_id.

  • The data that is not included in the valid cluster_id range specified by OceanBase Database and OceanBase Migration Service (OMS). The value range is [1, 4294901759] U [4294901760, 4294967295].
  • The data that is specified in the cluster_id_black_list field.
  • Each cluster ID in the cluster_id_black_list field must range from cluster_id_value_min to cluster_id_value_max.
Parameter Description Default value Real-time modification supported Remarks
cluster_id_black_list The blocklist of cluster IDs. \| No Separate multiple cluster IDs with vertical bars (\|). Each cluster ID ranges from cluster_id_value_min to cluster_id_value_max.
cluster_id_black_value_min The lower limit of cluster IDs in the synchronization blocklist. 2147473648 No /
cluster_id_black_value_max The upper limit of cluster IDs in the synchronization blocklist. 2147483647 No /

Server-level blocklist parameters

obcdc can send SQL and RPC requests to the OBServer nodes. OceanBase Database provides parameters that allow you to block all requests sent to specific servers.

In addition, as obcdc pulls logs at the partition level, a built-in blocklist is provided to add servers that have timed out or errors when pulling partition logs to the blocklist. Servers are automatically added to or removed from the blocklist.

Parameter Description Default value Real-time modification supported Remarks
server_blacklist A blocklist of servers that are blocked from receiving RPC requests from obcdc. \| Yes Servers are separated with vertical bars (\|). Logs are not pulled from servers in the blocklist. You can manually add a server to the blocklist when the server cannot provide log services. For example, you can add a server to the blocklist in the case of network partitioning and high latency.
sql_server_blacklist A blocklist of servers that are blocked from receiving SQL requests from obcdc. \| Yes Multiple servers are separated with vertical bars (\|). SQL query requests will not be forwarded to the servers in the blocklist.

Partition-level blocklist parameters (generally not modified)

Parameter Description Default value Real-time modification supported Remarks
blacklist_survival_time_sec The waiting time, in seconds, of servers in the blocklist. 30 No /
blacklist_survival_time_upper_limit_min The maximum period, in minutes, of servers in the blocklist. 4 Yes /
blacklist_survival_time_penalty_period_min The penalty time interval threshold, in minutes. If a server is added to the blocklist again within this time interval, the time for the server to stay in the blocklist is doubled. 1 Yes /
blacklist_history_overdue_time_min The period, in minutes, after which a server is deleted from the blocklist history. 30 Yes The minimum value is 10.
blacklist_history_clear_interval_min The time interval, in minutes, for clearing the information about servers in the blocklist. 20 Yes The minimum value is 10.
Parameter Description Default value Real-time modification supported Remarks
all_server_cache_update_interval_sec The time interval, in seconds, for updating the cached list of servers. 5 Yes /
all_zone_cache_update_interval_sec The time interval, in seconds, for updating the cached zone information. 5 Yes /
Parameter Description Default value Real-time modification supported Remarks
idle_pool_thread_num The number of threads that process the pending partition-level log-pulling tasks. 4 No The value range is [1,32].
dead_pool_thread_num The number of threads that recycle the partition-level log-pulling tasks. 1 No The value range is [1,32].
stream_worker_thread_num The number of threads that dispatch the partition-level log-pulling tasks. 8 No The value range is [1,64].
start_lsn_locator_thread_num The number of threads that locate the ID of the starting partition log. 4 No The value range is [1,32].
lob_data_merger_thread_num The number of LOB data processing threads. 2 No The value range is [1, +∞).
start_lsn_locator_locate_count The number of requests that complete the task of locating the starting partition log. 1 No The minimum value is 1. To locate the ID of the starting partition log, the follower partitions of the specified number must return the same starting log ID.
skip_start_lsn_locator_result_consistent_check Specifies whether to skip checking the consistency of the starting log ID. No No /
svr_stream_cached_count The initial size of the object pool of the server log stream. 16 No The minimum value is 1.
fetch_stream_cached_count The initial number of logs in the object pool of the partition log stream. 16 No The minimum value is 1.
start_lsn_locator_rpc_timeout_sec The timeout period, in seconds, of RPC requests for locating the starting partition log ID. 60 Yes The minimum value is 1.
start_lsn_locator_batch_count The batch number of the RPC requests for locating the starting partition log ID. 2000 Yes The minimum value is 1.
heartbeat_interval_sec The time interval, in seconds, to send RPC requests for partition heartbeats. 1 Yes The minimum value is 1.
fetch_log_rpc_timeout_sec The timeout period, in seconds, of RPC requests for pulling partition logs. 15 Yes The minimum value is 1.
progress_limit_sec_for_dml The maximum number of logs that can be pulled for DML operations. 300 Yes The minimum value is 1. Log-pulling in some partitions can be much faster than others. To ensure that the log-pulling progress is roughly at the same pace across partitions, the maximum number of DML logs that can be provided by the server is limited to the sum of the minimum number of pulled logs among all partitions + the value of the progress_limit_sec_for_dml parameter.
ls_fetch_progress_update_timeout_sec The timeout threshold, in seconds, of RPC requests for pulling partition logs. 3 Yes The minimum value is 1.
ls_fetch_progress_update_timeout_sec_for_lagged_replica The timeout threshold, in seconds, of pulling logs from lagged replica. 15 Yes The minimum value is 1.
enable_continue_use_cache_server_list Specifies whether to use the server information about the cache. 0 No Enable this option when the sys tenant is abnormal but the user tenants are normal.
fetching_log_mode The log pulling mode. integrated No
  • integrated mode: This is the default log pulling mode in which obcdc sends RPC requests to an OBServer node to pull logs. In this case, you must configure the address (cluter_url or rootserver_list) of the OBServer node and the username (cluster_user) and password (cluster_password) in the sys tenant.
  • direct mode: In this mode, obcdc directly consumes archived logs without the need to send RPC requests to the OBServer node to pull logs. In this case, you must configure the archive destination in the form of an NFS path or Object Storage Service (OSS) information. For more information about the archive destination, see Overview. The server where obcdc is located must be able to directly access the archive destination.

Notice

obcdc cannot pull discontinuous archived logs.

Parameter Description Default value Real-time modification supported Remarks
region The region where the obcdc server belongs. default_region No The logs are first pulled from OBServer nodes in the region of obcdc.

RPC module parameters (modifications not recommended)

Parameter Description Default value Real-time modification supported Remarks
io_thread_num The number of I/O threads for RPC requests. 4 No The minimum value is 1.
rpc_result_cached_count The initial number of pulled logs in the object pool of RPC requests for pulling logs. 16 No The minimum value is 1.
rpc_process_handler_time_upper_limit_msec The maximum time, in seconds, for processing an RPC request. 200 Yes The minimum value is 1.
rpc_result_count_per_rpc_upper_limit The maximum number of unprocessed results of the log-pulling RPC request in a log stream. 16 Yes The minimum value is 1. When the number of results in a log-pulling stream reaches the maximum value, the log-pulling RPC request is suspended.
Parameter Description Default value Real-time modification supported Remarks
ssl_client_authentication Specifies whether secure communication is enabled. No No By default, secure communication is disabled.
ssl_external_kms_info The secure communication method. file No For external communication, only the file mode is supported. You must create a wallet directory in the boot directory of obcdc, and put the following key file in the wallet directory: ca.pem/client-cert.pem/client-key.pem.
Parameter Description Default value Real-time modification supported Remarks
sequencer_thread_num The number of threads in the thread pool for sequencing distributed transactions. 5 No The minimum value is 1.
sequencer_queue_length The number of partition transactions waiting to be sequenced. 102400 No The minimum value is 1.
Parameter Description Default value Real-time modification supported Remarks
redo_dispatched_memory_limit_exceed_ratio Determines the size limit of the redo logs to be processed when enable_output_trans_order_by_sql_operation is set to 1. To be specific, the size of the redo logs allowed to be processed in the memory is redo_dispatched_memory_limit_exceed_ratio * redo_dispatcher_memory_limi. 2 Yes The minimum value is 1.
redo_dispatcher_memory_limit The size limit of the redo logs to be processed. 512 M Yes The minimum value is 128 MB.
extra_redo_dispatch_memory_size The extra redo size allocated to skew participants to avoid deadlocks, which may cause OceanBase Change Data Capture (CDC) not to generate data in a data skew scenario. 4 MB Yes [0 M, 512 M]
Parameter Description Default value Real-time modification supported Remarks
msg_sorter_task_count_upper_limit The maximum number of tasks for the row data sorting module in a transaction. 200000 No The minimum value is 1.
msg_sorter_thread_num The total number of threads for the row data sorting module in a transaction. 1 No The minimum value is 1, and the maximum value is 32.

Configurations valid only in persistence mode

Parameter Description Default value Real-time modification supported Remarks
storager_thread_num The number of threads in the thread pool of the storager module. 10 No /
storager_queue_length The number of tasks to be persisted. 102400 No /
Parameter Description Default value Real-time modification supported Remarks
reader_thread_num The number of threads in the thread pool for reading data from the storager module. 10 No /
reader_queue_length The number of data deserialization tasks waiting to be processed. 102400 No /
Parameter Description Default value Real-time modification supported Remarks
batch_buf_count The number of memory buffers for batch redo log persistence. 10 No The minimum value is 5.
batch_buf_size The size of a memory buffer for batch redo log persistence. 20 MB No The minimum value is 2 MB.
Parameter Description Default value Real-time modification supported Remarks
dml_parser_thread_num The number of threads in the thread pool for parsing DML statements. 5 No The minimum value is 1.
ddl_parser_thread_num The number of threads in the thread pool for parsing DDL statements. 1 No The minimum value is 1.
skip_reversed_schema_version Specifies whether to ignore schema version rollback. No No Generally, the schema version monotonically increases during the processing of DDL statements. The rollback of the schema version indicates an exception. However, you can use this parameter to skip the exception.
Parameter Description Default value Real-time modification supported Remarks
formatter_thread_num The number of threads in the thread pool for data formatting. 10 No The minimum value is 1.
skip_dirty_data Specifies whether to skip data exceptions. No Yes Current exceptions:
1. Non-full column logging. This means the clogs only recorded the columns that have been changed. Parameter: binlog_row_image.
2. Exceptions in DDL statements of the tenant restored from a physical backup.
Parameter Description Default value Real-time modification supported Remarks
enable_output_trans_order_by_sql_operation Specifies whether to generate data rows in the transaction in the order that the SQL statements are executed. No No The execution order refers to the order in which OceanBase Database makes changes to the data. It may not be the same as the order in which the SQL statements are sent from the client.
sort_trans_participants Specifies whether to sort SQL statements by the distributed transaction participants. 1 No The parameter specifies whether to sort statements by the partition key. You can maintain a stable statement output sequence by using this parameter together with the enable_output_trans_order_by_sql_operation parameter.
enable_output_hidden_primary_key Specifies whether to generate the hidden primary keys of tables without primary keys. No No You can generate the hidden primary keys of tables without primary keys by setting the parameter value to 1.
enable_convert_timestamp_to_unix_timestamp Specifies whether to convert a timestamp to an integer in the Unix format. No No /
enable_output_invisible_column Specifies whether to generate the hidden columns. No No /
output_heartbeat_interval_msec The time interval, in seconds, for generating the information about security timestamps. 3 Yes /
Parameter Description Default value Real-time modification supported Remarks
resource_collector_thread_num The number of threads in the thread pool for recycling resources. 10 No The minimum value is 1.
resource_collector_thread_num_for_br The number of threads in the thread pool for recycling binlog records. 7 No The value of this parameter must be larger than that of the resource_collector_thread_num parameter.

Notice

We recommend that you do not modify the parameters that are not listed in this topic.

Contact Us