This topic answers frequently asked questions about OceanBase Change Data Capture (obcdc) and lists considerations for using it.
How can I judge whether the startup timestamp is too small?
obcdc allows you to configure a log-pulling point in time. obcdc will pull logs that are committed later than this point in time, which is referred to as the startup timestamp.
Considerations
- obcdc supports cluster-level synchronization among tenants. When logs are synchronized for the entire cluster, make sure that the clocks on all servers are synchronized. A significant Global Timestamp Service (GTS) time difference between tenants causes obcdc to exit because a security timestamp rolls back. We recommend that you use obcdc V4.x to synchronize the data of only one tenant at a time.
- At present, obcdc does not support synchronizing data of a cluster that has a standby tenant.
- Logs can be consumed offline only when fetching_log_mode is set to direct and meta_data_refresh_mode is set to data_dict. In other cases, obcdc must send requests to the OBServer node to pull logs, and you must configure the parameters to access the OBServer node, such as cluster_url or rootserver_list.
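The offline-consumption rule in the last consideration can be written as a small check. The following is a minimal sketch, assuming the obcdc configuration has already been loaded into a plain Python dict. The function names are hypothetical helpers for illustration and are not part of obcdc.

```python
def can_consume_offline(conf: dict) -> bool:
    """Return True when the configuration meets the offline-consumption rule."""
    return (
        conf.get("fetching_log_mode") == "direct"
        and conf.get("meta_data_refresh_mode") == "data_dict"
    )


def missing_observer_access(conf: dict) -> bool:
    """Return True when obcdc must reach an OBServer node but neither
    cluster_url nor rootserver_list is configured."""
    if can_consume_offline(conf):
        return False
    # In every other case, one of the access parameters is required.
    return not (conf.get("cluster_url") or conf.get("rootserver_list"))
```

A check like this could run before obcdc is started, so that a missing access parameter is reported early instead of surfacing as a startup failure.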
Limitations for startup timestamps
When obcdc pulls data of a single tenant, obcdc may fail to start if the startup timestamp is too small. The following limitations apply.
Startup timestamp limitations for synchronizing the data of a single tenant
- When fetching_log_mode is set to integrated, the limitations differ in the following cases:
  - Archiving is disabled
    The startup timestamp specified for obcdc must be greater than the largest BEGIN_SCN value of all log streams of the tenant. You can execute the following SQL statement to query this system change number (SCN):
    SELECT CEIL(MAX(BEGIN_SCN)/1000) AS START_TS_US FROM oceanbase.GV$OB_LOG_STAT;
  - Archiving is enabled
    The startup timestamp specified for obcdc must be greater than the smaller of the following two values:
    - The largest BEGIN_SCN value of all log streams of the tenant.
    - The value of START_SCN, which indicates the time when archiving is enabled for the tenant. You can execute the following SQL statement in a tenant to query the time when archiving is enabled for the tenant:
      SELECT CEIL(MAX(START_SCN)/1000) AS START_TS_US FROM oceanbase.DBA_OB_ARCHIVELOG;
- When fetching_log_mode is set to direct, obcdc can synchronize the data of only a single tenant. Make sure that the startup time of obcdc is later than the point in time when log archiving is enabled.
  To use data dictionaries, make sure that the startup time is greater than the smallest snapshot_scn value in the data dictionaries of the tenant. You can execute the following statement in a tenant to query the smallest snapshot_scn value in the data dictionaries of the tenant:
  SELECT CEIL(MIN(snapshot_scn)/1000) FROM oceanbase.DBA_OB_DATA_DICTIONARY_IN_LOG;
  Note
  You can also specify tenant_id in the sys tenant to query the oceanbase.CDB_OB_DATA_DICTIONARY_IN_LOG view.
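The single-tenant rules above can be combined into one lower bound for the startup timestamp. The following is a minimal sketch, assuming the SCN values have already been fetched with the queries in this section and that an SCN divided by 1000 yields microseconds, as the START_TS_US alias suggests. The function and parameter names are hypothetical and are not obcdc options.

```python
from typing import Optional


def scn_to_us(scn: int) -> int:
    """Mirror CEIL(scn / 1000) from the queries, using integer arithmetic."""
    return -(-scn // 1000)


def startup_lower_bound_us(
    mode: str,
    max_begin_scn: Optional[int] = None,
    archive_start_scn: Optional[int] = None,
    min_dict_snapshot_scn: Optional[int] = None,
) -> int:
    """Return the value (in microseconds) that the startup timestamp must exceed."""
    if mode == "integrated":
        if max_begin_scn is None:
            raise ValueError("integrated mode needs the largest BEGIN_SCN")
        if archive_start_scn is None:
            # Archiving is disabled: only the largest BEGIN_SCN applies.
            bound = max_begin_scn
        else:
            # Archiving is enabled: the smaller of the two values applies.
            bound = min(max_begin_scn, archive_start_scn)
    elif mode == "direct":
        if archive_start_scn is None:
            raise ValueError("direct mode needs the archiving START_SCN")
        bound = archive_start_scn
        if min_dict_snapshot_scn is not None:
            # Data dictionaries are used: also exceed the smallest snapshot_scn.
            bound = max(bound, min_dict_snapshot_scn)
    else:
        raise ValueError(f"unknown fetching_log_mode: {mode}")
    return scn_to_us(bound)
```

A startup timestamp is acceptable under these rules only if it is strictly greater than the returned value.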
Startup timestamp limitations for synchronizing the data of multiple tenants
When you pull the data of multiple tenants in a cluster, make sure that each tenant meets the preceding conditions regarding the startup timestamp specified for obcdc.
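For multiple tenants, the same check has to hold for every tenant at once. The sketch below assumes that a per-tenant lower bound in microseconds has already been computed from the single-tenant rules. The function name and the tenant names in the usage note are hypothetical.

```python
from typing import Dict, List


def tenants_blocking_start(start_ts_us: int, lower_bounds_us: Dict[str, int]) -> List[str]:
    """Return the tenants whose lower bound is not exceeded by start_ts_us."""
    return sorted(
        tenant
        for tenant, bound in lower_bounds_us.items()
        if start_ts_us <= bound  # the startup timestamp must be strictly greater
    )
```

For example, with bounds of 100 for tenant t1 and 250 for tenant t2, a startup timestamp of 250 is still too small for t2, and an empty result means that every tenant meets the condition.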
Limitations for virtual generated columns
Starting from OceanBase Database V4.x, generated columns that are not explicitly marked as STORED are treated as virtual generated columns, and their values are no longer recorded in clog.
Note
obcdc does not support outputting virtual generated columns before V4.2.5.0.
When the OceanBase Database version is in the range [4.2.5.0, 4.3.0.0) U [4.3.4.0, +∞), the obcdc version is in the range [4.2.5.0, 4.3.0) U [4.3.5.5, 4.4.0) U [4.4.2.0, +∞), and enable_output_virtual_generated_column=1 is configured, obcdc can synchronize virtual generated columns written in some scenarios. However, support for all types of generated column expressions is not guaranteed.
When you use this feature, test and verify the virtual generated columns in advance. If you need to synchronize a virtual generated column that is not supported, set it as a STORED generated column. The following cases are confirmed unsupported:
- The generated column rule involves time zones.
- The generated column rule involves system functions, such as FROM_TZ and NULLIF.
- The generated column uses JSON functions and the JSON PARTIAL UPDATE feature.
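The version conditions for virtual generated columns are easy to misread, so a small check can help. The following is a minimal sketch, assuming versions are given as dotted numeric strings such as "4.3.5.5". All function and constant names are hypothetical, and a True result only means the version and configuration conditions are met, not that a particular expression is supported.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]
Range = Tuple[Version, Optional[Version]]  # [low, high); high=None means +infinity


def parse_version(text: str) -> Version:
    """Parse a dotted version and pad it to four components."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def in_ranges(version: Version, ranges: List[Range]) -> bool:
    """Return True if version falls in any half-open range."""
    return any(lo <= version and (hi is None or version < hi) for lo, hi in ranges)


# Ranges copied from the statement above.
OB_RANGES: List[Range] = [((4, 2, 5, 0), (4, 3, 0, 0)), ((4, 3, 4, 0), None)]
OBCDC_RANGES: List[Range] = [
    ((4, 2, 5, 0), (4, 3, 0, 0)),
    ((4, 3, 5, 5), (4, 4, 0, 0)),
    ((4, 4, 2, 0), None),
]


def may_output_virtual_columns(ob_version: str, obcdc_version: str, enable_flag: int) -> bool:
    """Check the version and configuration conditions for the feature."""
    return (
        enable_flag == 1
        and in_ranges(parse_version(ob_version), OB_RANGES)
        and in_ranges(parse_version(obcdc_version), OBCDC_RANGES)
    )
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "4.10" would sort before "4.9".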
Limitations for table-level recovery
OceanBase Database provides table-level recovery starting from V4.2.1. Because the underlying implementation of table-level recovery uses direct load, obcdc does not support synchronizing the recovered data.
When both OceanBase Database and obcdc versions are in the range [4.2.5, 4.3.0) U [4.4.2, 4.5.0) U [4.6.0, +∞), obcdc can output incremental data changes of tables after table-level recovery completes, but does not output the DDL for creating the table.
Limitations for direct load
Full direct load: obcdc does not support data that is imported without going through the transaction path.
Incremental direct load: obcdc supports this starting from V4.2.5.0.
Incremental baseline import: obcdc does not support output.
Note
Starting from OceanBase Database V4.5.0, incremental import uses the incremental baseline import mode by default.
Trusted column marking logic
In OceanBase Database V4.x, clog may not record all column information of rows for DML operations, but obcdc outputs all columns of rows for DML operations. For column values recorded in the OceanBase cluster clog, obcdc marks the column as trusted (column value from Redo logs). For columns not recorded in the OceanBase cluster clog, obcdc marks them as untrusted (column value generated by obcdc, usually NULL).
Downstream consumers of obcdc must adjust their consumption logic based on whether column values are trusted, to prevent transmitting untrusted column data downstream and causing data correctness issues.
In the message library, the VALUE_ORIGIN enum type in ValueOrigin.h represents the column value source. VALUE_ORIGIN::REDO indicates that the column value comes from Redo logs and the column is trusted. VALUE_ORIGIN::PADDING indicates that the column value is generated by obcdc and does not represent the real column value. Downstream consumers of obcdc can obtain the trusted marking of each column by parsing the m_origin field in the binlogbuf of each column.
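The consumption rule above can be illustrated with a small filter. The following is a minimal sketch in Python, assuming each column has already been decoded from the obcdc message into a name, a value, and an origin. The Column class and the enum member values are hypothetical stand-ins and do not reproduce the definitions in ValueOrigin.h.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class ValueOrigin(Enum):
    """Stand-in for VALUE_ORIGIN; the member values here are placeholders."""
    REDO = "redo"        # value comes from Redo logs: trusted
    PADDING = "padding"  # value generated by obcdc: untrusted


@dataclass
class Column:
    name: str
    value: Any
    origin: ValueOrigin


def trusted_values(columns: List[Column]) -> Dict[str, Any]:
    """Keep only the columns whose values come from Redo logs."""
    return {c.name: c.value for c in columns if c.origin is ValueOrigin.REDO}
```

Filtering on the origin instead of on the value matters because a trusted column can legitimately be NULL, so a NULL value alone does not identify a padded column.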
From which version does obcdc support MOW tables?
For MOW table data written by OceanBase Database versions earlier than V4.3.5 BP5, obcdc V4.3.5 BP5 or earlier may encounter unexpected exceptions. obcdc V4.3.5 BP6 or later does not output the involved OUTROW LOB data and marks the affected columns as untrusted.
obcdc can properly process MOW table data written by OceanBase Database V4.3.5 BP5 or later.
From which version does obcdc support MINIMAL MODE?
obcdc can synchronize minimal mode data written by OceanBase Database V4.3.x or later. For columns that are not involved in changes, obcdc outputs the column value as NULL and marks it as an untrusted column.
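One way a downstream consumer might handle minimal mode output is to merge only trusted columns into the row image it already holds. The following is a minimal sketch of that idea. It is an assumed downstream strategy, not something obcdc prescribes, and the function name and the (name, value, trusted) tuple layout are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple


def apply_update(
    previous_row: Dict[str, Any],
    changed: Iterable[Tuple[str, Any, bool]],
) -> Dict[str, Any]:
    """Merge an update into a copy of the previous row image.

    Each item is (column name, value, trusted). Untrusted columns keep
    the value that the downstream system already has.
    """
    merged = dict(previous_row)
    for name, value, trusted in changed:
        if trusted:
            merged[name] = value
    return merged
```

With this approach, a NULL that obcdc outputs for an unchanged column does not overwrite the real value stored downstream.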