Feature description
OceanBase Database supports the data dictionary feature starting from V4.1.0.0. The data dictionary is a tenant-level service that persists the metadata of OceanBase cluster tenants, including log stream information and database and table metadata, to the commit log (clog). Currently, the data dictionary is mainly used by obcdc, the change data capture component, which can then obtain tenant metadata from the logs alone.
The data dictionary serves only user tenants and is persisted through PALF to the system log stream (SYS_LS) of each user tenant. Once a user tenant is created, the data dictionary service starts automatically and runs on the leader replica of the tenant's SYS_LS.
Enabling the log archiving service automatically triggers an asynchronous data dictionary generation task. You can also schedule periodic data dictionary generation in either of two ways: through the tenant-level configuration parameter dump_data_dictionary_to_log_interval (available starting from V4.1.0.0) or through the DBMS_DATA_DICT system package. We recommend using the DBMS_DATA_DICT system package.
Starting from V4.2.0.0, OceanBase Database records the history of data dictionary generation in DBA and CDB views. To obtain the location of a dictionary in the log, query the CDB_OB_DATA_DICTIONARY_IN_LOG view in the sys tenant or the DBA_OB_DATA_DICTIONARY_IN_LOG view in a user tenant.
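The choice of view described above can be summarized in a short Python sketch. The helper name dictionary_view_query is hypothetical. The Oracle-mode form assumes the view can be referenced without a schema qualifier, and the query selects all columns because the column list is documented on the view reference pages.

```python
def dictionary_view_query(tenant, compat_mode="mysql"):
    """Return a query for data dictionary location records (hypothetical helper).

    tenant: "sys" for the sys tenant; any other value means a user tenant.
    compat_mode: "mysql" or "oracle"; only affects how the DBA view is named.
    """
    if tenant == "sys":
        # In the sys tenant, query the CDB view.
        return "SELECT * FROM oceanbase.CDB_OB_DATA_DICTIONARY_IN_LOG"
    if compat_mode == "mysql":
        # MySQL-compatible user tenant: the view lives in the oceanbase schema.
        return "SELECT * FROM oceanbase.DBA_OB_DATA_DICTIONARY_IN_LOG"
    if compat_mode == "oracle":
        # Oracle-compatible user tenant: assumed to resolve without a qualifier.
        return "SELECT * FROM DBA_OB_DATA_DICTIONARY_IN_LOG"
    raise ValueError(f"unknown compatibility mode: {compat_mode!r}")
```

Run the returned statement with any client connected to the corresponding tenant.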
Notice
Generating the data dictionary can take anywhere from seconds to hours, depending on the complexity of the tenant metadata. Take this into account when you set the generation interval: because data dictionary generation runs in a single thread, an interval that is too short causes frequent generation and keeps that thread occupied for long periods.
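To see why the interval should be chosen relative to the generation time, the following Python sketch estimates how much of the time the single generation thread is busy. It uses a simplified model, which is an assumption and not how OceanBase actually schedules the task: every run takes the same time, and a new run is requested once per interval. The function name is hypothetical.

```python
def dict_thread_busy_ratio(generation_seconds, interval_seconds):
    """Approximate share of time the single generation thread is busy.

    Simplified model (assumption): every run takes generation_seconds and a
    new run is requested every interval_seconds. With only one thread, runs
    cannot overlap, so the ratio is capped at 1.0 (thread always busy).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return min(1.0, generation_seconds / interval_seconds)
```

For example, a 2-hour generation with a 24-hour interval keeps the thread busy about 8% of the time (7200 / 86400), while the same generation with a 1-hour interval saturates it.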
Data dictionary generation causes the schema service to consume additional memory. The impact is more significant when the tenant has a large number of schemas or a small memory size.
Usage
In the obcdc configuration file, the meta_data_refresh_mode parameter specifies where obcdc obtains schema information. If it is set to online, obcdc queries schema information from the OBServer nodes through SchemaService, which requires access to the OBServer nodes. If it is set to data_dict, obcdc parses schema information from the data dictionary in the logs.
Notice
When synchronizing data from OceanBase Database V4.2 or later, obcdc ignores the configured value and always uses data_dict mode. To use online mode with these versions, you must also set the skip_ob_version_compat_check parameter to 1.
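The mode-selection rules above can be modeled in Python as follows. This is an illustrative sketch, not obcdc code: the function name is hypothetical, it only encodes the rules stated in this section, and it assumes version strings of the form 4.2.1.0.

```python
def effective_refresh_mode(configured_mode, server_version, skip_ob_version_compat_check=0):
    """Model which schema source obcdc ends up using (hypothetical helper).

    configured_mode: value of meta_data_refresh_mode ("online" or "data_dict").
    server_version: OceanBase Database version string such as "4.2.1.0".
    """
    if configured_mode not in ("online", "data_dict"):
        raise ValueError(f"unsupported meta_data_refresh_mode: {configured_mode!r}")
    # Compare only the major and minor version numbers, e.g. (4, 2).
    major_minor = tuple(int(part) for part in server_version.split(".")[:2])
    if major_minor >= (4, 2) and int(skip_ob_version_compat_check) != 1:
        # V4.2 and later: the configured value is ignored and data_dict is forced
        # unless skip_ob_version_compat_check is set to 1.
        return "data_dict"
    return configured_mode
```

For instance, with a V4.2.1.0 cluster, online mode takes effect only when skip_ob_version_compat_check is 1.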
For example, you can add the following content to the configuration file. For more information about the configuration file, see Configuration file example.
meta_data_refresh_mode=data_dict
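If you manage the configuration file from a script, the following Python sketch sets or replaces a single key=value entry. It assumes the file holds one key=value pair per line, as in the example above; the helper names are hypothetical and not part of obcdc.

```python
from pathlib import Path


def set_conf_option(conf_text, key, value):
    """Return conf_text with key set to value, appending the entry if absent.

    Assumes one key=value pair per line; all other lines are kept unchanged.
    """
    new_line = f"{key}={value}"
    lines = conf_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line  # replace every existing entry for this key
            found = True
    if not found:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def update_conf_file(path, key, value):
    """Rewrite the configuration file at path in place (hypothetical helper)."""
    conf = Path(path)
    conf.write_text(set_conf_option(conf.read_text(), key, value))


# Usage, assuming libobcdc.conf is in the current directory:
# update_conf_file("libobcdc.conf", "meta_data_refresh_mode", "data_dict")
```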
After updating the configuration, restart the obcdc host process so that the change takes effect. For example, the following command starts obcdc_tailf:
[admin@test oceanbase]$ ./bin/obcdc_tailf -f libobcdc.conf -D output.txt
In this example, libobcdc.conf is the relative path to the configuration file, and the processed output data is written to output.txt. Replace the configuration file name and output file name with your own. For more information about the obcdc_tailf command, see obcdc_tailf.
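The start command above can also be issued from Python with the standard subprocess module. This sketch reuses only the -f and -D options shown in the example; the helper names are hypothetical, and stopping the previously running process before the restart is not shown.

```python
import subprocess


def build_tailf_argv(conf_file, output_file, binary="./bin/obcdc_tailf"):
    """Assemble the obcdc_tailf command line from the example above."""
    return [str(binary), "-f", str(conf_file), "-D", str(output_file)]


def start_tailf(conf_file, output_file, workdir="."):
    """Start obcdc_tailf in workdir and return without waiting for it to exit.

    On POSIX, a relative binary path is resolved against workdir (cwd).
    """
    argv = build_tailf_argv(conf_file, output_file)
    return subprocess.Popen(argv, cwd=workdir)


# Usage, assuming the current directory is the oceanbase directory from the prompt:
# proc = start_tailf("libobcdc.conf", "output.txt")
```

Passing the arguments as a list avoids shell quoting issues when file names contain spaces.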
References
For more information about the log archiving service, see Log archiving.
For more information about the CDB_OB_DATA_DICTIONARY_IN_LOG view, see oceanbase.CDB_OB_DATA_DICTIONARY_IN_LOG.
For more information about the DBA_OB_DATA_DICTIONARY_IN_LOG view, see oceanbase.DBA_OB_DATA_DICTIONARY_IN_LOG (MySQL-compatible mode) and DBA_OB_DATA_DICTIONARY_IN_LOG (Oracle-compatible mode).