You can define rules in a control file by which OBLOADER can preprocess the data before importing it.
File template
Template for defining a control file:
lang=java
(
Column name Position of byte offsets (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Position of byte offsets (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Position of byte offsets (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Position of byte offsets (optional) "Preprocessing function" (optional) Mapping definition (optional)
);
The following example shows the content of a control file:
lang=java
server=mysql|oracle (
c1 "nvl(c1, 'not null')" map(field_position),
c2 "none" map(field_position)
);
Notice
field_positionspecifies the column position of preprocessed data in the imported data file.- You can define byte offsets only for data in the
--posformat. - A control file is named in the format of <table name>.ctrl.
- Before data is imported, OBLOADER will retrieve the control file with the file name extension
.ctrland preprocess the data based on the functions configured in the control file.- If you import multiple tables with multiple control files, OBLOADER will retrieve all files with the file name extension
.ctrlin the path specified by thectl-pathoption and bind the control files to tables that you want to import based on the mappings between the control file names and table names. In this case, make sure that the name of a control file is the same as that of the corresponding table. For example, theloader1.ctrlcontrol file maps theloader1table, and theloader2.ctrlcontrol file maps theloader2table. - If you import a single table with a single control file, you can directly specify the control file name in the
--ctl-pathoption. OBLOADER will bind the specified control file to the corresponding table.
- If you import multiple tables with multiple control files, OBLOADER will retrieve all files with the file name extension
Considerations
If a column name defined in the table contains database keywords, you must add a grave accent (`) both before and after the column name to escape the column name. Example:
lang=java ( `WHEN` "lower(`WHEN`)", c2 "nanvl(c2,'0')" );In OBLOADER V3.x, the field mapping feature of map files is integrated into control files. Map files are no longer supported. Example:
lang=java( C1 "NONE" map(1), C2 "NONE" map(1), // Multi-column mapping C3 "CONSTANT('xxx')", // xxx is a specified constant. C4 "SEQUENCE(100, 1)" // 100 is the first sequence number, and 1 is the increment. );C3 "CONSTANT('xxx')"andC4 "SEQUENCE(100, 1)"are generated columns.Note
- OBLOADER versions earlier than V3.0.0 support only single-column mapping and do not support generated columns. OBLOADER V3.0.0 and later allow you to map multiple columns and generated columns in control files.
- Generated columns are columns that do not exist in data files. OBLOADER can use the following functions to preprocess generated columns:
CONSTANT,SEQUENCE, andDB_SEQUENCE. Example:C4 "SEQUENCE(100, 1)". For more information, see Preprocessing functions.
Scenarios and examples
Import data in the --cut, --csv, and --sql formats
Prepare data.
Sample code:
DROP TABLE IF EXISTS `student`; CREATE TABLE `mysql`.`loader` ( `c1` varchar(120) NULL, `c2` varchar(120) NULL, `c3` varchar(120) NULL, `c4` varchar(120) NULL, `c5` varchar(120) NULL, `c6` varchar(120) NULL, `c7` varchar(120) NULL, `c8` varchar(120) NULL, `c9` varchar(120) NULL, `c10` varchar(120) NULL, `c11` varchar(120) NULL, `c12` varchar(120) NULL, `c13` varchar(120) NULL, `c14` varchar(120) NULL, `c15` int(120) NULL, `c16` timestamp NULL, `c17` varchar(120) NULL, `c18` varchar(120) NULL, `c19` varchar(120) NULL, `c20` varchar(120) NULL, `c21` varchar(120) NULL, `c22` date NULL, `c23` varchar(120) NULL, `c24` varchar(120) NULL, `c25` varchar(120) NULL, `c26` varchar(120) NULL, `c27` varchar(120) NULL, `c28` varchar(120) NULL ) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_general_ci ; INSERT INTO `loader`(`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`,`c8`,`c9`,`c10`,`c11`,`c12`,`c13`,`c14`,`c15`,`c16`,`c17`,`c18`,`c19`,`c20`,`c21`,`c22`,`c23`,`c24`,`c25`,`c26`,`c27`,`c28`) VALUES ('XIAOHONG', 1111111,'222222 ',123456,123,'hhhhh',1234,'abc','null',123456,'1','1','china','123',456,'2023-07-11 20:56:36','123','1','1',11,2,'2023-07-11','Abc123','Ab3','Ab3','12345Ab3','Ab312345','123456789');Define a control file. Configure a preprocessing function for each imported column in the control file.
Sample code:
lang=java ( c1 "lower(c1)" map(1), -- Converts letters in values of the c1 column to lowercase letters. c2 "ltrim(c2)" map(2), -- Truncates leading spaces for values of the c2 column. c3 "rtrim(c3)" map(3), -- Truncates trailing spaces for values of the c3 column. c4 "substr(c4,0,5)" map(4), -- Truncates a string of five characters from the first position of values in the c4 column. c5 "trim(c5)" map(5), -- Truncates the leading and trailing spaces for values of the c5 column. c6 "upper(c6)" map(6), -- Converts letters in values of the c6 column to uppercase letters. c7 "nanvl(c7,'0')" map(7), -- Verifies whether values in the c7 column are numbers and returns 0 for values that are not numbers. c8 "replace(c8,'a','A')" map(8), -- Replaces a with A in values of the c8 column. c9 "nvl(c9,'null')" map(9), -- Verifies whether values in the c9 column are null and returns a null string if a value is null. c10 "length(c10)" map(10), -- Calculates the length of values in the c10 column. c11 "lpad(c11,5,'x')" map(11), -- Prefixes values of the C11 column with a five-byte string of the letter 'x'. c12 "rpad(c12,5,'x')" map(12), -- Suffixes values of the c12 column with a five-byte string of the letter 'x'. c13 "convert(c13,'utf-8','gbk')" map(13), -- Converts GBK-encoded values in the c13 column to the UTF-8 format. c14 "concat(c14, '_suffix')" map(14), -- Concatenates values in the c14 column with a specified constant. c15 "none" map(15), -- Returns values in the c15 column without extra processing. c16 "systimestamp" map(16), -- Returns the timestamp of the current cluster as values of the c16 column. c17 "constant('1')" map(17), -- Returns constant 1 as values of the c17 column. c18 "lpadb(c18,5,'x')" map(18), -- Prefixes values in the c18 column with a five-byte string of the letter 'x'. c19 "rpadb(c19,5,'x')" map(19), -- Suffixes values in the c19 column with a five-byte string of the letter 'x'. c20 "case when length(trim(c20))<18 then 'Y' else 'N' end" map(20), -- Looks up values that meet the specified condition in the c20 column and returns the values that meet the condition. c21 "case length(trim(c21)) when '1' then 'one' when '2' then 'two' else 'unknown' end" map(21), -- Looks up values that match the specified values in the c21 column and returns the corresponding values. C22 "SYSDATE" map(22), -- Returns the current date as values of the c22 column. C23 "MASK(C23)" map(23), -- Masks values in the c23 column by replacing letters and numbers in this column with default desensitization characters (default desensitization characters are: X for uppercase letters, x for lowercase letters, and n for digits). This function takes effect only for column names. C24 "MASK_FIRST_N(C24,'A','a','b',3)" map(24), -- Specifies the desensitization characters for letters and digits in values of the c24 column. By default, N is 0, which means to start from the first character. C25 "MASK_LAST_N(C25,'A','a','b',3)" map(25), -- Specifies the desensitization characters for letters and digits in values of the c25 column. By default, N is 0, which means to start from the last character. C26 "MASK_SHOW_FIRST_N(C26,'A','a','b',3)" map(26), -- Specifies the number of characters in values of the c26 column that do not need to be desensitized. By default, N is 0, which means to start from the first character. C27 "MASK_SHOW_LAST_N(C27,'A','a','b',3)" map(27), -- Specifies the number of characters in values of the c27 column that do not need to be desensitized. By default, N is 0, which means to start from the last character. C28 "REVERSE(C28)" map(28), -- Reverses the order of characters in values of the c28 column. )Notice
- The file name extension of a control file is
.ctrl. - A control file maps a table. If you want to preprocess data of multiple tables, you must create multiple control files in the path specified by the
ctl-pathoption. The name of a control file must be the same as that of the corresponding table that you want to import.
- The file name extension of a control file is
After you use the
--ctl-pathoption to specify the absolute path of the control file for data preprocessing, import data in the--csvformat.Sample code:
$./obloader -h xx.x.x.x -P 2883 -u test -p ****** --sys-user **u*** --sys-password ****** -c cluster_a -t mysql -D USERA --csv --table '*' --ctl-path /home/controls/ -f /outputThe
–ctl-path /home/controls/option specifies the absolute path where the control file is located.Sample result:
... All Load Tasks Finished: ---------------------------------------------------------- | No.# | Type | Name | Count | Status | ---------------------------------------------------------- | 1 | TABLE | loader | 1 -> 1 | SUCCESS | ---------------------------------------------------------- Total Count: 1 End Time: 2023-07-04 07:13:01 ...Example of data to be imported:
... SELECT * FROM `loader`; +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 | c16 | c17 | c18 | c19 | c20 | c21 | c22 | c23 | c24 | c25 | c26 | c27 | c28 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | xiaohong | 1111111 | 222222 | 12345 | 123 | HHHHH | 1234 | Abc | null | 6 | xxxx1 | 1xxxx | china | 123_suffix | 456 | 2023-07-05 02:52:07 | 1 | xxxx1 | 1xxxx | Y | one | 2023-07-05 | Xxxnnn | Aab | Aab | 123bbAab | Aabbb345 | 987654321 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ 1 row in set (0.04 sec) ...
Import data in the --pos format
Sample code for defining a control file:
lang=java
(
c1 position(1:5) "trim(c1)", -- Extracts the first to fifth bytes of characters from the values in the c1 column and truncates the leading and trailing spaces of the result.
c2 position(7:25) "replace(trim(c2),'',' ')" -- Extracts the seventh to twenty-fifth bytes of characters from the values in the c2 column, truncates the leading and trailing spaces of the result, and replaces the empty characters with spaces.
);
Notice
OBLOADER versions later than V3.0.0 no longer support the --null-replacer,--empty-replacer, --default-date, and --date-format options.